[PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
Slava Ovsiienko
viacheslavo at nvidia.com
Tue Feb 10 18:05:04 CET 2026
Hi,
I'm sorry, I have some concern about the patch.
How it works, as far as I understand:
- DPDK simulates interrupts in user mode with epoll_wait()
- mlx5 PMD emits the async counter query command to the NIC periodically
- there might be multiple async query commands in the flight
- kernel drivers handles the async query completion interrupts, pushes the token to the internal completion queue and unblocks associated fd
- epoll_wait() sees this unblocked fd and notifies mlx5 PMD about
- mlx5 PMD reads the completion token from the kernel queue with devx_get_async_cmd_comp()
The concern scenario, let's assume:
- we have 2 async query commands in the flight
- the first async query completes, fd is unblocked, PMD is inviked, the completion is read by PMD and is being handled
- the second async query completes, fd gets unblocked, the second token is written to the queue
- the PMD completes the handling of the first completion and reads the queue again (with devx_get_async_cmd_comp() call in the loop)
- it reads the second token successfully and handles
- then, on the third call, devx_get_async_cmd_comp() returns EAGAIN, it means queue is empty
- DPDK calls epoll_wait() again and sees unblocked fd
- it call mlx5 PMD, and it calls devx_get_async_cmd_comp(), but queue is empty (handled in previous interrupt handling)
- with the patch we wrongly remove the handler
In my opinion, we should handle flags EPOLLERR | EPOLLHUP | EPOLLRDHU from the epoll_wait()_return also for
RTE_INTR_HANDLE_EXT and RTE_INTR_HANDLE_DEV_EVENT interrupt types.
With best regards,
Slava
> -----Original Message-----
> From: Kevin Traynor <ktraynor at redhat.com>
> Sent: Tuesday, February 10, 2026 5:06 PM
> To: Stephen Hemminger <stephen at networkplumber.org>
> Cc: dev at dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas at monjalon.net>; david.marchand at redhat.com; Dariusz Sosnowski
> <dsosnowski at nvidia.com>; Slava Ovsiienko <viacheslavo at nvidia.com>;
> stable at dpdk.org
> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
>
> On 07/02/2026 06:09, Stephen Hemminger wrote:
> > On Fri, 6 Feb 2026 17:20:53 +0000
> > Kevin Traynor <ktraynor at redhat.com> wrote:
> >
> >> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
> EPOLLRDHUP
> >> epoll events for the devx interrupt fd.
> >>
> >> This may happen if the interrupt fd is deleted, if the device is
> >> unbound from mlx5_core kernel driver or if the device is removed by
> >> the mlx5 kernel driver as part of LAG setup.
> >>
> >> When that occurs, there is no data to be read and in the devx
> >> interrupt handler an EAGAIN is returned on the first call to
> >> devx_get_async_cmd_comp, but this is not checked.
> >>
> >> As the interrupt is not removed or condition reset, it causes an
> >> interrupt processing busy-loop, which leads to the dpdk-intr thread
> >> going to 100% CPU.
> >>
> >> e.g.
> >> epoll_wait
> >> (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) =
> >> 1 read(28, 0x7f1f5c7fc2f0, 40)
> >> = -1 EAGAIN (Resource temporarily unavailable) epoll_wait
> >> (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) =
> >> 1 read(28, 0x7f1f5c7fc2f0, 40)
> >> = -1 EAGAIN (Resource temporarily unavailable)
> >>
> >> Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
> >> first read. If that happens, unregister the callback to prevent looping.
> >>
> >> Bugzilla ID: 1873
> >> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
> >> Cc: stable at dpdk.org
> >>
> >> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
> >
> > AI spotted this, I didn't...
> >
> >
> > Errors:
> >
> > Line 139: Unnecessary semicolon after closing brace
> >
> > c
> >
> > };
> >
> > Should be:
> > c
> >
> > }
> >
> > Lines 142-146: Block comment uses incorrect style Block comments in C
> code should use /* and */ style, not /** which is reserved for documentation
> comments.
> >
> > c
> >
> > /**
> > * no data and EAGAIN indicate there is an error or
> > * disconnect state. Unregister callback to prevent
> > * interrupt busy-looping.
> > */
> >
> > Should be:
> > c
> >
> > /*
> > * no data and EAGAIN indicate there is an error or
> > * disconnect state. Unregister callback to prevent
> > * interrupt busy-looping.
> > */
> >
> > Warnings:
> >
> > Logic clarity: The variable data_read is set to true inside the while loop but
> never checked when data WAS read. Consider if data_read is the clearest way to
> express this condition.
> >
>
> Ack above. Thanks.Will be fixed in v3.
More information about the stable
mailing list