[dpdk-dev] [EXT] [PATCH v2 1/1] librte_eal: rte_intr_callback_unregister_sync() - wrapper around rte_intr_callback_unregister().

Renata Saiakhova renata.saiakhova at ekinops.com
Mon Nov 30 18:20:48 CET 2020


Hi Harman,

sorry for late reply...

Yes, indeed, this is a race between an application which calls rte_dev_remove() and a kernel event which is sent as a result of unbinding the device from vfio_pci driver.
(dpdk-devbind.py -u 0000:05:00.0)

rte_intr_callback_unregister() may fail and return -EAGAIN, if an interrupt source (kernel) has some active callbacks at the moment. As a result, the callback (req notifier) can never be unregistered,
and vfio_req_intr_handle.fd can never be closed.
The kernel will continuously try to notify the user space using req notifier, but as the device is already removed, in this case it even cannot find a bus for that device, below is the log which illustrates it:
EAL: fail to unregister req notifier handler.
EAL: fail to disable req notifier.
dpdk_disconnect 1545: Device '0000:05:00.0' has been removed and detached
dpdk_disconnect 1557: All devices shared with device '0000:05:00.0' have been detached
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
EAL: Cannot find bus for device (05:00.0)
etc.

This continues eternally, and application stops to work properly.
So, at least the retry logic should be put somewhere to avoid this kind of race. Or bus->hot_unplug_handler(dev) called from pci_vfio_req_handler() should do some work to release the above resources.

 Regarding the tight polling loop in the patch and fixed retry logic to avoid infinite looping, what could be an option? As it continues to loop only in -EAGAIN case, which means kernel event is processed, doesn't it guarantee that it won't last forever?

Kind regards,
Renata

________________________________
From: Harman Kalra <hkalra at marvell.com>
Sent: Wednesday, October 28, 2020 9:36 PM
To: Renata Saiakhova <renata.saiakhova at ekinops.com>
Cc: Anatoly Burakov <anatoly.burakov at intel.com>; Bruce Richardson <bruce.richardson at intel.com>; Ray Kinsella <mdr at ashroe.eu>; Neil Horman <nhorman at tuxdriver.com>; dev at dpdk.org <dev at dpdk.org>
Subject: Re: [EXT] [PATCH v2 1/1] librte_eal: rte_intr_callback_unregister_sync() - wrapper around rte_intr_callback_unregister().

On Mon, Aug 17, 2020 at 04:08:27PM +0200, Renata Saiakhova wrote:
> External Email
>
> ----------------------------------------------------------------------
> Avoid race with unregister interrupt hanlder if interrupt
> source has some active callbacks at the moment, use wrapper
> around rte_intr_callback_unregister() to check for -EAGAIN
> return value and to loop until rte_intr_callback_unregister()
> succeeds.
>

Hi Renata,

   Just trying to understand the scenario, as you mentioned "while
   removing the device by rte_dev_remove()" are you calling
   rte_eal_hotplug_remove or kernel has sent an event to remove the
   device. As far as I know vfio notifier mechanism is used by kernel
   vfio driver to notify user to release the resources and as you are
   observing EAGAIN means same callback is executing.
   Regarding the tight polling loop in the patch, I think its good to
   have a fixed retry logic to avoid any unidentified corner case which
   might lead to infinite looping.

Thanks
Harman


> Signed-off-by: Renata Saiakhova <Renata.Saiakhova at ekinops.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c        |  2 +-
>  lib/librte_eal/freebsd/eal_interrupts.c | 12 ++++++++++++
>  lib/librte_eal/include/rte_interrupts.h | 25 +++++++++++++++++++++++++
>  lib/librte_eal/linux/eal_interrupts.c   | 12 ++++++++++++
>  lib/librte_eal/rte_eal_version.map      |  1 +
>  5 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index 07e072e13..a4bfdf553 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -415,7 +415,7 @@ pci_vfio_disable_notifier(struct rte_pci_device *dev)
>                return -1;
>        }
>
> -     ret = rte_intr_callback_unregister(&dev->vfio_req_intr_handle,
> +     ret = rte_intr_callback_unregister_sync(&dev->vfio_req_intr_handle,
>                                           pci_vfio_req_handler,
>                                           (void *)&dev->device);
>        if (ret < 0) {
> diff --git a/lib/librte_eal/freebsd/eal_interrupts.c b/lib/librte_eal/freebsd/eal_interrupts.c
> index 6d53d33c8..7d99bdaff 100644
> --- a/lib/librte_eal/freebsd/eal_interrupts.c
> +++ b/lib/librte_eal/freebsd/eal_interrupts.c
> @@ -345,6 +345,18 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle,
>        return ret;
>  }
>
> +int
> +rte_intr_callback_unregister_sync(const struct rte_intr_handle *intr_handle,
> +             rte_intr_callback_fn cb_fn, void *cb_arg)
> +{
> +     int ret = 0;
> +
> +     while ((ret = rte_intr_callback_unregister(intr_handle, cb_fn, cb_arg)) == -EAGAIN)
> +             rte_pause();
> +
> +     return ret;
> +}
> +
>  int
>  rte_intr_enable(const struct rte_intr_handle *intr_handle)
>  {
> diff --git a/lib/librte_eal/include/rte_interrupts.h b/lib/librte_eal/include/rte_interrupts.h
> index e3b406abc..cc3bf45d8 100644
> --- a/lib/librte_eal/include/rte_interrupts.h
> +++ b/lib/librte_eal/include/rte_interrupts.h
> @@ -94,6 +94,31 @@ rte_intr_callback_unregister_pending(const struct rte_intr_handle *intr_handle,
>                                rte_intr_callback_fn cb_fn, void *cb_arg,
>                                rte_intr_unregister_callback_fn ucb_fn);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Loop until rte_intr_callback_unregister() succeeds.
> + * After a call to this function,
> + * the callback provided by the specified interrupt handle is unregistered.
> + *
> + * @param intr_handle
> + *  pointer to the interrupt handle.
> + * @param cb
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +__rte_experimental
> +int
> +rte_intr_callback_unregister_sync(const struct rte_intr_handle *intr_handle,
> +                             rte_intr_callback_fn cb, void *cb_arg);
> +
>  /**
>   * It enables the interrupt for the specified handle.
>   *
> diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
> index 13db5c4e8..c99d5dbd4 100644
> --- a/lib/librte_eal/linux/eal_interrupts.c
> +++ b/lib/librte_eal/linux/eal_interrupts.c
> @@ -662,6 +662,18 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle,
>        return ret;
>  }
>
> +int
> +rte_intr_callback_unregister_sync(const struct rte_intr_handle *intr_handle,
> +                     rte_intr_callback_fn cb_fn, void *cb_arg)
> +{
> +     int ret = 0;
> +
> +     while ((ret = rte_intr_callback_unregister(intr_handle, cb_fn, cb_arg)) == -EAGAIN)
> +             rte_pause();
> +
> +     return ret;
> +}
> +
>  int
>  rte_intr_enable(const struct rte_intr_handle *intr_handle)
>  {
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index bf0c17c23..b1d824f59 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -325,6 +325,7 @@ EXPERIMENTAL {
>        rte_fbarray_find_rev_biggest_free;
>        rte_fbarray_find_rev_biggest_used;
>        rte_intr_callback_unregister_pending;
> +     rte_intr_callback_unregister_sync;
>        rte_realloc_socket;
>
>        # added in 19.08
> --
> 2.17.2
>


More information about the dev mailing list