[dpdk-stable] [PATCH] net/failsafe: fix Rx clean race
Gaëtan Rivet
gaetan.rivet at 6wind.com
Thu Oct 26 18:20:29 CEST 2017
Hello Matan,
I think the commit log could be shorter.
Proposing this, feel free to expand it if you prefer.
---8<---
When removing a device, the fail-safe checks that it is not within its
datapath before cleaning it.
When checking whether an Rx burst should be performed on a device, the
remove flag is not checked. Thus the port could still enter its datapath
and miss a removal round. Furthermore, there is a race between the
thread removing the device and the polling thread.
Check the remove flag before entering a sub-device Rx burst when in safe
mode. This check mitigates the aforementioned race condition.
--->8---
Otherwise,
On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
> In case of plug out, the RMV interrupt callback sets the remove flag of
> the removed sub-device. The next hotplug alarm cycle should read this
> flag and if the data path are clean it should remove the sub-device.
>
> In case of fail-safe RX burst calling from application, fail-afe tries
> to call to all STARTED sub-device rx_burst functions. The remove flag
> is not checked here and fail-safe may call to the removed sub-device
> rx_burst function.
>
> The above 2 cases run in different threads and there is a race between
> the removed sub-device RX clean check to the removed sub-device
> rx_burst call makes the sub device RX unclean.
>
> If the application calls to rx_burst in loop, the probability to get RX
> clean is not enough, especially when there are few sub-devices or if the
> rx_burst function of the removed sub-device takes a lot of time.
>
> Each time the sub-device data path is unclean, the second oportunity to
> check it again should be only in the hotplug alarm next cycle; the
> default time between cycles is 2 seconds.
>
> In this loop when fail-safe tries to remove the sub-device, the
> sub-device may appear back and fail-safe cannot plug it in back until
> the removal process is completted. In this time fail-safe may lose the
> primary sub-device services and may hurt application performance.
>
> This patch adds a remove flag check in safe rx_burst function.
> By this way, at most one more hotplug alarm cycle is necessary
> to get the sub-device clean for actual removal.
>
> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
> Cc: stable at dpdk.org
>
> Signed-off-by: Matan Azrad <matan at mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet at 6wind.com>
> ---
> drivers/net/failsafe/failsafe_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
> index 7311421..70157c8 100644
> --- a/drivers/net/failsafe/failsafe_rxtx.c
> +++ b/drivers/net/failsafe/failsafe_rxtx.c
> @@ -43,7 +43,8 @@
> {
> return (ETH(sdev) == NULL) ||
> (ETH(sdev)->rx_pkt_burst == NULL) ||
> - (sdev->state != DEV_STARTED);
> + (sdev->state != DEV_STARTED) ||
> + (sdev->remove != 0);
> }
>
> static inline int
> --
> 1.8.3.1
>
--
Gaëtan Rivet
6WIND
More information about the stable
mailing list