[dpdk-stable] [PATCH] net/failsafe: fix Rx clean race
gaetan.rivet at 6wind.com
Thu Oct 26 18:20:29 CEST 2017
I think the commit log could be shorter.
Proposing this, feel free to expand it if you prefer.
When removing a device, the fail-safe checks that it is not within its
datapath before cleaning it.
When checking whether an Rx burst should be performed on a device, the
remove flag is not checked. Thus the port could still enter its datapath
and miss a removal round. Furthermore, there is a race between the
thread removing the device and the polling thread.
Check the remove flag before entering a sub-device Rx burst when in safe
mode. This check mitigates the aforementioned race condition.
On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
> In case of plug out, the RMV interrupt callback sets the remove flag of
> the removed sub-device. The next hotplug alarm cycle should read this
> flag and if the data path are clean it should remove the sub-device.
> In case of fail-safe RX burst calling from application, fail-afe tries
> to call to all STARTED sub-device rx_burst functions. The remove flag
> is not checked here and fail-safe may call to the removed sub-device
> rx_burst function.
> The above 2 cases run in different threads and there is a race between
> the removed sub-device RX clean check to the removed sub-device
> rx_burst call makes the sub device RX unclean.
> If the application calls to rx_burst in loop, the probability to get RX
> clean is not enough, especially when there are few sub-devices or if the
> rx_burst function of the removed sub-device takes a lot of time.
> Each time the sub-device data path is unclean, the second oportunity to
> check it again should be only in the hotplug alarm next cycle; the
> default time between cycles is 2 seconds.
> In this loop when fail-safe tries to remove the sub-device, the
> sub-device may appear back and fail-safe cannot plug it in back until
> the removal process is completted. In this time fail-safe may lose the
> primary sub-device services and may hurt application performance.
> This patch adds a remove flag check in safe rx_burst function.
> By this way, at most one more hotplug alarm cycle is necessary
> to get the sub-device clean for actual removal.
> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
> Cc: stable at dpdk.org
> Signed-off-by: Matan Azrad <matan at mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet at 6wind.com>
> drivers/net/failsafe/failsafe_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
> index 7311421..70157c8 100644
> --- a/drivers/net/failsafe/failsafe_rxtx.c
> +++ b/drivers/net/failsafe/failsafe_rxtx.c
> @@ -43,7 +43,8 @@
> return (ETH(sdev) == NULL) ||
> (ETH(sdev)->rx_pkt_burst == NULL) ||
> - (sdev->state != DEV_STARTED);
> + (sdev->state != DEV_STARTED) ||
> + (sdev->remove != 0);
> static inline int
More information about the stable