[dpdk-stable] [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race

Ferruh Yigit ferruh.yigit at intel.com
Thu Oct 26 21:10:21 CEST 2017

On 10/26/2017 9:20 AM, Gaëtan Rivet wrote:
> Hello Matan,
> I think the commit log could be shorter.
> I propose the following; feel free to expand it if you prefer.
> ---8<---
> When removing a device, the fail-safe checks that it is not within its
> datapath before cleaning it.
> When checking whether an Rx burst should be performed on a device, the
> remove flag is not checked. Thus the port could still enter its datapath
> and miss a removal round. Furthermore, there is a race between the
> thread removing the device and the polling thread.
> Check the remove flag before entering a sub-device Rx burst when in safe
> mode. This check mitigates the aforementioned race condition.
> --->8---
> Otherwise,
> On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
>> When a sub-device is plugged out, the RMV interrupt callback sets the
>> remove flag of the removed sub-device. The next hotplug alarm cycle
>> should read this flag and, if the datapath is clean, remove the
>> sub-device.
>> When the application calls the fail-safe Rx burst, fail-safe tries to
>> call the rx_burst functions of all STARTED sub-devices. The remove flag
>> is not checked here, so fail-safe may call the rx_burst function of the
>> removed sub-device.
>> These two cases run in different threads, and there is a race between
>> the Rx clean check of the removed sub-device and an rx_burst call on
>> that same sub-device; the rx_burst call makes its Rx path unclean.
>> If the application calls rx_burst in a loop, the probability of finding
>> the Rx path clean is low, especially when there are few sub-devices or
>> when the rx_burst function of the removed sub-device takes a long time.
>> Each time the sub-device datapath is found unclean, the next opportunity
>> to check it is only in the next hotplug alarm cycle; the default time
>> between cycles is 2 seconds.
>> While fail-safe is stuck in this loop trying to remove the sub-device,
>> the sub-device may reappear, and fail-safe cannot plug it back in until
>> the removal process is completed. During this time fail-safe may lose
>> the services of the primary sub-device, which may hurt application
>> performance.
>> This patch adds a remove flag check in the safe rx_burst function.
>> This way, at most one more hotplug alarm cycle is necessary to get the
>> sub-device clean for actual removal.
>> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
>> Cc: stable at dpdk.org
>> Signed-off-by: Matan Azrad <matan at mellanox.com>
> Acked-by: Gaetan Rivet <gaetan.rivet at 6wind.com>

Applied to dpdk-next-net/master, thanks.

(used suggested commit log, thanks.)
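The race described in both commit logs can be illustrated with a minimal C sketch. Every name below (struct sdev, safe_rx_burst, alarm_try_remove, sdev_init, demo_rx_burst and the in_rx counter) is hypothetical and does not reproduce the fail-safe PMD's actual code. Modelling "datapath clean" as an entry counter, and incrementing that counter before the remove flag is read, are assumptions of this sketch; the applied commit log only says the flag is checked before entering a sub-device Rx burst and calls this a mitigation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sub-device model; not the real fail-safe structures. */
enum sdev_state { SDEV_UNDEFINED, SDEV_STARTED };

struct sdev {
	atomic_int state;   /* enum sdev_state */
	atomic_bool remove; /* set by the (hypothetical) RMV callback */
	atomic_uint in_rx;  /* pollers currently inside this sub-device's Rx path */
	uint16_t (*rx_burst)(struct sdev *sdev, uint16_t nb_pkts);
};

/* Stand-in for a PMD rx_burst: pretends at most 4 packets are available. */
static uint16_t
demo_rx_burst(struct sdev *sdev, uint16_t nb_pkts)
{
	(void)sdev;
	return nb_pkts < 4 ? nb_pkts : 4;
}

static void
sdev_init(struct sdev *s, uint16_t (*rx)(struct sdev *, uint16_t))
{
	atomic_init(&s->state, SDEV_STARTED);
	atomic_init(&s->remove, false);
	atomic_init(&s->in_rx, 0u);
	s->rx_burst = rx;
}

/* Polling-thread side: skip sub-devices whose remove flag is set. */
static uint16_t
safe_rx_burst(struct sdev *sdevs, size_t n, uint16_t nb_pkts)
{
	uint16_t total = 0;

	assert(sdevs != NULL);
	for (size_t i = 0; i < n && total < nb_pkts; i++) {
		struct sdev *s = &sdevs[i];

		if (atomic_load(&s->state) != SDEV_STARTED)
			continue;
		/* Announce entry first, then check the flag (sketch's ordering). */
		atomic_fetch_add(&s->in_rx, 1);
		if (atomic_load(&s->remove)) {
			atomic_fetch_sub(&s->in_rx, 1);
			continue;
		}
		total += s->rx_burst(s, (uint16_t)(nb_pkts - total));
		atomic_fetch_sub(&s->in_rx, 1);
	}
	return total;
}

/* Hotplug-alarm side: returns true if the sub-device was removed now. */
static bool
alarm_try_remove(struct sdev *s)
{
	if (!atomic_load(&s->remove))
		return false;
	if (atomic_load(&s->in_rx) != 0)
		return false; /* datapath not clean: retry on the next cycle */
	/* Stand-in for the real teardown. The remove flag stays set; clearing
	 * it on re-plug is not modelled here. */
	atomic_store(&s->state, SDEV_UNDEFINED);
	return true;
}
```

With C11's default sequentially consistent atomics, and assuming nothing clears the flag while removal is pending, either the alarm sees a poller's counter and postpones the teardown, or the poller sees the flag and skips the sub-device. A postponed removal usually succeeds on the following alarm cycle, in line with the "at most one more hotplug alarm cycle" remark in Matan's log.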
