[dpdk-dev] [dpdk-stable] [PATCH v3] net/failsafe: fix calling device during RMV events

Thomas Monjalon thomas at monjalon.net
Thu Jan 18 23:22:51 CET 2018


29/11/2017 20:17, Ferruh Yigit:
> >>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> >>>> This commit prevents control path operations from failing after a sub
> >>>> device removal.
> >>>>
> >>>> Following are the failure steps:
> >>>> 1. The physical device is removed due to a change in one of the PF
> >>>> parameters (e.g. MTU).
> >>>> 2. The interrupt thread flags the device.
> >>>> 3. Within 2 seconds the interrupt thread initiates the actual device
> >>>> removal, then every 2 seconds it tries to re-sync (plug in) the
> >>>> device. The attempts fail as long as the VF parameter mismatches the
> >>>> PF parameter.
> >>>> 4. A control thread initiates a control operation on failsafe, which
> >>>> initiates this operation on the device.
> >>>> 5. A race condition occurs between the control thread and the interrupt
> >>>> thread when accessing the device data structures.
> >>>>
> >>>> This commit prevents the race condition in step 5. Before this commit,
> >>>> if a device was removed and then a control thread operation was
> >>>> initiated on failsafe, in some cases failsafe called the sub device
> >>>> operation instead of avoiding it. Such cases could lead to operation
> >>>> failures.
[...]
> 
> Reminder of this patch remaining from previous release.

Gaetan, what is the decision for this possible race condition?
Can we try to fix it in 18.02?
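
For reference, the race described in step 5 and the kind of guard the
commit message asks for can be sketched roughly as below. This is only an
illustration under stated assumptions, not the failsafe code: struct
sub_device, its fields, and every function name here are hypothetical, and
a plain pthread mutex stands in for whatever synchronization the actual
patch uses (the thread above does not say).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a failsafe sub-device; not a DPDK type. */
struct sub_device {
	pthread_mutex_t lock;    /* serializes control and interrupt paths */
	bool removed;            /* set by the interrupt thread on an RMV event */
	unsigned int mtu;        /* example of state touched by a control op */
	unsigned int ops_called; /* how many times the device op really ran */
};

static void sub_device_init(struct sub_device *dev)
{
	assert(dev != NULL);
	pthread_mutex_init(&dev->lock, NULL);
	dev->removed = false;
	dev->mtu = 1500;
	dev->ops_called = 0;
}

/* Interrupt-thread side: flag the device as removed (step 2). */
static void sub_device_flag_removed(struct sub_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->removed = true;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Control-thread side (step 4): the flag is checked under the same lock
 * the interrupt thread takes, so the operation is skipped instead of
 * being forwarded to a device that has been flagged as removed.
 */
static int sub_device_set_mtu(struct sub_device *dev, unsigned int mtu)
{
	int ret = 0;

	pthread_mutex_lock(&dev->lock);
	if (dev->removed) {
		ret = -ENODEV; /* avoid calling the sub-device operation */
	} else {
		dev->mtu = mtu;
		dev->ops_called++;
	}
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

In this sketch a control operation issued after the RMV flag is set
returns -ENODEV and leaves the device state untouched, which is the
"avoid it" behaviour the commit message describes.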

