[dpdk-dev] [dpdk-stable] [PATCH v3] net/failsafe: fix calling device during RMV events

Gaëtan Rivet gaetan.rivet at 6wind.com
Fri Jan 19 00:35:34 CET 2018


On Thu, Jan 18, 2018 at 11:22:51PM +0100, Thomas Monjalon wrote:
> 29/11/2017 20:17, Ferruh Yigit:
> > >>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> > >>>> This commit prevents control path operations from failing after a sub
> > >>>> device removal.
> > >>>>
> > >>>> Following are the failure steps:
> > >>>> 1. The physical device is removed due to change in one of PF
> > >>>>    parameters (e.g. MTU)
> > >>>> 2. The interrupt thread flags the device
> > >>>> 3. Within 2 seconds Interrupt thread initializes the actual device
> > >>>>    removal, then every 2 seconds it tries to re-sync (plug in) the
> > >>>>    device. The trials fail as long as VF parameter mismatches the PF
> > >>>>    parameter.
> > >>>> 4. A control thread initiates a control operation on failsafe which
> > >>>> initiates this operation on the device.
> > >>>> 5. A race condition occurs between the control thread and interrupt
> > >>>> thread when accessing the device data structures.
> > >>>>
> > >>>> This commit prevents the race condition in step 5. Before this commit
> > >>>> if a device was removed and then a control thread operation was
> > >>>> initiated on failsafe - in some cases failsafe called the sub device
> > >>>> operation instead of avoiding it. Such cases could lead to operations
> > >>>> failures.
> [...]
> > 
> > Reminder of this patch remaining from previous release.
> 
> Gaetan, what is the decision for this possible race condition?

This patchset had several issues that I outlined.

> Can we try to fix it in 18.02?

These patches could go in with a rework. If new versions are submitted,
I can review those fixes in the coming weeks.
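
To make the race in the quoted commit message concrete, here is a minimal
sketch of the general idea (skip the sub-device operation once the device
is flagged as removed). It is not taken from the patch or from the
fail-safe PMD: `struct sub_device`, its fields, and the three functions
are hypothetical names, and a plain pthread mutex stands in for whatever
synchronization a reworked version would actually use.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fail-safe sub-device; not the DPDK struct. */
struct sub_device {
	pthread_mutex_t lock; /* serializes interrupt and control threads */
	bool removed;         /* set by the interrupt thread on an RMV event */
	int mtu;              /* example state touched by a control operation */
};

void sub_device_init(struct sub_device *sdev, int mtu)
{
	assert(sdev != NULL);
	pthread_mutex_init(&sdev->lock, NULL);
	sdev->removed = false;
	sdev->mtu = mtu;
}

/* Interrupt-thread side: flag the device as removed (step 2 above). */
void sub_device_flag_removed(struct sub_device *sdev)
{
	assert(sdev != NULL);
	pthread_mutex_lock(&sdev->lock);
	sdev->removed = true;
	pthread_mutex_unlock(&sdev->lock);
}

/*
 * Control-thread side (step 4 above): forward the operation only if the
 * device is still present. Returns 0 if applied, -ENODEV if skipped.
 * The check and the update happen under the same lock, so the interrupt
 * thread cannot flag the device in between.
 */
int sub_device_set_mtu(struct sub_device *sdev, int mtu)
{
	int ret = 0;

	assert(sdev != NULL);
	pthread_mutex_lock(&sdev->lock);
	if (sdev->removed)
		ret = -ENODEV; /* avoid calling into a removed device */
	else
		sdev->mtu = mtu;
	pthread_mutex_unlock(&sdev->lock);
	return ret;
}
```

The point of the sketch is only that the "is it removed?" test and the
operation itself have to be atomic with respect to the interrupt thread;
checking the flag without holding the lock would leave the same window open.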

-- 
Gaëtan Rivet
6WIND

