[dpdk-dev] [PATCH] igb_uio: revert open and release operations

Ferruh Yigit ferruh.yigit at intel.com
Fri Oct 20 03:15:15 CEST 2017


On 10/19/2017 3:43 PM, Patil, Harish wrote:
> -----Original Message-----
> From: Harish Patil <Harish.Patil at cavium.com>
> Date: Tuesday, October 17, 2017 at 9:50 PM
> To: Thomas Monjalon <thomas at monjalon.net>, Ferruh Yigit
> <ferruh.yigit at intel.com>
> Cc: "dev at dpdk.org" <dev at dpdk.org>, Jianfeng Tan <jianfeng.tan at intel.com>,
> Jingjing Wu <jingjing.wu at intel.com>, "Thotton, Shijith"
> <Shijith.Thotton at cavium.com>, Gregory Etelson <gregory at weka.io>, George
> Prekas <george.prekas at epfl.ch>, "stable at dpdk.org" <stable at dpdk.org>
> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>
>>
>> -----Original Message-----
>> From: Thomas Monjalon <thomas at monjalon.net>
>> Date: Tuesday, October 17, 2017 at 1:33 PM
>> To: Ferruh Yigit <ferruh.yigit at intel.com>, Harish Patil
>> <Harish.Patil at cavium.com>
>> Cc: "dev at dpdk.org" <dev at dpdk.org>, Jianfeng Tan <jianfeng.tan at intel.com>,
>> Jingjing Wu <jingjing.wu at intel.com>, "Thotton, Shijith"
>> <Shijith.Thotton at cavium.com>, Gregory Etelson <gregory at weka.io>, George
>> Prekas <george.prekas at epfl.ch>, "stable at dpdk.org" <stable at dpdk.org>
>> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>
>>> 17/10/2017 22:14, Ferruh Yigit:
>>>> There were bug reports about terminated application may leave device in
>>>> undesired state:
>>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html
>>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html
>>>>
>>>> And a proposal to fix:
>>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html
>>>>
>>>> Later another proposal triggered the discussion:
>>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html
>>>>
>>>> Finally a fix patch pushed into v17.08:
>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>>>> device file")
>>>>
>>>> Later a regression report sent related to the pushed patch:
>>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html
>>>>
>>>> And a fix for regression integrated into v17.11-rc1:
>>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html
>>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>>>> VM")
>>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
>>>>
>>>> Even after the fix qede PMD reported to be broken:
>>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html
>>>>
>>>> So this patch reverts original fix and related commits. The related
>>>> igb_uio code part turns back to v17.05 base.
>>> [...]
>>>> ---
>>>> It would be nice to solve this issue in LTS release, but being close to
>>>> the release and the error report without details makes it hard to work
>>>> more on this issue.
>>>
>>> With this revert, we are back to the original issue.
>>> We must really try the proposed solution.
>>> Harish, please describe your issue and think how it could be fixed.
>>> Jingjing made it work for i40e.
>>> I know it is less effort to request a simple revert.
>>> Please let's try to fix it once for all.
>>
>> Hi Ferruh/Thomas,
>> I’m discussing it internally, so please hold on committing this patch till
>> I revert back to you.
>>
>> Thanks.
> 
>> [Harish]
> 
>> 1) With the introduction of:
>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>> device file”)
>    We saw failures with qede PF & SR-IOV VF initialization in PCI passthru
> mode.
> 
>>
>> PF PCI passthru mode initialization failure was resolved by:
>> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>> VM")

Thank you for the update.

> 
>> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and
>> related device cleanup is not completed by the time VF driver starts
>> loading. It results in the mbox command failure sent over the HW channel
>> between VF and PF.

This seems same reason why i40e added a check and wait loop.

>>
>> Even though pci_reset_function() waits for the stipulated amount of time
>> per standards, VF FLR takes longer than that and pci_reset_function() &
>> igb_uio_open() call returns before FLR completes and VF PMD driver tries
>> to load before FLR completes leading to VF PMD initialization failure.
>>
>>
>> We can work around this problem by adding driver delay/retry logic since
>> there is no deterministic way of detecting FLR completion. But this is
>> going to increase the driver load time.
>>
> 
>> 2) With the above patch ("igb_uio: issue FLR during open and release of
>> device file), FLR is going to be issued unconditionally on all devices
>> during igb_uio_open. We think it’s an over kill. FLR is required only for
>> previous abnormal app termination. We already handle the abnormal app
>> termination by doing necessary cleanup in the driver during load. This
>> cleanup is more efficient as it is done only when required. So we feel
>> that the drivers/devices needing such cleanup (the two cases listed
>> below)  should do it conditionally when required rather than
>> igb_uio_open() unconditionally performing FLR all the time.
>>
>> e.g.,
>    - cdb166963cae ("net/liquidio: add API for VF FLR”)

Both 1) and 2) related to the pci_reset during open().
But the main functionality we are looking for is the pci_reset in release().
So we can remove reset during open() [1].

Will disabling pci_reset in open() [2] solve your problems?

Jingjing, Jianfeng,
What do you think?


[1]
Perhaps will need to revert or partially revert liquidio patch after this.

[2]
disable line "pci_reset_function(dev);" in the igbuio_pci_open()
http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339

>>
>>
>> - http://dpdk.org/ml/archives/dev/2017-May/066317.html
> 
> Thanks,
> Harish
>>
> 



More information about the dev mailing list