[dpdk-dev] [RFC V2] ethdev: fix issue that dev close in PMD calls twice

Ferruh Yigit ferruh.yigit at intel.com
Tue Aug 24 16:42:06 CEST 2021


On 8/19/2021 4:45 AM, Huisong Li wrote:
> 在 2021/8/18 19:24, Ferruh Yigit 写道:
>> On 8/13/2021 9:16 AM, Huisong Li wrote:
>>> 在 2021/8/13 14:12, Thomas Monjalon 写道:
>>>> 13/08/2021 04:11, Huisong Li:
>>>>> Hi, all
>>>>>
>>>>> This patch can enhance the security of device uninstallation to
>>>>> eliminate dependency on user usage methods.
>>>>>
>>>>> Can you check this patch?
>>>>>
>>>>>
>>>>> 在 2021/8/3 10:30, Huisong Li 写道:
>>>>>> Ethernet devices in DPDK can be released by rte_eth_dev_close() and
>>>>>> rte_dev_remove(). These APIs both call xxx_dev_close() in PMD layer
>>>>>> to uninstall hardware. However, the two APIs do not have explicit
>>>>>> invocation restrictions. In other words, at the ethdev layer, it is
>>>>>> possible to call rte_eth_dev_close() before calling rte_dev_remove()
>>>>>> or rte_eal_hotplug_remove(). In such a bad scenario,
>>>> It is not a bad scenario.
>>>> If there is no more port for the device after calling close,
>>>> the device should be removed automatically.
>>>> Keep in mind "close" is for one port, "remove" is for the entire device
>>>> which can have more than one port.
>>> I know.
>>>
>>> dev_close() is for removing an eth device. And rte_dev_remove() can be used
>>>
>>> for removing the rte device and all its eth devices belonging to the rte device.
>>>
>>> In rte_dev_remove(), "remove" is executed in primary or one of secondary,
>>>
>>> all eth devices having same pci address will be closed and removed.
>>>
>>>>>> the primary
>>>>>> process may be fine, but it may cause that xxx_dev_close() in the PMD
>>>>>> layer will be called twice in the secondary process. So this patch
>>>>>> fixes it.
>>>> If a port is closed in primary, it should be the same in secondary.
>>>>
>>>>
>>>>>> +    /*
>>>>>> +     * The eth_dev->data->name doesn't be cleared by the secondary process,
>>>>>> +     * so above "eth_dev" isn't NULL after rte_eth_dev_close() called.
>>>> This assumption is not clear. All should be closed together.
>>> However, dev_close() does not have the feature similar to rte_dev_remove().
>>>
>>> Namely, it is not guaranteed that all eth devices are closed together in ethdev
>>> layer. It depends on app or user.
>>>
>>> If the app does not close together, the operation of repeatedly uninstalling an
>>> eth device in the secondary process
>>>
>>> will be triggered when dev_close() is first called by one secondary process, and
>>> then rte_dev_remove() is called.
>>>
>>> So I think it should be avoided.
>> First of all, I am not sure about calling 'rte_eth_dev_close()' or
>> 'rte_dev_remove()' from the secondary process.
>> There are explicit checks in various locations to prevent clearing resources
>> completely from secondary process.
> 
> There's no denying that.
> 
> Generally, hardware resources of eth device and shared data of the primary and
> secondary process
> 
> are cleared by primary, which are controled by ethdev layer or PMD layer.
> 
> But there may be some private data or resources of each process (primary or
> secondary ), such as mp action
> 
> registered by rte_mp_action_register() or others.  For these resources, the
> secondary process still needs to clear.
> 
> Namely, both primary and secondary processes need to prevent repeated offloading
> of resources.
> 
>>
>> Calling 'rte_eth_dev_close()' or 'rte_dev_remove()' by secondary is technically
>> can be done but application needs to be extra cautious and should take extra
>> measures and synchronization to make it work.
>> Regular use-case is secondary processes do the packet processing and all control
>> commands run by primary.
> 
> You are right. We have a consensus that 'rte_eth_dev_close()' or 'rte_dev_remove()'
> 
> can be called by primary and secondary processes.
> 
> But DPDK framework cannot assume user behavior.😁
> 
> We need to make it more secure and reliable for both primary and secondary
> processes.
> 
>>
>> In primary, if you call 'rte_eth_dev_close()' it will clear all ethdev resources
>> and further 'rte_dev_remove()' call will detect missing ethdev resources and
>> won't try to clear them again.
>>
>> In secondary, if you call 'rte_eth_dev_close()', it WON'T clear all resources
>> and further 'rte_dev_remove()' call (either from primary or secondary) will try
>> to clean ethdev resources again. You are trying to prevent this retry in remove
>> happening for secondary process.
> 
> Right. However, if secondary process in PMD layer has its own private resources
> to be
> 
> cleared, it still need to do it by calling 'rte_eth_dev_close()' or
> 'rte_dev_remove()'.
> 
>> In secondary it won't free ethdev resources anyway if you let it continue, but I
>> guess here you are trying to prevent the PMD dev_close() called again. Why? Is
>> it just for optimization or does it cause unexpected behavior in the PMD?
>>
>>
>> Overall, to free resources you need to do the 'rte_eth_dev_close()' or
>> 'rte_dev_remove()' in the primary anyway. So instead of this workaround, I would
>> suggest making PMD dev_close() safe to be called multiple times (if this is the
>> problem.)
> 
> In conclusion,  primary and secondary processes in PMD layer may have their own
> 
> private data and resources, which need to be processed and released.
> 
> Currently,  these for PMD are either handled and cleaned up in dev_close() or
> remove().
> 
> However, code streams in rte_dev_remove() cannot ensure that the uninstallation
> 
> from secondary process will not be repeated if rte_eth_dev_close() is first
> called by
> 
> secondary(primary is ok, plese review this patch).
> 
> I think, this is the same for each PMD and is better suited to doing it in
> ethdev layer.
> 

This patch prevents to call dev_close() twice in the secondary process, is this
fixing a theoretical problem or an actual problem?

If it is an actual problem can you please provide details, callstack of the
problematic case?

>>
>> And again, please re-consider calling 'rte_eth_dev_close()' or
>> 'rte_dev_remove()' from the secondary process.
>>
>>>>>> +     * Namely, whether "eth_dev" is NULL cannot be used to determine whether
>>>>>> +     * an ethdev port has been released.
>>>>>> +     * For both primary process and secondary process, eth_dev->state is
>>>>>> +     * RTE_ETH_DEV_UNUSED, which means the ethdev port has been released.
>>>>>> +     */
>>>>>> +    if (eth_dev->state == RTE_ETH_DEV_UNUSED) {
>>>>>> +        RTE_ETHDEV_LOG(INFO, "The ethdev port has been released.");
>>>>>> +        return 0;
>>>>>> +    }
>>>>
>>>> .
>> .



More information about the dev mailing list