[PATCH v2] net/pcap: fix timeout of stopping device
Ferruh Yigit
ferruh.yigit at amd.com
Tue Nov 29 15:11:13 CET 2022
On 11/22/2022 9:25 AM, Zhou, YidingX wrote:
>
>
>> -----Original Message-----
>> From: Zhou, YidingX <yidingx.zhou at intel.com>
>> Sent: Wednesday, September 21, 2022 3:15 PM
>> To: Stephen Hemminger <stephen at networkplumber.org>; Zhang, Qi Z
>> <qi.z.zhang at intel.com>
>> Cc: dev at dpdk.org; Burakov, Anatoly <anatoly.burakov at intel.com>; He,
>> Xingguang <xingguang.he at intel.com>; stable at dpdk.org
>> Subject: RE: [PATCH v2] net/pcap: fix timeout of stopping device
>>
>>
>>
>>> -----Original Message-----
>>> From: Stephen Hemminger <stephen at networkplumber.org>
>>> Sent: Tuesday, September 6, 2022 10:58 PM
>>> To: Zhou, YidingX <yidingx.zhou at intel.com>
>>> Cc: dev at dpdk.org; Zhang, Qi Z <qi.z.zhang at intel.com>; Burakov, Anatoly
>>> <anatoly.burakov at intel.com>; He, Xingguang <xingguang.he at intel.com>;
>>> stable at dpdk.org
>>> Subject: Re: [PATCH v2] net/pcap: fix timeout of stopping device
>>>
>>> On Tue, 6 Sep 2022 16:05:11 +0800
>>> Yiding Zhou <yidingx.zhou at intel.com> wrote:
>>>
>>>> The pcap file is synchronized to disk when the device is stopped.
>>>> If the file is large, this takes a long time and causes the
>>>> 'detach sync request' to time out when the device is closed in a
>>>> multi-process scenario.
>>>>
>>>> This commit fixes the issue by using an alarm handler to release the dumper.
>>>>
>>>> Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
>>>> Cc: stable at dpdk.org
>>>>
>>>> Signed-off-by: Yiding Zhou <yidingx.zhou at intel.com>
>>>
>>>
>>> I think you need to redesign the handshake if this is the case.
>>> Forcing a 30-second delay at the end of all uses of pcap is not acceptable.
>>
>> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
>
> Hi, Ferruh
> Sorry for the late reply.
> I did not receive your email of Oct 6; I got your comments from patchwork.
>
> "Can you please provide more details on multi-process communication and
> call trace, to help us think about a solution to address this issue in a
> more generic way (not just for pcap but for any case device close takes
> more than multi-process timeout)?"
>
> I will try to explain this issue with a sequence diagram; I hope it displays correctly in the mail.
>
>   thread              intr thread           intr thread              thread
> of secondary          of secondary          of primary             of primary
>      |                     |                     |                     |
>      |                     |                     |                     |
> rte_eal_hotplug_remove
> rte_dev_remove
> eal_dev_hotplug_request_to_primary
> rte_mp_request_sync -------------------------------------------------->|
>                                                                        |
>                                                            handle_secondary_request
>                                                  |<--------------------|
>                                                  |
>                                     __handle_secondary_request
>                                     eal_dev_hotplug_request_to_secondary
>      |<---------------------------- rte_mp_request_sync
>      |
> handle_primary_request --->|
>                            |
>                 __handle_primary_request
>                 local_dev_remove(this will take long time)
>                 rte_mp_reply ------------------->|
>                                                  |
>                                           local_dev_remove
>      |<---------------------------------- rte_mp_reply
>
> The marked 'local_dev_remove()' in the secondary process performs a pcap file synchronization operation.
> When the pcap file is very large, this takes a long time (in my test, a 100G file takes 20+ seconds).
> This causes the processing of the hotplug message to time out.
Hi Yiding,
Thanks for the information.
Right now the timeout for all MP operations is hardcoded, and it is 5
seconds.
Do you think it would work to have an API to set a custom timeout,
something like `rte_mp_timeout_set()`, and call it from pdump?
This would give a generic solution for similar cases, not just for pcap.
But my concern is whether this is too much multi-process related internal
detail to update; @Anatoly may comment on this.
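
To make the idea a bit more concrete, here is a rough standalone sketch of
what a settable timeout could look like. Everything in it is an assumption
for discussion only: `mp_timeout_set()`, `mp_timeout_get()` and
`mp_reply_arrives_in_time()` are hypothetical names, none of them exist in
DPDK today, and the code does not call into EAL. It only models replacing
the hardcoded 5 seconds with a value the caller can raise, and checks
whether a handler that takes 20 seconds (as in the pcap test above) would
still reply in time.

```c
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Default mirrors the 5-second value currently hardcoded for MP operations. */
#define MP_TIMEOUT_DEFAULT_S 5

/* Hypothetical process-wide setting; not an existing DPDK variable. */
static time_t mp_timeout_s = MP_TIMEOUT_DEFAULT_S;

/*
 * Hypothetical setter, standing in for the proposed rte_mp_timeout_set().
 * Rejects non-positive values so a request can never wait "zero" seconds.
 */
int mp_timeout_set(time_t seconds)
{
	if (seconds <= 0)
		return -EINVAL;
	mp_timeout_s = seconds;
	return 0;
}

/*
 * Hypothetical getter returning the timeout as a struct timespec, which is
 * the type rte_mp_request_sync() takes for its timeout argument.
 */
struct timespec mp_timeout_get(void)
{
	struct timespec ts = { .tv_sec = mp_timeout_s, .tv_nsec = 0 };

	return ts;
}

/*
 * Simplified model of the failure: the reply only arrives in time if the
 * remote handler finishes before the requester's timeout expires.
 */
bool mp_reply_arrives_in_time(time_t handler_seconds)
{
	return handler_seconds < mp_timeout_s;
}
```

With the default of 5 seconds, `mp_reply_arrives_in_time(20)` is false, which
matches the timeout seen with the large pcap file. After `mp_timeout_set(30)`
the same call is true. Whether such a knob should be global or per-request is
an open question in this sketch.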