[EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
Ferruh Yigit
ferruh.yigit at amd.com
Mon Apr 15 19:54:21 CEST 2024
On 4/12/2024 5:50 PM, Alan Elder wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit at amd.com>
>> Sent: Friday, April 12, 2024 6:23 AM
>> To: Alan Elder <alan.elder at microsoft.com>; Long Li <longli at microsoft.com>;
>> Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>
>> Cc: dev at dpdk.org; stephen <stephen at networkplumber.org>
>> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
>> queues
>>
>> On 4/11/2024 9:45 PM, Alan Elder wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit at amd.com>
>>>> Sent: Thursday, April 11, 2024 7:38 AM
>>>> To: Alan Elder <alan.elder at microsoft.com>; Long Li
>>>> <longli at microsoft.com>; Andrew Rybchenko
>>>> <andrew.rybchenko at oktetlabs.ru>
>>>> Cc: dev at dpdk.org; stephen <stephen at networkplumber.org>
>>>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
>>>> Rx queues
>>>>
>>>> On 3/19/2024 2:16 PM, Alan Elder wrote:
>>>>> The previous code allowed the number of Tx queues to be set higher
>>>>> than the number of Rx queues. If a packet was sent on a Tx queue
>>>>> with index
>>>>>> = number Rx queues there was a segfault.
>>>>> This commit fixes the issue by creating an Rx queue for every Tx
>>>>> queue meaning that an event buffer is allocated to handle receiving
>>>>> Tx completion messages.
>>>>>
>>>>> mbuf pool and Rx ring are not allocated for these additional Rx
>>>>> queues and RSS configuration ensures that no packets are received on
>> them.
>>>>>
>>>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
>>>>> Cc: sthemmin at microsoft.com
>>>>> Cc: stable at dpdk.org
>>>>>
>>>>> Signed-off-by: Alan Elder <alan.elder at microsoft.com>
>>>>>
>>>>
>>>> Hi Alan,
>>>>
>>>> What is the root cause of the crash, is it in driver scope or application?
>>>
>>> Hi Ferruh,
>>>
>>> The root cause of the crash was in the driver - a packet received on a Tx
>> queue that had no corresponding Rx queue would cause the dev->data-
>>> rx_queues[] array to be accessed past the length of the array.
>>>
>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>>>
>> ub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Fdrivers%2Fnet%2Fnetvsc%2Fhn
>> _rxtx.
>>>
>> c%23L1071&data=05%7C02%7Calan.elder%40microsoft.com%7C3985f99c07c1
>> 4a64
>>>
>> 99fd08dc5ada98d0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6
>> 3848514
>>>
>> 2149539930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjo
>> iV2luMzI
>>>
>> iLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Y%2F6lr6v2j4Q
>> cSm6g0
>>> dTcV%2FEimyfPs0nMBJ0X5s9omAE%3D&reserved=0
>>>
>>>
>>
>> Why there is an access to Rx queue when processing Tx queue?
>>
>> A backtrace of the crash can help to understand the issue, can you please
>> include this in commit log, plus some explanation why crash happens?
>>
>> Thanks,
>> Ferruh
>
> Hi Ferruh,
>
> Netvsc slow path needs to handle Tx completion messages (to know when it can reclaim Tx buffers). Tx completion messages are received on Rx queue, which is why the Rx queue is accessed as part of transmit processing.
>
> An example call stack is:
>
> #6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
> #7 hn_process_events (hv=, queue_id=2, tx_limit=) at /drivers/net/netvsc/hn_rxtx.c
> #8 hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c
>
> Which leads to the SEGV as 0x20 is not a valid address.
>
> I'll update the commit messages and resubmit the patch.
>
>
Hi Alan,
Thanks for the detail.
'hn_xmit_pkts()' calls 'hn_process_events()' with the Tx queue_id, but
'hn_process_events()' seems designed for Rx event processing since it
uses 'queue_id' to get rxq.
Does it help to pass queue type to 'hn_process_events()'?
And the patch creates Rx queues for access Tx queues. Are the Tx
completion packets needs to be delivered to Rx queue with exact same Tx
queue_id by design?
Or the new Rx queues created just to prevent the crash, by providing
'rxq->ring_lock' etc?
Please also check comments on v4, thanks.
More information about the dev
mailing list