[PATCH v2] net/memif: fix buffer overflow in zero copy Rx
Ferruh Yigit
ferruh.yigit at amd.com
Thu Oct 10 04:00:32 CEST 2024
On 8/31/2024 2:38 PM, Mihai Brodschi wrote:
> Hi Ferruh,
>
> Apologies for the late response.
> I've run some performance tests for the two proposed solutions.
> In the tables below, the rte_memcpy results correspond to this patch.
> The 2xpktmbuf_alloc results correspond to the other proposed solution.
>
> bash commands:
> server# ./dpdk-testpmd --vdev=net_memif0,id=1,role=server,bsize=1024,rsize=8 --single-file-segments -l<SERVER_CORES> --file-prefix test1 -- --nb-cores <NB_CORES> --txq <NB_CORES> --rxq <NB_CORES> --burst <BURST> -i
> client# ./dpdk-testpmd --vdev=net_memif0,id=1,role=client,bsize=1024,rsize=8,zero-copy=yes --single-file-segments -l<CLIENT_CORES> --file-prefix test2 -- --nb-cores <NB_CORES> --txq <NB_CORES> --rxq <NB_CORES> --burst <BURST> -i
>
> testpmd commands:
> client:
> testpmd> start
> server:
> testpmd> start tx_first
>
>
> CPU: AMD EPYC 7713P
> RAM: DDR4-3200
> OS: Debian 12
> DPDK: 22.11.1
> SERVER_CORES=72,8,9,10,11
> CLIENT_CORES=76,12,13,14,15
>
> Results:
> ===================================================================
> |                          | 1 CORE     | 2 CORES    | 4 CORES    |
> ===================================================================
> | unpatched burst=32       | 9.95 Gbps  | 19.24 Gbps | 36.4 Gbps  |
> -------------------------------------------------------------------
> | 2xpktmbuf_alloc burst=32 | 9.86 Gbps  | 18.88 Gbps | 36.6 Gbps  |
> -------------------------------------------------------------------
> | 2xpktmbuf_alloc burst=31 | 9.17 Gbps  | 18.69 Gbps | 35.1 Gbps  |
> -------------------------------------------------------------------
> | rte_memcpy burst=32      | 9.54 Gbps  | 19.10 Gbps | 36.6 Gbps  |
> -------------------------------------------------------------------
> | rte_memcpy burst=31      | 9.39 Gbps  | 18.53 Gbps | 35.5 Gbps  |
> ===================================================================
>
>
> CPU: Intel Core i7-14700HX
> RAM: DDR5-5600
> OS: Ubuntu 24.04.1
> DPDK: 23.11.1
> SERVER_CORES=0,1,3,5,7
> CLIENT_CORES=8,9,11,13,15
>
> Results:
> ===================================================================
> |                          | 1 CORE     | 2 CORES    | 4 CORES    |
> ===================================================================
> | unpatched burst=32       | 15.52 Gbps | 27.35 Gbps | 46.8 Gbps  |
> -------------------------------------------------------------------
> | 2xpktmbuf_alloc burst=32 | 15.49 Gbps | 27.68 Gbps | 46.4 Gbps  |
> -------------------------------------------------------------------
> | 2xpktmbuf_alloc burst=31 | 14.98 Gbps | 26.75 Gbps | 45.2 Gbps  |
> -------------------------------------------------------------------
> | rte_memcpy burst=32      | 15.99 Gbps | 28.44 Gbps | 49.3 Gbps  |
> -------------------------------------------------------------------
> | rte_memcpy burst=31      | 14.85 Gbps | 27.32 Gbps | 46.3 Gbps  |
> ===================================================================
>
Hi Mihai,

Thank you for the extensive testing.

The problematic case is "burst=31"; between the '2xpktmbuf_alloc' and
'rte_memcpy' methods the difference is small, and neither is
consistently better than the other.

In this case I will proceed with the current patch.
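[Editor's note: for readers skimming the archive, below is a rough sketch of the two refill strategies compared in the tables above, as understood from the thread. The function names, the buffers[] layout, and the extra-headroom assumption for the copy variant are illustrative assumptions, not the actual eth_memif_rx_zc code or the patch itself.]

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_memcpy.h>
    #include <rte_mempool.h>

    /* Option 1 ("2xpktmbuf_alloc"): split the refill at the ring boundary so
     * each bulk allocation stays inside buffers[0..ring_size-1].
     * Error handling for a failed second allocation is elided.
     */
    static int
    refill_split(struct rte_mempool *mp, struct rte_mbuf **buffers,
                 uint32_t ring_size, uint32_t mask, uint32_t head,
                 uint32_t n_slots)
    {
            uint32_t first = RTE_MIN(n_slots, ring_size - (head & mask));

            if (rte_pktmbuf_alloc_bulk(mp, &buffers[head & mask], first) < 0)
                    return -1;
            if (n_slots > first &&
                rte_pktmbuf_alloc_bulk(mp, &buffers[0], n_slots - first) < 0)
                    return -1;
            return 0;
    }

    /* Option 2 ("rte_memcpy"): keep a single bulk allocation, assuming (here)
     * that buffers[] is allocated with extra headroom past ring_size, then
     * copy any spill-over pointers back to the start of the ring.
     */
    static int
    refill_copy(struct rte_mempool *mp, struct rte_mbuf **buffers,
                uint32_t ring_size, uint32_t mask, uint32_t head,
                uint32_t n_slots)
    {
            uint32_t start = head & mask;

            if (rte_pktmbuf_alloc_bulk(mp, &buffers[start], n_slots) < 0)
                    return -1;
            if (start + n_slots > ring_size)
                    rte_memcpy(buffers, &buffers[ring_size],
                               (start + n_slots - ring_size) *
                               sizeof(struct rte_mbuf *));
            return 0;
    }

[The split variant issues a second mempool operation whenever the refill wraps; the copy variant keeps one bulk allocation at the cost of copying a handful of mbuf pointers. Per the numbers above, the difference at burst=31 is within noise.]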
>
> On 19/07/2024 12:03, Ferruh Yigit wrote:
>> On 7/8/2024 12:45 PM, Ferruh Yigit wrote:
>>> On 7/8/2024 4:39 AM, Mihai Brodschi wrote:
>>>>
>>>>
>>>> On 07/07/2024 21:46, Mihai Brodschi wrote:
>>>>>
>>>>>
>>>>> On 07/07/2024 18:18, Mihai Brodschi wrote:
>>>>>>
>>>>>>
>>>>>> On 07/07/2024 17:05, Ferruh Yigit wrote:
>>>>>>>
>>>>>>> My expectation is numbers should be like following:
>>>>>>>
>>>>>>> Initially:
>>>>>>> size = 256
>>>>>>> head = 0
>>>>>>> tail = 0
>>>>>>>
>>>>>>> In first refill:
>>>>>>> n_slots = 256
>>>>>>> head = 256
>>>>>>> tail = 0
>>>>>>>
>>>>>>> Subsequent run where 32 slots have been used:
>>>>>>> head = 256
>>>>>>> tail = 32
>>>>>>> n_slots = 32
>>>>>>> rte_pktmbuf_alloc_bulk(mq, &buf[head & mask], n_slots);
>>>>>>> head & mask = 0
>>>>>>> // So it fills first 32 elements of buffer, which is inbound
>>>>>>>
>>>>>>> This will continue as above; the combination of only filling the gap
>>>>>>> and masking the head with 'mask' provides the wrapping required.
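[Editor's note: the walkthrough above corresponds roughly to the refill shape sketched below. Names such as buffers, head, tail and the n_slots formula are simplified assumptions for illustration, not the literal driver source.]

    #include <stdint.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    /* Minimal sketch of the unpatched refill described above. */
    static void
    refill_unpatched(struct rte_mempool *mp, struct rte_mbuf **buffers,
                     uint32_t ring_size, uint32_t *head, uint32_t tail)
    {
            uint32_t mask = ring_size - 1;                  /* ring_size is 2^N */
            uint32_t n_slots = ring_size - (*head - tail);  /* slots freed by the peer */

            if (n_slots == 0)
                    return;

            /* One bulk allocation starting at the masked head.  This stays
             * inside buffers[0..ring_size-1] only while (*head & mask) + n_slots
             * does not run past ring_size; otherwise mbuf pointers are written
             * out of bounds.
             */
            if (rte_pktmbuf_alloc_bulk(mp, &buffers[*head & mask], n_slots) == 0)
                    *head += n_slots;
    }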
>>>>>>
>>>>>> If I understand correctly, this works only if eth_memif_rx_zc always processes
>>>>>> a number of packets which is a power of 2, so that the ring's head always wraps
>>>>>> around at the end of a refill loop, never in the middle of it.
>>>>>> Is there any reason this should be the case?
>>>>>> Maybe the tests don't trigger the crash because this condition holds true for them?
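[Editor's note: to make the wrap-in-the-middle case concrete, here is a small standalone example with assumed numbers (256-slot ring, 15-packet bursts, steady state) showing where a single bulk allocation starting at head & mask would run past the end of the pointer array.]

    #include <stdio.h>

    int main(void)
    {
            const unsigned ring_size = 256, mask = ring_size - 1, burst = 15;
            unsigned head = ring_size, tail = 0; /* state after the initial full refill */

            for (unsigned i = 0; i < 32; i++) {
                    tail += burst;                                 /* peer consumed one burst */
                    unsigned n_slots = ring_size - (head - tail);  /* slots to refill (== burst) */
                    unsigned start = head & mask;

                    if (start + n_slots > ring_size)
                            printf("refill %u: start=%u n_slots=%u -> %u mbuf pointers past the end\n",
                                   i, start, n_slots, start + n_slots - ring_size);
                    head += n_slots;
            }
            return 0;
    }

[This prints one offending refill (start=255, n_slots=15, 14 pointers past the end). Because the 256-slot ring is evenly divisible by any power-of-two burst, start + n_slots never crosses the end of the array in that case; any other burst size eventually produces a refill like the one printed, which matches the observation above.]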
>>>>>
>>>>> Here's how to reproduce the crash on DPDK stable 23.11.1, using testpmd:
>>>>>
>>>>> Server:
>>>>> # ./dpdk-testpmd --vdev=net_memif0,id=1,role=server,bsize=1024,rsize=8 --single-file-segments -l2,3 --file-prefix test1 -- -i
>>>>>
>>>>> Client:
>>>>> # ./dpdk-testpmd --vdev=net_memif0,id=1,role=client,bsize=1024,rsize=8,zero-copy=yes --single-file-segments -l4,5 --file-prefix test2 -- -i
>>>>> testpmd> start
>>>>>
>>>>> Server:
>>>>> testpmd> start tx_first
>>>>> testpmd> set burst 15
>>>>>
>>>>> At this point, the client crashes with a segmentation fault.
>>>>> Before the burst is set to 15, its default value is 32.
>>>>> If the receiver processes packets in bursts of size 2^N, the crash does not occur.
>>>>> Setting the burst size to any power of 2 works; anything else crashes.
>>>>> After applying this patch, the crashes are completely gone.
>>>>
>>>> Sorry, this might not crash with a segmentation fault. To confirm the mempool is
>>>> corrupted, please compile DPDK with debug=true and the c_args -DRTE_LIBRTE_MEMPOOL_DEBUG.
>>>> You should see the client panic when changing the burst size to not be a power of 2.
>>>> This also works on the latest main branch.
>>>>
>>>
>>> Hi Mihai,
>>>
>>> Right, if the buffer size is not a multiple of the burst size, the issue
>>> is valid. And as there is a requirement for the buffer size to be a power
>>> of two, the burst size should be as well.
>>> I assume this issue was not caught before because the default burst size
>>> is 32.
>>>
>>> Can you please share the performance impact of the change, with the two
>>> possible solutions we discussed above?
>>>
>>> The other option is to add this as a limitation of memif zero copy, but
>>> that won't be good for usability.
>>>
>>> We can decide based on performance numbers.
>>>
>>>
>>
>> Hi Jakub,
>>
>> Do you have any comment on this?
>>
>> I think we should either document this as a limitation of the driver, or
>> fix it, and if so we need to decide on the fix.
>>
>
>