[dpdk-dev] [PATCH][v2] net/af_xdp: avoid to unnecessary allocation and free mbuf in rx path

Ferruh Yigit ferruh.yigit at intel.com
Fri Nov 13 18:40:45 CET 2020


On 10/14/2020 1:15 PM, Li,Rongqing wrote:
> 
> 
>> -----Original Message-----
>> From: Loftus, Ciara [mailto:ciara.loftus at intel.com]
>> Sent: Friday, October 02, 2020 12:24 AM
>> To: Li,Rongqing <lirongqing at baidu.com>
>> Cc: dev at dpdk.org
>> Subject: RE: [PATCH][v2] net/af_xdp: avoid to unnecessary allocation and free
>> mbuf in rx path
>>
>>>
>>> When receiving packets, a full burst of mbufs is allocated up front; if
>>> the hardware receives fewer packets than that, the redundant mbufs are
>>> freed again, which hurts performance.
>>>
>>> So optimize rx performance by allocating the number of mbufs based on
>>> the result of xsk_ring_cons__peek, to avoid redundant allocation and
>>> freeing of mbufs when receiving packets.
>>
>> Hi,
>>
>> Thanks for the patch and fixing the issue I raised.
> 
> Thanks for spotting it.
> 
>> With my testing so far I haven't measured an improvement in performance
>> with the patch.
>> Do you have data to share which shows the benefit of your patch?
>>
>> I agree the potential excess allocation of mbufs for the fill ring is not
>> optimal, but if it does not significantly impact performance I would be
>> in favour of keeping that approach over touching cached_cons outside of
>> libbpf, which is unconventional.
>>
>> If a benefit can be shown and we proceed with the approach, I would suggest
>> creating a new function for the cached consumer rollback, e.g.
>> xsk_ring_cons_cancel() or similar, and adding a comment describing what it does.
>>
> 
> Thanks for testing.
> 
> Yes, it does have a benefit.
> 
> We first saw this issue while measuring send performance, with the following topology:
> 
> Qemu with vhost-user ----->ovs------->xdp interface
> 
> Qemu sends UDP packets while the xdp interface has no packets to receive, but ovs must still poll it, so the af_xdp PMD allocates and frees mbufs unnecessarily. With this patch we see about a 5% improvement in send performance; the exact gain depends on flow table complexity.
> 
> 
> In an rx benchmark, when there are about 32 packets per batch, the benefit is very small.
> When there are far fewer than 32 packets per batch, cycles per packet drop noticeably.
> 
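The reordering described in the patch (peek first, then allocate) can be sketched as follows. This is a minimal simulation, not the actual rte_eth_af_xdp.c code: struct sim_ring_cons, sim_peek(), struct sim_pool and sim_alloc_bulk() are hypothetical stand-ins for libbpf's struct xsk_ring_cons, xsk_ring_cons__peek() and DPDK's rte_pktmbuf_alloc_bulk(), modelled only as far as needed to show the ordering. On allocation failure the sketch rolls back cached_cons directly, which is the step Ciara describes as unconventional.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for libbpf's struct xsk_ring_cons, reduced to the
 * indices this sketch needs. All indices are free-running 32-bit counters. */
struct sim_ring_cons {
	uint32_t cached_prod; /* producer index as last read by the consumer */
	uint32_t cached_cons; /* consumer index, including peeked entries */
	uint32_t producer;    /* shared producer index (written by the kernel) */
	uint32_t consumer;    /* shared consumer index (released entries) */
};

/* Modelled on xsk_ring_cons__peek(): reserve up to nb entries locally. */
static uint32_t sim_peek(struct sim_ring_cons *r, uint32_t nb, uint32_t *idx)
{
	uint32_t entries = r->cached_prod - r->cached_cons;

	if (entries == 0) {
		r->cached_prod = r->producer; /* refresh from the shared index */
		entries = r->cached_prod - r->cached_cons;
	}
	if (entries > nb)
		entries = nb;
	if (entries > 0) {
		*idx = r->cached_cons;
		r->cached_cons += entries;
	}
	return entries;
}

/* Hypothetical mbuf pool that only counts buffers. */
struct sim_pool {
	uint32_t avail;  /* mbufs left in the pool */
	uint64_t allocs; /* total mbufs handed out */
};

/* All-or-nothing bulk allocation, like rte_pktmbuf_alloc_bulk(). */
static int sim_alloc_bulk(struct sim_pool *p, uint32_t n)
{
	if (n > p->avail)
		return -1;
	p->avail -= n;
	p->allocs += n;
	return 0;
}

/* Patched ordering: peek first, then allocate exactly rcvd mbufs. */
static uint16_t sim_rx_burst(struct sim_ring_cons *rx, struct sim_pool *pool,
			     uint16_t nb_pkts)
{
	uint32_t idx_rx = 0;
	uint32_t rcvd = sim_peek(rx, nb_pkts, &idx_rx);

	if (rcvd == 0)
		return 0; /* idle poll: the pool is never touched */

	if (sim_alloc_bulk(pool, rcvd) != 0) {
		/* Undo the peek so the same descriptors are seen on the next
		 * poll; only entries peeked but not yet released can be undone. */
		assert(rx->cached_cons - rx->consumer >= rcvd);
		rx->cached_cons -= rcvd;
		return 0;
	}

	/* ... attach descriptors idx_rx .. idx_rx + rcvd - 1 to the mbufs ... */

	rx->consumer += rcvd; /* release, as xsk_ring_cons__release() would */
	return (uint16_t)rcvd;
}
```

With an empty ring (the idle ovs polling case Li describes) the burst returns without touching the pool; with three descriptors pending and a 32-packet burst it allocates three mbufs rather than 32 and frees none.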

Hi Li, Ciara,

What is the status of this patch? Is the patch justified, and is a new version
requested/expected?
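For reference, a dedicated rollback helper of the kind Ciara suggested (she proposed xsk_ring_cons_cancel() "or similar") might look like the sketch below. It operates on a hypothetical simulated ring: sim_ring_cons_peek() and sim_ring_cons_cancel() are illustrative names, not libbpf API, and the peek only loosely models xsk_ring_cons__peek().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated consumer ring (not libbpf's struct xsk_ring_cons). */
struct sim_ring_cons {
	uint32_t cached_prod; /* producer index as last read */
	uint32_t cached_cons; /* consumer index, including peeked entries */
	uint32_t producer;    /* shared producer index */
	uint32_t consumer;    /* shared consumer index */
};

/* Reserve up to nb entries by advancing the local cached_cons only. */
static uint32_t sim_ring_cons_peek(struct sim_ring_cons *r, uint32_t nb,
				   uint32_t *idx)
{
	uint32_t entries = r->cached_prod - r->cached_cons;

	if (entries == 0) {
		r->cached_prod = r->producer;
		entries = r->cached_prod - r->cached_cons;
	}
	if (entries > nb)
		entries = nb;
	if (entries > 0) {
		*idx = r->cached_cons;
		r->cached_cons += entries;
	}
	return entries;
}

/*
 * sim_ring_cons_cancel - undo the last @nb entries reserved by a peek.
 *
 * Only the local cached_cons index moves back; the shared consumer index
 * is untouched, so the producer never observes the cancelled reservation
 * and the next peek returns the same descriptors again. Valid only for
 * entries that were peeked but not yet released.
 */
static void sim_ring_cons_cancel(struct sim_ring_cons *r, uint32_t nb)
{
	assert(r->cached_cons - r->consumer >= nb);
	r->cached_cons -= nb;
}
```

Keeping the rollback in one documented helper confines the direct cached_cons manipulation to a single place, which addresses the readability concern even if the index is still adjusted outside libbpf.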


