[dpdk-users] rte_flow / hw-offloading is degrading performance when testing @ 100G
Cliff Burdick
shaklee3 at gmail.com
Fri Mar 1 04:07:25 CET 2019
That's definitely interesting. Hopefully someone from Mellanox can comment
on the performance impact, since I haven't seen it quantified.
On Thu, Feb 28, 2019, 18:57 Arvind Narayanan <webguru2688 at gmail.com> wrote:
>
> On Thu, Feb 28, 2019, 8:23 PM Cliff Burdick <shaklee3 at gmail.com> wrote:
>
>> What size packets are you using? I've only steered to 2 rx queues by IP
>> dst match, and was able to hit 100Gbps. That's with 4KB jumbo frames.
>>
>
> 64 bytes. Agreed, this is small. What seems interesting is that l3fwd is
> able to handle 64B but rte_flow suffers (a lot), suggesting offloading is
> expensive?!
>
> I'm doing something similar, steering to different queues based off
> dst_ip. However, my tests have around 80 rules, each rule steering to one
> of the 20 rx_queues. I have a one-to-one rx_queue-to-core_id mapping.
>
> Arvind
>
>
>
>> On Thu, Feb 28, 2019, 17:42 Arvind Narayanan <webguru2688 at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using DPDK 18.11 on Ubuntu 18.04, with a Mellanox ConnectX-5 100G
>>> EN (MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64).
>>> Packet generator: t-rex 2.49 running on another machine.
>>>
>>> I am able to achieve 100G line rate with the l3fwd application (frame
>>> size 64B) using the parameters suggested in their performance report.
>>> (
>>> https://fast.dpdk.org/doc/perf/DPDK_18_11_Mellanox_NIC_performance_report.pdf
>>> )
>>>
>>> However, as soon as I install rte_flow rules to steer packets to
>>> different queues and/or use rte_flow's mark action, the throughput
>>> reduces to ~41G. I also modified DPDK's flow_filtering example
>>> application, and am getting the same reduced throughput of around 41G
>>> out of 100G. But without rte_flow, it goes to 100G.
>>>
>>> I didn't change any OS/kernel parameters to test l3fwd or the
>>> application that uses rte_flow. I also ensured the application is
>>> NUMA-aware and uses 20 cores to handle 100G traffic.
>>>
>>> Upon further investigation (using Mellanox NIC counters), the drop in
>>> throughput is due to mbuf allocation errors.
>>>
>>> Is such performance degradation normal when performing hw-acceleration
>>> using rte_flow?
>>> Has anyone tested throughput performance using rte_flow @ 100G?
>>>
>>> It's surprising to see hardware offloading degrade performance,
>>> unless I am doing something wrong.
>>>
>>> Thanks,
>>> Arvind
>>>
>>