[dpdk-users] DPDK TX problems

Hrvoje Habjanić hrvoje.habjanic at zg.ht.hr
Mon Apr 8 11:52:46 CEST 2019


On 29/03/2019 08:24, Hrvoje Habjanić wrote:
>> Hi.
>>
>> I wrote an application using DPDK 17.11 (I also tried 18.11), and
>> when doing some performance testing, I'm seeing very odd behavior.
>> To verify that this is not caused by my app, I ran the same test
>> with the l2fwd example app, and I'm still confused by the results.
>>
>> In short, I'm trying to push a lot of L2 packets through the DPDK
>> engine - packet processing is minimal. When testing, I start with a
>> small number of packets per second and then gradually increase it
>> to see where the limit is. At some point I do reach this limit -
>> packets start to get dropped. And this is when things become weird.
>>
>> When I reach the peak packet rate (at which packets start to get
>> dropped), I would expect that reducing the packet rate would remove
>> the packet drops. But this is not the case. For example, let's
>> assume that the peak packet rate is 3.5 Mpps. At this point
>> everything works OK. Increasing the rate to 4.0 Mpps causes a lot of
>> dropped packets. When reducing the rate back to 3.5 Mpps, the app is
>> still broken - packets are still dropped.
>>
>> At this point I need to drastically reduce the rate (to 1.4 Mpps) to
>> make the dropped packets go away. Also, the app is unable to forward
>> anything beyond this 1.4 Mpps, despite the fact that in the beginning
>> it did forward 3.5 Mpps! The only way to recover is to restart the app.
>>
>> Also, sometimes the app just stops forwarding packets altogether -
>> packets are received (as seen by the counters), but the app is
>> unable to send anything back.
>>
>> As I mentioned, I'm seeing the same behavior with the l2fwd example
>> app. I tested both DPDK 17.11 and DPDK 18.11 - the results are the same.
>>
>> My test environment is an HP DL380 G8 with 82599ES 10GbE (ixgbe)
>> cards, connected to a Cisco Nexus 9300 switch. On the other side is
>> an Ixia test appliance. The application runs in a virtual machine
>> (VM) under KVM (OpenStack, with SR-IOV enabled and NUMA
>> restrictions). I checked that the VM uses only CPUs from the NUMA
>> node to which the network card is connected, so there is no
>> cross-NUMA traffic. OpenStack is Queens, the host runs Ubuntu
>> Bionic, and the VM also uses Ubuntu Bionic as its OS.
>>
>> I do not know how to debug this. Has anyone else made the same
>> observations?
>>
>> Regards,
>>
>> H.
> There are additional findings. It seems that when I reach the peak
> pps rate, the application is not fast enough, and I can see rx missed
> errors in the card statistics on the host. At the same time, the TX
> side starts to show problems (the tx burst call starts to report that
> it did not send all packets). Shortly after that, TX falls apart
> completely and the top pps rate drops.
>
> Since I did not disable pause frames, I can see the "RX pause" frame
> counter on the switch increasing. On the other hand, if I disable
> pause frames (on the server's NIC), the host driver (ixgbe) reports
> "TX unit hang" in dmesg and issues a card reset. Of course, after the
> reset none of the DPDK apps in the VMs on this host work anymore.
>
> Is it possible that at the time of congestion DPDK does not release
> mbufs back to the pool, and the TX ring becomes "filled" with zombie
> packets (not sent by the card but still marked as in use by their
> reference counters)?
>
> Is there a way to check the mempool or the TX ring for "leftovers"?
> Is it possible to somehow "flush" the TX ring and/or the mempool?
>
> H.
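
For reference on the TX side: when the tx burst call accepts fewer
packets than requested, the unsent tail still belongs to the caller.
The usual pattern (and, as far as I can tell, effectively what l2fwd's
TX buffering does via its error callback) is to free that tail. A
minimal sketch, assuming a single TX queue:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
send_burst(uint16_t port, uint16_t queue, struct rte_mbuf **pkts,
           uint16_t n)
{
        uint16_t sent = rte_eth_tx_burst(port, queue, pkts, n);

        /* everything from 'sent' onwards was not taken by the PMD and
         * is still owned by the application, so drop it here */
        while (sent < n)
                rte_pktmbuf_free(pkts[sent++]);
}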

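To partially answer my own question about "leftovers": the mempool fill
level can be read with rte_mempool_avail_count() /
rte_mempool_in_use_count(), and rte_eth_tx_done_cleanup() asks the PMD
to reclaim mbufs from already-completed TX descriptors (not every
driver implements it, so the return value has to be checked). A rough
sketch, assuming 'pktmbuf_pool' is the pool the app created:

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

static void
check_leftovers(struct rte_mempool *pktmbuf_pool, uint16_t port,
                uint16_t queue)
{
        printf("mbufs in use: %u, available: %u\n",
               rte_mempool_in_use_count(pktmbuf_pool),
               rte_mempool_avail_count(pktmbuf_pool));

        /* ask the driver to free mbufs of packets it has already sent
         * on this TX queue; returns -ENOTSUP if the PMD lacks support */
        int ret = rte_eth_tx_done_cleanup(port, queue, 0);
        printf("tx_done_cleanup returned %d\n", ret);
}

If the "in use" number keeps growing while the offered traffic stays
constant, that would point at mbufs not being returned to the pool.
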
After a few more tests, things become even weirder - if I do not free
the mbufs which were not sent, but resend them instead, I can "survive"
the over-the-peak event! But then the peak rate starts to drop
gradually ...
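
Concretely, the change was along these lines - instead of dropping the
unsent tail, keep offering it back to the PMD (a sketch of the idea,
not my exact code; the retry limit is arbitrary):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static uint16_t
send_with_retry(uint16_t port, uint16_t queue, struct rte_mbuf **pkts,
                uint16_t n)
{
        uint16_t sent = 0;
        unsigned int retries = 0;

        /* retry the unsent tail instead of freeing it; give up after
         * a bounded number of attempts */
        while (sent < n && retries++ < 100)
                sent += rte_eth_tx_burst(port, queue, pkts + sent,
                                         n - sent);

        return sent;
}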

Could someone try this on their platform and report back? I would
really like to know whether this is a problem with my deployment or
whether there is something wrong with DPDK.

The test should be simple - use l2fwd or l3fwd and determine the max
pps. Then drive the pps 30% over the max, and then go back down and
confirm that you can still get the max pps.
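
To see the drops I'm describing, the l2fwd statistics output is
probably enough, but the per-port counters can also be read directly
with rte_eth_stats_get(); something like this is what I'm watching:

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

static void
print_port_stats(uint16_t port)
{
        struct rte_eth_stats st;

        if (rte_eth_stats_get(port, &st) != 0)
                return;

        /* imissed rises when the NIC drops on RX; oerrors and
         * rx_nombuf point at TX failures and mbuf exhaustion */
        printf("port %u: rx %" PRIu64 " tx %" PRIu64 " imissed %" PRIu64
               " oerrors %" PRIu64 " rx_nombuf %" PRIu64 "\n",
               port, st.ipackets, st.opackets, st.imissed,
               st.oerrors, st.rx_nombuf);
}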

Thanks in advance.

H.


