[dpdk-users] Explanation for poor performance of DPDK not found

Victor Huertas vhuertas at gmail.com
Mon Aug 27 17:21:03 CEST 2018


Dear colleagues,

I am seeing strange performance behaviour when I run the L3 forwarding
pipeline app example of DPDK.

The diagram is as simple as this:

PC1 <---- 1 Gbps link ----> DPDK app (L3 forwarding) <---- 1 Gbps link ----> PC2

I have implemented a new pipeline which performs the ARP task in order to
configure the Routing-type pipeline's table 1 (the one that performs MAC
translation from the next-hop IP address).
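
Roughly, the insertion of an entry into table 1 from the ARP pipeline looks
like the sketch below (simplified, not my exact code: the key layout and the
action-data layout are only illustrative):

/* Simplified sketch: add a next-hop MAC entry to the routing pipeline's
 * table 1 once an ARP reply has been resolved. The key and action-data
 * layouts are illustrative only. */
#include <string.h>
#include <stdint.h>
#include <rte_pipeline.h>

struct arp_table_key {              /* illustrative key layout */
    uint32_t port_id;               /* output port */
    uint32_t nh_ip;                 /* next-hop IPv4, network byte order */
} __attribute__((packed));

static int
arp_entry_add(struct rte_pipeline *p, uint32_t table_id,
    uint32_t out_port, uint32_t nh_ip, const uint8_t nh_mac[6])
{
    struct arp_table_key key = { .port_id = out_port, .nh_ip = nh_ip };

    /* An entry is the action plus variable-length action data; here the
     * action data carries the destination MAC written on lookup hit. */
    union {
        struct rte_pipeline_table_entry e;
        uint8_t space[sizeof(struct rte_pipeline_table_entry) + 6];
    } buf;
    struct rte_pipeline_table_entry *entry = &buf.e;
    struct rte_pipeline_table_entry *entry_ptr;
    int key_found;

    memset(&buf, 0, sizeof(buf));
    entry->action = RTE_PIPELINE_ACTION_PORT;
    entry->port_id = out_port;
    memcpy(entry->action_data, nh_mac, 6);

    return rte_pipeline_table_entry_add(p, table_id, &key, entry,
        &key_found, &entry_ptr);
}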

The first strange thing I see is that when I ping from PC1 to PC2, the
ping works but it reports a delay of 19.9 ms. Moreover, each ping report
(one per second) shows the delay decreasing by 1 ms, like this:
PING 192.168.1.101 (192.168.1.101) 56(84) bytes of data.
64 bytes from 192.168.1.101: icmp_seq=2 ttl=64 time=17.2 ms
64 bytes from 192.168.1.101: icmp_seq=3 ttl=64 time=15.9 ms
64 bytes from 192.168.1.101: icmp_seq=4 ttl=64 time=14.9 ms
64 bytes from 192.168.1.101: icmp_seq=5 ttl=64 time=13.9 ms
64 bytes from 192.168.1.101: icmp_seq=6 ttl=64 time=12.9 ms
64 bytes from 192.168.1.101: icmp_seq=7 ttl=64 time=11.9 ms
64 bytes from 192.168.1.101: icmp_seq=8 ttl=64 time=10.9 ms
64 bytes from 192.168.1.101: icmp_seq=9 ttl=64 time=19.9 ms
64 bytes from 192.168.1.101: icmp_seq=10 ttl=64 time=18.9 ms
64 bytes from 192.168.1.101: icmp_seq=11 ttl=64 time=17.9 ms

As you can see, the delay decreases by 1 ms with each ping report and then
suddenly jumps back to 19.9 ms.

The second issue comes up when I send a 700 Mbps UDP stream (using iperf
v2.0.5 at both ends) from PC1 to PC2. What I see is a slight packet loss
on reception:
[  4]  0.0-509.3 sec  1 datagrams received out-of-order
[  3] local 192.168.0.101 port 5001 connected with 192.168.1.101 port 60184
[  3]  0.0- 5.0 sec   437 MBytes   733 Mbits/sec   0.022 ms   39/311788 (0.013%)
[  3]  5.0-10.0 sec   437 MBytes   733 Mbits/sec   0.025 ms  166/311988 (0.053%)
[  3] 10.0-15.0 sec   437 MBytes   734 Mbits/sec   0.022 ms    0/312067 (0%)
[  3] 15.0-20.0 sec   437 MBytes   733 Mbits/sec   0.029 ms  151/311916 (0.048%)
[  3] 20.0-25.0 sec   437 MBytes   734 Mbits/sec   0.016 ms   30/311926 (0.0096%)
[  3] 25.0-30.0 sec   437 MBytes   734 Mbits/sec   0.022 ms  143/312118 (0.046%)
[  3] 30.0-35.0 sec   437 MBytes   733 Mbits/sec   0.022 ms   20/311801 (0.0064%)
[  3] 35.0-40.0 sec   437 MBytes   733 Mbits/sec   0.020 ms  202/311857 (0.065%)
[  3] 40.0-45.0 sec   437 MBytes   733 Mbits/sec   0.017 ms  242/311921 (0.078%)
[  3] 45.0-50.0 sec   437 MBytes   733 Mbits/sec   0.021 ms  280/311890 (0.09%)
[  3] 50.0-55.0 sec   438 MBytes   734 Mbits/sec   0.019 ms    0/312119 (0%)
[  3] 55.0-60.0 sec   436 MBytes   732 Mbits/sec   0.018 ms  152/311339 (0.049%)
[  3] 60.0-65.0 sec   437 MBytes   734 Mbits/sec   0.017 ms  113/312048 (0.036%)
[  3] 65.0-70.0 sec   437 MBytes   733 Mbits/sec   0.023 ms  180/311756 (0.058%)
[  3] 70.0-75.0 sec   437 MBytes   734 Mbits/sec   0.020 ms    0/311960 (0%)
[  3] 75.0-80.0 sec   437 MBytes   734 Mbits/sec   0.013 ms  118/312060 (0.038%)
[  3] 80.0-85.0 sec   437 MBytes   734 Mbits/sec   0.019 ms  122/312060 (0.039%)
[  3] 85.0-90.0 sec   437 MBytes   733 Mbits/sec   0.025 ms   55/311904 (0.018%)
[  3] 90.0-95.0 sec   437 MBytes   733 Mbits/sec   0.024 ms  259/312002 (0.083%)
[  3]  0.0-97.0 sec  8.28 GBytes   733 Mbits/sec   0.034 ms 2271/6053089 (0.038%)

Sometimes I even see packet reordering reported on the iperf receiving side.

I didn't expect such performance in terms of delay and throughput and I
would like to find an explanation. That's why I need your help.

Allow me to tell you some particularities of the machine that runs the DPDK
application and of the environment, which could help us explain this behaviour.


   1. I run the application using the "Debug" environment of Eclipse on
   Linux openSUSE Leap 42.3.
   2. The hugepage size on this machine is 2 MB.
   3. 1024 hugepages (2 GB in total) have been reserved for the application.
   4. The lscpu output is shown below:

cuda1 at cuda1:~/eclipse-workspace/SimpleModelDPDK> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
Stepping:              5
CPU MHz:               2133.000
CPU max MHz:           2133.0000
CPU min MHz:           1600.0000
BogoMIPS:              4267.10
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm retpoline kaiser
tpr_shadow vnmi flexpriority ept vpid

5. The Routing pipeline runs on core 1, the Master pipeline on core 0, and
the new ARP pipeline on core 2.

6. The two NICs I am using seem not to be assigned to any NUMA node (see
the sketch after the output below):

cuda1 at cuda1:~/eclipse-workspace/SimpleModelDPDK> cat
/sys/bus/pci/devices/0000\:04\:00.0/numa_node
-1
cuda1 at cuda1:~/eclipse-workspace/SimpleModelDPDK> cat
/sys/bus/pci/devices/0000\:04\:00.1/numa_node
-1
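
A minimal sketch of how this could be cross-checked from inside the
application (ports 0-1 and the lcores 0-2 from point 5 are assumed, to be
called after rte_eal_init()):

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

static void
print_numa_layout(void)
{
    uint16_t port;
    unsigned int lcore;

    /* NUMA socket DPDK reports for each NIC port (may be -1 = unknown). */
    for (port = 0; port < 2; port++)
        printf("port %u: socket %d\n",
            port, rte_eth_dev_socket_id(port));

    /* NUMA socket of the lcores running the pipelines. */
    for (lcore = 0; lcore <= 2; lcore++)
        printf("lcore %u: socket %u\n",
            lcore, rte_lcore_to_socket_id(lcore));
}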

7. According to the ROUTING pipeline statistics (for table 0 and table 1),
very few miss drops are reported at table 0, and they do not coincide at
all with the losses reported by iperf (the iperf drops are much higher than
the table 0 and table 1 drops). Also, the links used in the application do
not report any drops at all.

So where are these packets dropped?
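
For reference, a minimal sketch of how the NIC-level counters could be read
directly (ports 0-1 assumed, error handling kept short). As far as I know,
imissed counts packets dropped by the hardware because the RX queues were
full, and rx_nombuf counts mbuf allocation failures; neither shows up in the
pipeline table statistics:

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

static void
print_port_drops(void)
{
    uint16_t port;

    for (port = 0; port < 2; port++) {
        struct rte_eth_stats stats;

        if (rte_eth_stats_get(port, &stats) != 0)
            continue;

        printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
            " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
            port, stats.ipackets, stats.imissed,
            stats.ierrors, stats.rx_nombuf);
    }
}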

Do any of you have an idea whether these particularities of my PC can
explain this behaviour?

I need to find an answer to this because I expected much better
performance, in line with DPDK's performance expectations.

Thanks for your attention

Victor

-- 
Victor

