[dpdk-dev] DPDK L2fwd benchmark with 64byte packets
chrispappas12 at gmail.com
Fri Jan 24 17:30:19 CET 2014
We are benchmarking DPDK l2fwd performance using DPDK Pktgen (both up to
date). We have connected two server machines back-to-back; each machine
is a dual-socket server with 6 dual-port 10G NICs (12 ports in total,
120 Gbps aggregate). Four of the NICs (8 ports) are attached to socket 0
and the other two (4 ports) to socket 1. With 1500-byte packets we
saturate line rate; with 64-byte packets we do not.
Running l2fwd (./l2fwd -c 0xff0f -n 4 -- -p 0xfff), we get the following
per-port performance reported by Pktgen:
7386/9808 7386/9807 7413/9837 7413/9827 7397/9816 7397/9822
7400/9823 7400/9823 7394/9820 7394/9807 7372/9768 7372/9788
L2fwd reports 0 dropped packets in total.
Another observation is that Pktgen does not quite saturate line rate with
64-byte packets, whereas with 1500-byte packets we observe exactly 10 Gbps Tx.
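For context, the theoretical maximum for 64-byte frames on 10GbE, once the 8-byte preamble and 12-byte inter-frame gap are counted per frame, works out to about 14.88 Mpps per port, which is why small packets are far harder to sustain than 1500-byte ones. A quick sanity check:

```shell
# Max packet rate on a 10 Gb/s link: each frame occupies
# frame bytes + 8 B preamble + 12 B inter-frame gap on the wire.
frame=64
wire_overhead=20
bits_per_frame=$(( (frame + wire_overhead) * 8 ))   # 672 bits per 64 B frame
pps=$(( 10000000000 / bits_per_frame ))
echo "${pps} pps"                                    # ~14.88 Mpps per port
```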
* The way the coremask (-c) works is quite clear (in our case the 4 least
significant bits map to cores on socket 0, the next 4 to socket 1, then
socket 0 and socket 1 again). However, the port mask only defines which
NICs are enabled. We would like to know how to ensure that the cores
assigned to the NICs are on the same socket as the corresponding NICs, or
whether this is done automatically.
The command we use to run l2fwd is the following:
./l2fwd -c 0xff0f -n 4 -- -p 0xfff
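As a sanity check on the coremask itself, the mask can be rebuilt from the intended lcore IDs; with the core numbering described above, cores 0-3 plus 8-15 reproduce the 0xff0f used here (a minimal sketch, assuming that numbering):

```shell
# Build a coremask from a list of lcore IDs.
# With cores 0-3 and 8-15 enabled this reproduces -c 0xff0f.
cores="0 1 2 3 8 9 10 11 12 13 14 15"
mask=0
for c in $cores; do
  mask=$(( mask | (1 << c) ))
done
printf -- '-c 0x%x\n' "$mask"
```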
* The next observation is that if we run l2fwd again with a different
coremask, enabling all our cores (./l2fwd -c 0xffff -n 4 -- -p 0xfff),
performance drops significantly, and the results are the following:
7380/9807 7380/9806 7422/9850 7423/9789 2467/9585 2467/9624
1399/9809 1399/9806 7391/9816 7392/9802 7370/9789 7370/9789
We observe that ports P4-P7 have very low throughput, and they correspond
to the extra cores we enabled in the coremask. This result seems odd and
makes the core-to-NIC assignment look like a plausible explanation.
Moreover, l2fwd reports many dropped packets, but only for these 4 NICs.
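One way to answer the locality question is to read each port's NUMA node from sysfs before choosing the coremask. A sketch follows; the PCI address is a placeholder for one of your actual ports, and the sysfs root is overridable only so the helper can be exercised without real hardware:

```shell
# Print the NUMA node of a PCI NIC from sysfs.
numa_node_of() {
  local dev="$1" root="${2:-/sys/bus/pci/devices}"
  if [ -r "$root/$dev/numa_node" ]; then
    cat "$root/$dev/numa_node"
  else
    echo "unknown"
  fi
}

# Example with a placeholder PCI address:
numa_node_of 0000:04:00.0
```

Inside the application itself, rte_eth_dev_socket_id() returns the same information per port at runtime, so a polling lcore can be matched against the socket of the port it serves.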
We would like to know if there is an obvious mistake in our configuration,
or what steps we can take to debug this. 6WIND reports a platform limit of
160 Mpps, but we are below that figure on a similar platform.
Is PCIe the bottleneck?
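For what it's worth, the 160 Mpps figure can be put next to what all 12 ports would demand at full 64-byte line rate (back-of-envelope arithmetic only):

```shell
# Aggregate packet rate needed to saturate 12 x 10GbE with 64 B frames
# (64 B frame + 20 B wire overhead), versus a 160 Mpps platform ceiling.
ports=12
pps_per_port=$(( 10000000000 / ((64 + 20) * 8) ))   # 14880952 pps
required=$(( ports * pps_per_port ))
echo "required ${required} pps vs limit 160000000 pps"
```

So the quoted platform ceiling is itself below what full 64-byte line rate on all 12 ports would require, although the rates we observe are lower still.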
Thank you in advance for your time.