[dpdk-dev] Question regarding throughput number with DPDK l2fwd with Wind River System's pktgen

Venkatesan, Venky venky.venkatesan at intel.com
Sun Sep 22 19:41:43 CEST 2013


Chris, 

The numbers you are getting are correct. :)

Practically speaking, most motherboards pin out between 4 and 5 x8 slots to every CPU socket. At PCI-E Gen 2 speeds (5 GT/s), each slot is capable of carrying 20 Gb/s of traffic (limited to ~16 Gb/s with 64-byte packets). I would have expected the 64-byte traffic capacity to be a bit higher than 80 Gb/s, but either way the numbers you are achieving are well within the capability of the system, provided you are careful about pinning cores to ports, which you seem to be doing. QPI is not a limiter either for the amount of traffic you are currently generating.
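
For reference, the raw slot arithmetic works out roughly as below (a back-of-the-envelope sketch; the 20 Gb/s and ~16 Gb/s figures above are practical estimates that already account for per-packet descriptor/TLP overhead, not something this derives):

#include <stdio.h>

int main(void)
{
    const double gt_per_lane = 5e9;     /* PCIe Gen 2: 5 GT/s per lane */
    const double encoding = 8.0 / 10.0; /* 8b/10b line coding on Gen 1/2 */
    const int lanes = 8;                /* x8 slot */
    double raw_gbps = gt_per_lane * encoding * lanes / 1e9;

    printf("PCIe Gen 2 x8 raw bandwidth: %.0f Gb/s per direction\n", raw_gbps);
    /* A dual-port 10GbE NIC in such a slot offers 20 Gb/s of network traffic
     * per direction, which fits within that raw figure; per-packet descriptor
     * and TLP overhead is what pulls the effective 64B number down toward
     * ~16 Gb/s. */
    return 0;
}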

Regards,
-Venky

-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chris Pappas
Sent: Sunday, September 22, 2013 7:32 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Question regarding throughput number with DPDK l2fwd with Wind River System's pktgen

Hi,

We have a question about the performance numbers we are measuring with the pktgen application provided by Wind River Systems. The current setup is the following:

We have two machines, each equipped with 6 dual-port 10 GbE NICs (12 ports in total). Machine 0 runs the DPDK L2FWD code, and Machine 1 runs Wind River Systems' pktgen. L2FWD is modified to forward each incoming packet to a statically assigned output port.
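
The modification is essentially a static destination map consulted in the forwarding loop (the sketch below is illustrative, not our exact assignment; the real pairing follows the cabling between the two machines):

#include <stdint.h>

/* Illustrative static output map: a packet received on port i is sent out
 * on dst_port[i]. The values are placeholders. */
static const uint16_t dst_port[12] = {
    1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10
};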

Each machine has two Intel Xeon E5-2600 CPUs connected via QPI, and two riser slots each holding three of the 10 GbE NICs. Two NICs in riser slot 1 (NIC0 and NIC1) are connected to CPU 1 via PCIe Gen3, while the remaining NIC2 is connected to CPU 2, also via PCIe Gen3. In riser slot 2, all NICs (NICs 3, 4, and 5) are connected to CPU 2 via PCIe Gen3. We were careful to assign the NIC ports to cores of the CPU socket with a direct physical connection to the NIC, to achieve maximum performance.
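
One way to sanity-check that pinning at run time is to compare the socket of each port against the socket of the lcore that polls it (a minimal sketch using DPDK's rte_eth_dev_socket_id() and rte_lcore_to_socket_id() helpers, not our exact code):

#include <stdio.h>

#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Warn if a port is polled by an lcore on the other socket. */
static void
check_port_lcore_affinity(uint16_t port_id, unsigned int lcore_id)
{
    int port_socket = rte_eth_dev_socket_id(port_id);  /* -1 if unknown */
    unsigned int lcore_socket = rte_lcore_to_socket_id(lcore_id);

    if (port_socket >= 0 && (unsigned int)port_socket != lcore_socket)
        printf("warning: port %u (socket %d) is polled by lcore %u (socket %u)\n",
               port_id, port_socket, lcore_id, lcore_socket);
}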


With this setup, we are getting 120 Gbps of throughput measured by pktgen with 1500-byte packets. For 64-byte packets, we are getting around 80 Gbps.
Do these performance numbers make sense? We have been reading related papers in this domain, and our numbers seem unusually high. We did a theoretical calculation and found that these rates should be achievable, since the traffic neither hits the PCIe bandwidth limits of our machine nor exceeds the QPI bandwidth when packets are forwarded across NUMA nodes. Can you share your thoughts / experience with this?
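
For reference, the back-of-the-envelope line-rate check looks like this (our own sketch, assuming standard Ethernet framing of 20 bytes preamble/SFD/IFG per frame on the wire; the exact Gb/s pktgen reports depends on whether it counts that framing and the FCS):

#include <stdio.h>

int main(void)
{
    const double port_rate = 10e9;          /* bits/s per 10GbE port */
    const int n_ports = 12;
    const int frame_sizes[] = { 64, 1500 }; /* bytes, including FCS */
    unsigned int i;

    for (i = 0; i < sizeof(frame_sizes) / sizeof(frame_sizes[0]); i++) {
        int frame = frame_sizes[i];
        /* 20 bytes of preamble + SFD + inter-frame gap per frame on the wire */
        double pps = port_rate / ((frame + 20) * 8.0);
        double l2_gbps = pps * frame * 8.0 / 1e9;

        printf("%5dB: %6.2f Mpps/port, %5.2f Gb/s L2 per port, %6.1f Gb/s over %d ports\n",
               frame, pps / 1e6, l2_gbps, l2_gbps * n_ports, n_ports);
    }
    return 0;
}

At line rate this works out to roughly 91 Gb/s of 64-byte L2 traffic and about 118 Gb/s of 1500-byte L2 traffic across the 12 ports, so the measured figures are around line rate rather than beyond it.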

Thank you,

Chris

