[dpdk-users] Low TX performance on Mellanox ConnectX-3 NIC

Jesper Wramberg jesper.wramberg at gmail.com
Sat Oct 31 09:54:04 CET 2015


Hi all,



I am experiencing some performance issues in a somewhat custom setup with
two Mellanox ConnectX-3 NICs. I realize these issues might be due to the
setup, but I was hoping someone might be able to pinpoint some possible
problems/bottlenecks.




The server:

I have a Dell PowerEdge R630 with two Mellanox ConnectX-3 NICs (one on each
socket). It runs a minimal CentOS 7.1.1503 install with kernel-3.10.0-229.
Note that this kernel is rebuilt with most features disabled to minimize
its size. It does have InfiniBand enabled, however, and mlx4_core is built
as a module (since nothing works otherwise). Finally, port 2 of one NIC is
connected directly to port 2 of the other.



The firmware:

I have installed the latest firmware for the NICs from Dell, which is
version 2.34.5060.



The drivers, modules, etc.:

I downloaded the Mellanox OFED 3.1 package for CentOS 7.1 and used its
rebuild feature to build it against the custom kernel. I installed it
with the --basic option, since I only want libibverbs, libmlx4, the kernel
modules and the openibd service. mlx4_core.conf sets all ports to Ethernet
and also configures flow steering mode -7 and a few VFs. I can restart the
openibd service successfully and everything seems to work: ibdev2netdev
reports the NICs, their VFs, etc. The only problem I have encountered at
this stage is that the links don't always come up unless I unplug and
re-plug the cables.



DPDK setup:

I built DPDK with the mlx4 PMD using the .h/.a files from the OFED
package, leaving all other build options at their default values. Running
the simple helloworld example, I can see that everything is initialized
correctly.



Test setup:

To test the performance of the NICs, I use the following setup: two
processes, P1 and P2, run on NIC A, and two other processes, P3 and P4, run
on NIC B. All processes use virtual functions on their respective NICs.
Depending on the test, a process either transmits or receives data. To
transmit, I use a simple DPDK program that generates 32000 packets and
transmits them over and over until it has sent 640 million packets. To
receive, I use a simple DPDK program that is basically the layer 2
forwarding example (l2fwd) without retransmission.



First test:

In my first test, P1 transmits data to P3 while the other processes are
idle.

Packet size: 1480 byte packets

Flow control: On/Off; it doesn't matter, I get the same result.

Result: P3 receives all packets, but it takes 192.52 seconds (~3.32 Mpps,
~4.9 GB/s or ~39.4 Gbit/s).



Second test:

In my second test, I attempt to increase the amount of data transmitted
over NIC A. To do so, P1 transmits data to P3 while P2 transmits data to P4.

Packet size: 1480 byte packets

Flow control: On/Off; it doesn't matter, I get the same result.

Results: P3 and P4 receive all packets, but it takes 364.40 seconds
(~1.76 Mpps, ~2.6 GB/s or ~20.8 Gbit/s) for each process to get its data
transmitted.





Does anyone have any idea what I am doing wrong here? In the second test I
would expect P1 to transmit at the same speed as in the first test, but it
seems that there is a bottleneck somewhere. I have left most settings at
their default values, but I have also tried tweaking queue sizes, the
number of queues, interrupts, etc., with no luck.





Best Regards,

Jesper

