[dpdk-users] Low Rx throughput when using Mellanox ConnectX-3 card with DPDK

Shahaf Shuler shahafs at mellanox.com
Thu Apr 13 07:19:55 CEST 2017


Thursday, April 13, 2017 4:58 AM, Shihabur Rahman Chowdhury:
[...]
> >>>
> >>>> setup
> >>>>
> >>>> - 2 machines with 2xIntel Xeon E5-2620 CPUs
> >>>> - Each machine with a Mellanox single port 10G ConnectX3 card
> >>>> - Mellanox DPDK version 16.11
> >>>> - Mellanox OFED 4.0-2.0.0.1 and latest firmware for ConnectX3
> >>>>
> >>>> The application is doing almost nothing. It is reading a batch of 64
> >>>> packets from a single rxq, swapping the mac of each packet and writing
> >>>> it
> >>>> back to a single txq. The rx and tx is being handled by separate lcores

Why did you choose this configuration?
It may cause high overhead in cache snoop cycles: the first cache line of the packet
will first be in the Rx lcore's cache and will then need to be invalidated when the Tx lcore swaps the MACs.

Since you are using 2 cores anyway, have you tried having each core do both Rx and Tx (run to completion)?
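
To illustrate what run to completion means here, below is a minimal sketch in plain C, not taken from the application under discussion. The names swap_macs() and process_burst() are hypothetical, and the frames are plain byte buffers rather than mbufs. In a DPDK application the burst would presumably come from rte_eth_rx_burst() and leave through rte_eth_tx_burst() on the same lcore; neither is called here so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6

/* Swap destination and source MAC addresses of an Ethernet frame in place.
 * Both addresses sit in the first 12 bytes of the frame, i.e. inside the
 * first cache line of the packet. */
static void swap_macs(uint8_t *frame)
{
    uint8_t tmp[ETH_ADDR_LEN];

    assert(frame != NULL);
    memcpy(tmp, frame, ETH_ADDR_LEN);                    /* save dst MAC    */
    memcpy(frame, frame + ETH_ADDR_LEN, ETH_ADDR_LEN);   /* dst <- src      */
    memcpy(frame + ETH_ADDR_LEN, tmp, ETH_ADDR_LEN);     /* src <- old dst  */
}

/* Hypothetical run-to-completion step for one burst: the core that received
 * the frames also rewrites them, so the modified cache line does not have to
 * move to a second core before transmission. Returns the number of frames
 * processed. */
static unsigned process_burst(uint8_t *frames[], unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        swap_macs(frames[i]);
    return n;
}
```

With two cores, each core would run this step on its own rx/tx queue pair instead of one core receiving and the other modifying and transmitting.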

> >>>>
> >>> on
> >>>
> >>>> the same NUMA socket. We are running pktgen on another machine.
> With 64B
> >>>> sized packets we are seeing ~14.8Mpps Tx rate and ~7.3Mpps Rx rate in
> >>>> pktgen. We checked the NIC on the machine running the DPDK
> application
> >>>> (with ifconfig) and it looks like there is a large number of packets
> >>>>
> >>> being
> >>>
> >>>> dropped by the interface. 

This might be because the scenario is SW bound: when the application doesn't process the packets fast enough, the NIC must drop ingress traffic.

> >>>> Our ConnectX3 card should be theoretically
> be
> >>>> able to handle 10Gbps Rx + 10Gbps Tx throughput (with channel width
> 4,
> >>>>
> >>> the
> >>>
> >>>> theoretical max on PCIe 3.0 should be ~31.2Gbps). Interestingly, when
> Tx
> >>>> rate is reduced in pktgen (to ~9Mpps), the Rx rate increases to
> ~9Mpps.
> >>>>
> >>> Not sure what is going on here, when you drop the rate to 9Mpps I
> assume
> >>> you stop getting missed frames.
> >>> Do you have flow control enabled?
> >>>
> >>> On the pktgen side are you seeing missed RX packets?
> >>> Did you loopback the cable from pktgen machine to the other port on
> the
> >>> pktgen machine and did you get the same Rx/Tx performance in that
> >>> configuration?
> >>>
> >>> We would highly appreciate if we could get some pointers as to what can
> >>>>
> >>> be
> >>>
> >>>> possibly causing this mismatch in Rx and Tx. Ideally, we should be able
> >>>>
> >>> to
> >>>
> >>>> see ~14Mpps Rx well. Is it because we are using a single port? Or

Our "hero number" for the testpmd application doing I/O forwarding with ConnectX-3 is ~10Mpps on a single core.
Dual core should reach ~14Mpps.
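
For reference, the ~14Mpps figure is close to the 10GbE line rate for 64B packets, which is also what the ~14.8Mpps Tx rate reported by pktgen corresponds to. A small sketch of the arithmetic follows; line_rate_pps() is a hypothetical helper name, and it assumes the 64B frame size already includes the FCS, with 20B of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap).

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame overhead on the wire, not counted in the frame size:
 * 7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Maximum packets per second on a link of link_bps bits per second for
 * frames of frame_bytes bytes (FCS assumed to be included in frame_bytes).
 * The result is truncated to a whole number of packets. */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u);
}
```

For a 10Gbps link and 64B frames this gives 10^10 / (84 * 8) = 14880952 packets per second, i.e. about 14.88Mpps.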

> >>>>
> >>> something
> >>>
> >>>> else?
> >>>>
> >>>> FYI, we also ran the sample l2fwd application and test-pmd and got
> >>>> comparable results in the same setup.
> >>>>
> >>>> Thanks
> >>>> Shihab
> >>>>
> >>> Regards,
> >>> Keith
> >>>
> >>>
> >>>
> >

