[dpdk-dev] Unable to get RSS to work in testpmd and load balancing question

Dan Kan dan at nyansa.com
Thu Jan 9 00:24:38 CET 2014


I'm evaluating DPDK using dpdk-1.5.1r1 and have been playing around with
the testpmd sample app. I'm having a hard time getting RSS to work. I have
a 2-port 82599 Intel X540-DA2 NIC. I'm running the following command to
start the app:

sudo ./testpmd -c 0x1f -n 2 -- -i --portmask=0x3 --nb-cores=4 --rxq=4 --txq=4
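
For comparison, testpmd is supposed to enable RSS internally when
--rxq > 1. The equivalent configuration through the ethdev API looks
roughly like the sketch below, using the 1.5-era structs; the exact
rss_hf flag names are my assumption and have changed across releases:

#include <stdint.h>
#include <rte_ethdev.h>

/* RSS port configuration sketch (DPDK 1.5-era API).
 * rss_key = NULL keeps the driver's default hash key;
 * rss_hf selects which header fields feed the hash. */
static const struct rte_eth_conf port_conf = {
        .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
        },
        .rx_adv_conf = {
                .rss_conf = {
                        .rss_key = NULL,
                        .rss_hf  = ETH_RSS_IPV4 | ETH_RSS_IPV4_UDP,
                },
        },
};

/* 4 RX and 4 TX queues, matching --rxq=4 --txq=4 above */
static int
configure_rss_port(uint8_t port_id)
{
        return rte_eth_dev_configure(port_id, 4, 4, &port_conf);
}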

I have a packet generator that sends UDP packets with varying source IPs.
According to testpmd, I'm only receiving packets on port 0's queue 0;
packets are not going into any of the other queues. I have attached the
output from testpmd.


  ------- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 1/Queue= 0 -------
  RX-packets: 1000000        TX-packets: 1000000        TX-dropped: 0

  ---------------------- Forward statistics for port 0 ----------------------
  RX-packets: 1000000        RX-dropped: 0             RX-total: 1000000
  TX-packets: 0              TX-dropped: 0             TX-total: 0
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1 ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 1000000        TX-dropped: 0             TX-total: 1000000
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports +++++++++++++++
  RX-packets: 1000000        RX-dropped: 0             RX-total: 1000000
  TX-packets: 1000000        TX-dropped: 0             TX-total: 1000000
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
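
One way I could check whether the NIC is computing a hash at all is to
look at the per-packet RSS metadata in the mbuf. A diagnostic sketch,
assuming the 1.5-era mbuf layout (the hash sits at m->pkt.hash.rss
there; newer releases moved it to m->hash.rss):

#include <stdio.h>
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Pull a burst from one queue and report each packet's RSS hash.
 * If PKT_RX_RSS_HASH is never set, the NIC isn't hashing at all. */
static void
dump_rss_hashes(uint8_t port, uint16_t queue)
{
        struct rte_mbuf *bufs[32];
        uint16_t i, nb = rte_eth_rx_burst(port, queue, bufs, 32);

        for (i = 0; i < nb; i++) {
                if (bufs[i]->ol_flags & PKT_RX_RSS_HASH)
                        printf("port %u queue %u: rss=0x%08x\n",
                               port, queue,
                               (unsigned)bufs[i]->pkt.hash.rss);
                else
                        printf("port %u queue %u: no RSS hash\n",
                               port, queue);
                rte_pktmbuf_free(bufs[i]);
        }
}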

On a separate note, I also find that CPU utilization with 1 forwarding
core for 2 ports seems better (in the aggregate sense) than with 2
forwarding cores for 2 ports. Running at 10 Gbps line rate with pktlen=400,
a single core runs at 40% utilization, while with 2 cores each core runs
at 30%, for an aggregate of 60%.

My use case only does rxonly packet processing. From my initial test, it
seems more efficient to have a single core read packets from both ports
and distribute them using rte_ring than to have each core read from its
own port. The rte_eth_rx operations appear to be much more CPU-intensive
than rte_ring_dequeue operations.
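
In case it's useful, this is roughly the pattern I mean; a sketch under
assumed names (NB_WORKERS, the rings[] array, and distributor_loop()
are all hypothetical), with one core polling every port and spreading
mbufs across per-worker rte_rings:

#include <errno.h>
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define NB_WORKERS 4
#define BURST 32

/* one ring per worker core, created elsewhere with rte_ring_create() */
extern struct rte_ring *rings[NB_WORKERS];

/* Single distributor core: poll queue 0 of every port and round-robin
 * the mbufs across the worker rings.  A real app might instead pick
 * the ring by hashing the 5-tuple to keep flows ordered. */
static void
distributor_loop(uint8_t nb_ports)
{
        struct rte_mbuf *bufs[BURST];
        unsigned int next = 0;
        uint8_t port;
        uint16_t i, nb;

        for (;;) {
                for (port = 0; port < nb_ports; port++) {
                        nb = rte_eth_rx_burst(port, 0, bufs, BURST);
                        for (i = 0; i < nb; i++) {
                                if (rte_ring_enqueue(rings[next],
                                                     bufs[i]) == -ENOBUFS)
                                        rte_pktmbuf_free(bufs[i]); /* full */
                                next = (next + 1) % NB_WORKERS;
                        }
                }
        }
}

The workers then just rte_ring_dequeue() from their own ring, which is
where the cheaper dequeue cost I observed above would pay off.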

Thanks in advance.

Dan

