[dpdk-dev] IXGBE RX packet loss with 5+ cores

Stephen Hemminger stephen at networkplumber.org
Tue Oct 13 07:18:30 CEST 2015


On Tue, 13 Oct 2015 02:57:46 +0000
"Sanford, Robert" <rsanford at akamai.com> wrote:

> I'm hoping that someone (perhaps at Intel) can help us understand
> an IXGBE RX packet loss issue we're able to reproduce with testpmd.
> 
> We run testpmd with various numbers of cores. We offer line-rate
> traffic (~14.88 Mpps) to one Ethernet port and forward all received
> packets via the second port.
> 
> When we configure 1, 2, 3, or 4 cores per port (with the same number
> of RX queues as cores on each port), there is no RX packet loss. When
> we configure 5 or more cores, we observe the following approximate
> packet loss:
>  5 cores - 3% loss
>  6 cores - 7% loss
>  7 cores - 11% loss
>  8 cores - 15% loss
>  9 cores - 18% loss
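> 
> For reference, the 5-cores-per-port case corresponds to a testpmd
> invocation roughly like the one below; the EAL core mask, memory
> channel count, and port mask are placeholders rather than our exact
> values:
> 
>   testpmd -c 0x7ff -n 4 -- --portmask=0x3 --nb-cores=10 --rxq=5 --txq=5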
> 
> All of the "lost" packets are accounted for in the device's Rx Missed
> Packets Count register (RXMPC[0]). Quoting the datasheet:
>  "Packets are missed when the receive FIFO has insufficient space to
>  store the incoming packet. This might be caused due to insufficient
>  buffers allocated, or because there is insufficient bandwidth on the
>  IO bus."
> 
> RXMPC, and our use of the rx_descriptor_done API to verify that we
> don't run out of mbufs (discussed below), lead us to theorize that
> packet loss occurs because the device is unable to DMA all packets
> out of its internal packet buffer (512 KB, reported by register
> RXPBSIZE[0]) before that buffer overruns.
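> 
> (The check we mean is essentially a probe like the sketch below: ask
> whether a descriptor near the far end of the RX ring has already been
> written back. If it has not, the NIC still has free descriptors to
> DMA into, so mbuf exhaustion is not the cause. The port, queue, and
> ring-size parameters here are illustrative.)
> 
>   #include <rte_ethdev.h>
> 
>   /* Return 1 if the RX ring is full of completed descriptors,
>    * i.e. software is the bottleneck; 0 if the NIC still has free
>    * descriptors/mbufs to receive into. */
>   static int
>   rx_ring_nearly_full(uint8_t port_id, uint16_t queue_id,
>                       uint16_t nb_rx_desc)
>   {
>       /* probe the last descriptor before the ring wraps */
>       return rte_eth_rx_descriptor_done(port_id, queue_id,
>                                         nb_rx_desc - 1) == 1;
>   }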
> 
> Questions
> =========
> 1. The 82599 device supports up to 128 queues. Why do we see trouble
> with as few as 5 queues? What could limit the system (and one port
> controlled by 5+ cores) from receiving at line-rate without loss?
> 
> 2. As far as we can tell, the RX path only touches the device
> registers when it updates a Receive Descriptor Tail register (RDT[n]),
> roughly once every rx_free_thresh packets (see the small configuration
> sketch after these questions). Is there a big difference between one
> core doing this and N cores each doing it 1/N as often?
> 
> 3. Do CPU reads/writes from/to device registers have a higher priority
> than device reads/writes from/to memory? Could the former transactions
> (CPU <-> device) significantly impede the latter (device <-> RAM)?
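> 
> (Sketch for question 2: rx_free_thresh is set per RX queue at setup
> time, and the PMD writes RDT only after that many descriptors have
> been refilled. The port, queue, ring-size, and threshold values are
> illustrative, not our exact configuration.)
> 
>   #include <rte_ethdev.h>
>   #include <rte_mempool.h>
> 
>   static int
>   setup_rx_queue(uint8_t port, uint16_t queue, struct rte_mempool *pool)
>   {
>       struct rte_eth_dev_info dev_info;
>       struct rte_eth_rxconf rxconf;
> 
>       rte_eth_dev_info_get(port, &dev_info);
>       rxconf = dev_info.default_rxconf;  /* keep the PMD's defaults */
>       rxconf.rx_free_thresh = 32;        /* one RDT write per 32 freed slots */
> 
>       return rte_eth_rx_queue_setup(port, queue, 512 /* descriptors */,
>                                     rte_eth_dev_socket_id(port),
>                                     &rxconf, pool);
>   }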
> 
> Thanks in advance for any help you can provide.

As you add cores, each core's polling adds more traffic on the PCI bus.
There is a fixed number of PCI bus transactions per second possible, and
each additional core increases the number of useless (empty) transactions.
Why do you think adding more cores will help?
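
To make "each core polling" concrete: in testpmd, every forwarding core
runs a loop along the lines of the sketch below, and a core whose queue
is empty simply burns the poll and tries again. The fwd_conf structure
is a made-up stand-in for testpmd's per-stream configuration, not its
actual code.

  #include <rte_ethdev.h>
  #include <rte_mbuf.h>

  #define BURST_SIZE 32

  struct fwd_conf {                 /* hypothetical per-core config */
      uint8_t  rx_port, tx_port;
      uint16_t queue;
  };

  /* Simplified io-forwarding loop: with N cores there are N of these
   * running in parallel, each polling its own RX queue. */
  static int
  fwd_loop(void *arg)
  {
      struct fwd_conf *c = arg;
      struct rte_mbuf *pkts[BURST_SIZE];

      for (;;) {
          uint16_t nb = rte_eth_rx_burst(c->rx_port, c->queue,
                                         pkts, BURST_SIZE);
          if (nb == 0)
              continue;             /* empty poll, go around again */

          uint16_t sent = rte_eth_tx_burst(c->tx_port, c->queue, pkts, nb);
          while (sent < nb)         /* free what TX could not queue */
              rte_pktmbuf_free(pkts[sent++]);
      }
      return 0;
  }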



