[dpdk-dev] IXGBE RX packet loss with 5+ cores

Bruce Richardson bruce.richardson at intel.com
Tue Oct 13 15:59:55 CEST 2015


On Mon, Oct 12, 2015 at 10:18:30PM -0700, Stephen Hemminger wrote:
> On Tue, 13 Oct 2015 02:57:46 +0000
> "Sanford, Robert" <rsanford at akamai.com> wrote:
> 
> > I'm hoping that someone (perhaps at Intel) can help us understand
> > an IXGBE RX packet loss issue we're able to reproduce with testpmd.
> > 
> > We run testpmd with various numbers of cores. We offer line-rate
> > traffic (~14.88 Mpps) to one ethernet port, and forward all received
> > packets via the second port.
> > 
> > When we configure 1, 2, 3, or 4 cores (per port, with same number RX
> > queues per port), there is no RX packet loss. When we configure 5 or
> > more cores, we observe the following packet loss (approximate):
> >  5 cores - 3% loss
> >  6 cores - 7% loss
> >  7 cores - 11% loss
> >  8 cores - 15% loss
> >  9 cores - 18% loss
> > 
> > All of the "lost" packets are accounted for in the device's Rx Missed
> > Packets Count register (RXMPC[0]). Quoting the datasheet:
> >  "Packets are missed when the receive FIFO has insufficient space to
> >  store the incoming packet. This might be caused due to insufficient
> >  buffers allocated, or because there is insufficient bandwidth on the
> >  IO bus."
> > 
> > RXMPC, and our use of API rx_descriptor_done to verify that we don't
> > run out of mbufs (discussed below), lead us to theorize that packet
> > loss occurs because the device is unable to DMA all packets from its
> > internal packet buffer (512 KB, reported by register RXPBSIZE[0])
> > before overrun.
> > 
> > Questions
> > =========
> > 1. The 82599 device supports up to 128 queues. Why do we see trouble
> > with as few as 5 queues? What could limit the system (and one port
> > controlled by 5+ cores) from receiving at line-rate without loss?
> > 
> > 2. As far as we can tell, the RX path only touches the device
> > registers when it updates a Receive Descriptor Tail register (RDT[n]),
> > roughly every rx_free_thresh packets. Is there a big difference
> > between one core doing this and N cores doing it 1/N as often?
> > 
> > 3. Do CPU reads/writes from/to device registers have a higher priority
> > than device reads/writes from/to memory? Could the former transactions
> > (CPU <-> device) significantly impede the latter (device <-> RAM)?
> > 
> > Thanks in advance for any help you can provide.
> 
> As you add cores, there is more traffic on the PCI bus from each core
> polling. There is a fixed number of PCI bus transactions per second possible.
> Each core is increasing the number of useless (empty) transactions.
> Why do you think adding more cores will help?
>
The polling for packets by the core should not use PCI bandwidth directly,
as the ixgbe driver (and other drivers) check for the DD bit being set on the
descriptor in memory/cache. However, an increased number of queues can use
PCI bandwidth in other ways: with more queues you reduce the amount of
descriptor write-back coalescing that the NIC can do. Instead of a single
transaction writing 4 descriptors to one queue, the NIC may have to do 4
transactions, each writing 1 descriptor to a different queue. This is
possibly why sending all traffic to a single queue works ok - the polling on
the other queues is still being done, but has little effect.

Regards,
/Bruce
