[dpdk-dev] IXGBE RX packet loss with 5+ cores

Bruce Richardson bruce.richardson at intel.com
Wed Oct 14 11:29:52 CEST 2015


On Tue, Oct 13, 2015 at 01:24:22PM -0700, Alexander Duyck wrote:
> On 10/13/2015 07:47 AM, Sanford, Robert wrote:
> >>>>[Robert:]
> >>>>1. The 82599 device supports up to 128 queues. Why do we see trouble
> >>>>with as few as 5 queues? What could prevent the system (and one port
> >>>>controlled by 5+ cores) from receiving at line-rate without loss?
> >>>>
> >>>>2. As far as we can tell, the RX path only touches the device
> >>>>registers when it updates a Receive Descriptor Tail register (RDT[n]),
> >>>>roughly every rx_free_thresh packets. Is there a big difference
> >>>>between one core doing this and N cores doing it 1/N as often?
> >>>[Stephen:]
> >>>As you add cores, there is more traffic on the PCI bus from each core
> >>>polling. There is a fixed number of PCI bus transactions per second
> >>>possible. Each core is increasing the number of useless (empty)
> >>>transactions.
> >>[Bruce:]
> >>The polling for packets by the core should not be using PCI bandwidth
> >>directly, as the ixgbe driver (and other drivers) check for the DD bit
> >>being set on the descriptor in memory/cache.
> >I was preparing to reply with the same point.
> >
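To make both points above concrete - the DD-bit check in host memory, and the
RDT update only once every rx_free_thresh packets - here is a simplified
sketch of the shape of the poll path. It is not the actual ixgbe PMD code;
the struct layout, names and constants are approximations:

/*
 * Simplified sketch of the descriptor-ring RX poll path; not the real
 * ixgbe PMD code, just its shape.  Names approximate the 82599's.
 */
#include <stdint.h>

#define RING_SIZE      512
#define RX_FREE_THRESH 32
#define DD_BIT         (1u << 0)  /* Descriptor Done, set by the NIC */

struct rx_desc {                  /* 16-byte write-back format */
    uint64_t addr_or_rss;
    uint32_t status_error;        /* DD bit lands here on write-back */
    uint32_t length_vlan;
};

struct rx_queue {
    volatile struct rx_desc *ring;  /* descriptor ring in host memory */
    volatile uint32_t *rdt_reg;     /* RDT[n] tail register (MMIO, PCIe) */
    uint16_t next_to_check;
    uint16_t since_tail_update;
};

static int rx_poll(struct rx_queue *q)
{
    int nb_rx = 0;

    /* The poll itself only touches host memory/cache: the NIC DMAs the
     * descriptor back with DD set, so no PCIe read happens here. */
    while (q->ring[q->next_to_check].status_error & DD_BIT) {
        /* ... hand the packet off, refill the buffer ... */
        q->ring[q->next_to_check].status_error = 0;
        q->next_to_check = (q->next_to_check + 1) % RING_SIZE;
        nb_rx++;

        /* The only device-register access on this path: bump the tail
         * once per rx_free_thresh packets, not once per packet. */
        if (++q->since_tail_update >= RX_FREE_THRESH) {
            *q->rdt_reg = q->next_to_check;
            q->since_tail_update = 0;
        }
    }
    return nb_rx;
}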
> >>>[Stephen:] Why do you think adding more cores will help?
> >We're using run-to-completion and sometimes spend too many cycles per pkt.
> >We realize that we need to move to an io+workers model, but wanted a better
> >understanding of the dynamics involved here.
> >
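As an aside, the usual shape of that io+workers split in DPDK is one I/O
lcore doing the cheap RX and handing bursts to worker lcores over rte_rings.
A rough sketch - ring and port setup omitted, NB_WORKERS and the ring array
are illustrative, and the rte_ring burst calls use the 3-argument signature
of the 2.x releases (newer releases differ slightly):

#include <rte_ethdev.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

#define BURST      32
#define NB_WORKERS 4

extern struct rte_ring *work_rings[NB_WORKERS]; /* from rte_ring_create() */

/* I/O lcore: cheap RX + distribution only. */
static int io_lcore(void *arg)
{
    uint8_t port = *(uint8_t *)arg;
    struct rte_mbuf *bufs[BURST];
    unsigned w = 0, i, n, sent;

    for (;;) {
        n = rte_eth_rx_burst(port, 0, bufs, BURST);
        /* Round-robin here for brevity; real code would likely
         * distribute by flow (e.g. a hash) to keep packet order. */
        sent = rte_ring_enqueue_burst(work_rings[w], (void **)bufs, n);
        for (i = sent; i < n; i++)
            rte_pktmbuf_free(bufs[i]);      /* drop if the ring is full */
        w = (w + 1) % NB_WORKERS;
    }
    return 0;
}

/* Worker lcore: the expensive per-packet cycles live here. */
static int worker_lcore(void *arg)
{
    struct rte_ring *r = arg;
    struct rte_mbuf *bufs[BURST];
    unsigned i, n;

    for (;;) {
        n = rte_ring_dequeue_burst(r, (void **)bufs, BURST);
        for (i = 0; i < n; i++) {
            /* ... per-packet work ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}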
> >>[Bruce:] However, using an increased number of queues can use PCI
> >>bandwidth in other ways: for instance, with more queues you reduce the
> >>amount of descriptor coalescing that can be done by the NIC, so that
> >>instead of having a single transaction of 4 descriptors to one queue,
> >>the NIC may instead have to do 4 transactions, each writing 1 descriptor
> >>to 4 different queues. This is possibly why sending all traffic to a
> >>single queue works ok - the polling on the other queues is still being
> >>done, but has little effect.
> >Brilliant! This idea did not occur to me.
> 
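To put very rough numbers on that: with 16-byte write-back descriptors and
something like 20-24 bytes of TLP header/framing overhead per PCIe
transaction (both approximations), one coalesced write-back of 4 descriptors
costs about 64 + 24 = 88 bytes on the wire, whereas 4 single-descriptor
write-backs cost about 4 * (16 + 24) = 160 bytes - nearly double the
descriptor write-back traffic for the same packet rate.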
> You can actually make the throughput regression disappear by altering the
> traffic pattern you are testing with.  In the past I have found that sending
> traffic in bursts where 4 frames belong to the same queue before moving to
> the next one essentially eliminated the dropped packets due to PCIe
> bandwidth limitations.  The trick is you need to have the Rx descriptor
> processing work in batches so that you can get multiple descriptors
> processed for each PCIe read/write.
>
Yep, that's one test we used to prove the effect of descriptor coalescing, and
it does work a treat! Unfortunately, I think controlling real-world input
traffic that way could be ... em ... challenging? :-)
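
For reference, the batched descriptor processing Alexander describes looks
something like the sketch below - again simplified, reusing the
rx_queue/rx_desc types from the earlier sketch; the real equivalent is the
bulk-alloc RX path in the ixgbe PMD (see ixgbe_rx_scan_hw_ring):

/*
 * Simplified batch scan; tail updates omitted, they work as before.
 */
#define LOOKAHEAD 4   /* match the NIC's write-back coalescing unit */

static int rx_poll_batched(struct rx_queue *q)
{
    int nb_rx = 0;
    uint16_t idx;
    int i, nb_dd;

    for (;;) {
        /* Count consecutive completed descriptors a group at a time,
         * so one write-back transaction yields several packets. */
        nb_dd = 0;
        for (i = 0; i < LOOKAHEAD; i++) {
            idx = (q->next_to_check + i) % RING_SIZE;
            if (!(q->ring[idx].status_error & DD_BIT))
                break;
            nb_dd++;
        }
        if (nb_dd == 0)
            break;

        for (i = 0; i < nb_dd; i++) {
            idx = (q->next_to_check + i) % RING_SIZE;
            /* ... hand the packet off ... */
            q->ring[idx].status_error = 0;   /* re-arm the slot */
        }
        q->next_to_check = (q->next_to_check + nb_dd) % RING_SIZE;
        nb_rx += nb_dd;

        if (nb_dd < LOOKAHEAD)   /* hit an incomplete group, stop */
            break;
    }
    return nb_rx;
}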

/Bruce

