[dpdk-dev] rte_mbuf.next in 2nd cacheline

Thomas Monjalon thomas.monjalon at 6wind.com
Wed Jun 17 18:32:24 CEST 2015


2015-06-17 14:23, Damjan Marion:
> 
> > On 17 Jun 2015, at 16:06, Bruce Richardson <bruce.richardson at intel.com> wrote:
> > 
> > On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
> >> 
> >>> On 15 Jun 2015, at 16:12, Bruce Richardson <bruce.richardson at intel.com> wrote:
> >>> 
> >>> The next pointers always start out as NULL when the mbuf pool is created. The
> >>> only time it is set to non-NULL is when we have chained mbufs. If we never have
> >>> any chained mbufs, we never need to touch the next field, or even read it - since
> >>> we have the num-segments count in the first cache line. If we do have a multi-segment
> >>> mbuf, it's likely to be a big packet, so we have more processing time available
> >>> and we can then take the hit of setting the next pointer.
> >> 
> >> There are applications which do not use RX offloads but do deal with chained mbufs.
> >> Why are they less important than the ones using RX offloads? This is something people
> >> should be able to configure at build time.
> > 
> > It's not that they are less important, it's that the packet processing cycle count
> > budget is going to be greater. A packet which is 64 bytes, or 128 bytes, in size
> > can make use of a number of RX offloads to reduce its processing time. However,
> > a 64/128-byte packet is not going to be split across multiple buffers [unless we
> > are dealing with a very unusual setup!].
> > 
> > To handle 64-byte packets at 40G line rate, one has 50 cycles per core per packet
> > when running at 3GHz [3000000000 cycles / 59.5 Mpps].
> > If we assume that we are dealing with fairly small buffers
> > here, and that any packet larger than 1k is chained, we still have 626
> > cycles per 3GHz core per packet to work with for that 1k packet. Given that
> > "normal" DPDK buffers are 2k in size, we have over a thousand cycles per packet
> > for any packet that is split.
> > 
> > In summary, packets spread across multiple buffers are large packets, and so have
> > larger per-packet cycle count budgets, which lets them absorb the cost of
> > touching a second cache line in the mbuf far better than a 64-byte packet can.
> > Therefore, we optimize for the 64B packet case.
> 
> This makes sense if there is no other work to do on the same core.
> Otherwise it is better to spend those cycles on actual work instead of waiting for
> the 2nd cache line...
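
For reference, the fast path Bruce describes above, where the next pointer (and
with it the second cache line) is only touched for chained packets, looks roughly
like the sketch below. It is only an illustration: app_process_mbuf() and
handle_data() are made-up names, not DPDK APIs.

/* Minimal sketch of the single-segment fast path: nb_segs lives in the
 * mbuf's first cache line, so m->next (second cache line) is only read
 * for chained packets. app_process_mbuf() and handle_data() are made-up
 * names for illustration, not DPDK APIs. */
#include <rte_mbuf.h>
#include <rte_branch_prediction.h>

extern void handle_data(void *data, uint16_t len);	/* application-defined */

static inline void
app_process_mbuf(struct rte_mbuf *m)
{
	/* Common case: single-segment packet, first cache line only. */
	if (likely(m->nb_segs == 1)) {
		handle_data(rte_pktmbuf_mtod(m, void *),
			    rte_pktmbuf_data_len(m));
		return;
	}

	/* Rare case: a chained (multi-segment) mbuf. Only now do we read
	 * m->next and take the hit of the second cache line. */
	while (m != NULL) {
		handle_data(rte_pktmbuf_mtod(m, void *),
			    rte_pktmbuf_data_len(m));
		m = m->next;
	}
}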
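
Bruce's budget numbers above can also be double-checked with a quick
back-of-the-envelope calculation; the small program below assumes 20 bytes of
per-frame wire overhead (8B preamble + 12B inter-frame gap), frame sizes that
include the 4B FCS, a 40G link and a 3GHz core.

#include <stdio.h>

int main(void)
{
	const double link_bps = 40e9;	/* 40G line rate */
	const double core_hz = 3e9;	/* 3GHz core */
	const int sizes[] = { 64, 128, 1024, 2048 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		double wire_bits = (sizes[i] + 20) * 8;	/* + preamble, IFG */
		double pps = link_bps / wire_bits;
		double cycles = core_hz / pps;

		printf("%4dB frame: %6.2f Mpps -> %6.0f cycles/packet\n",
		       sizes[i], pps / 1e6, cycles);
	}
	return 0;
}

It reproduces the ~50, ~626 and 1000+ cycle figures quoted above for 64B, 1k and
2k frames respectively.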

You're probably right: if the core has other work to do, those cycles are better
spent on that work than on a stall for the second cache line.
I wonder whether this flexibility can be implemented only in static lib builds?
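
For what it's worth, the kind of build-time knob being discussed might look
something like the sketch below. It is purely hypothetical: the struct is a
stand-in, not the real struct rte_mbuf, and APP_NEXT_IN_FIRST_CACHELINE is not
an existing DPDK config option.

#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size */

struct app_pkt_meta {
	/* --- first cache line: fields hot on the RX fast path --- */
	void *buf_addr;
	uint16_t data_len;
	uint8_t nb_segs;
#ifdef APP_NEXT_IN_FIRST_CACHELINE	/* hypothetical build-time option */
	struct app_pkt_meta *next;	/* hot for apps that chain mbufs */
#else
	uint64_t rx_offload_meta;	/* stand-in for RX offload metadata */
#endif

	/* --- second cache line: fields cold on the RX fast path --- */
	struct {
		uint64_t cold_fields[7];
#ifndef APP_NEXT_IN_FIRST_CACHELINE
		struct app_pkt_meta *next;	/* default: cold, as today */
#endif
	} __attribute__((aligned(CACHE_LINE))) cl1;
} __attribute__((aligned(CACHE_LINE)));

Since the layout changes with the option, a shared library built one way and an
application built the other way would disagree about where the next pointer
lives, which is why something like this would really only be workable for
static builds.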

