[dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

Thomas Monjalon thomas at monjalon.net
Mon Nov 2 16:58:08 CET 2020


+Cc techboard

We need benchmark numbers in order to take a decision.
Please all, prepare some arguments and numbers so we can discuss
the mbuf layout in the next techboard meeting.


01/11/2020 21:59, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > Sent: Sunday, November 1, 2020 5:38 PM
> > 
> > 01/11/2020 10:12, Morten Brørup:
> > > One thing has always puzzled me:
> > > Why do we use 64 bits to indicate which memory pool
> > > an mbuf belongs to?
> > > The portid only uses 16 bits, thanks to an indirection index.
> > > Why don't we use the same kind of indirection index for mbuf pools?
> > 
> > I wonder what would be the cost of indirection. Probably negligible.
> 
> Probably. The portid does it, and that indirection is heavily used everywhere.
> 
> The size of the mbuf memory pool indirection array should be compile-time configurable, like the size of the portid indirection array.
> 
> And for reference, the indirection array will fit into one cache line if we default to 8 mbuf pools, thus supporting an 8 CPU socket system with one mbuf pool per CPU socket, or a 4 CPU socket system with two mbuf pools per CPU socket.
> 
> (And as a side note: Our application is optimized for single-socket systems, and we only use one mbuf pool. I guess many applications were developed without carefully optimizing for multi-socket systems, and also just use one mbuf pool. In these cases, the mbuf structure doesn't really need a pool field. But it is still there, and the DPDK libraries use it, so we didn't bother removing it.)
> 
> > I think it is a good proposal...
> > ... for next year, after a deprecation notice.
> > 
> > > I can easily imagine using one mbuf pool (or perhaps a few pools)
> > > per CPU socket (or per physical memory bus closest to an attached NIC),
> > > but not more than 256 mbuf memory pools in total.
> > > So, let's introduce an mbufpoolid like the portid,
> > > and cut this mbuf field down from 64 to 8 bits.

We will need to measure the perf of the solution.
There is a chance that the cost is too high.
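
To make the measurement concrete, here is a minimal sketch of what the indirection could look like. All names below (mbuf_pool_table, MBUF_MAX_POOLS, pool_id) are hypothetical; the point is only that resolving the pool from an 8-bit index costs one extra load from a small, read-mostly table, similar to how a port_id is resolved through rte_eth_devices[].

#include <stdint.h>
#include <rte_mempool.h>

/* Hypothetical: compile-time configurable, like RTE_MAX_ETHPORTS.
 * 8 pointers = 64 bytes = one cache line. */
#define MBUF_MAX_POOLS 8

static struct rte_mempool *mbuf_pool_table[MBUF_MAX_POOLS];

static inline struct rte_mempool *
mbuf_pool_from_index(uint8_t pool_id)
{
        /* One extra dependent load compared to a direct m->pool pointer. */
        return mbuf_pool_table[pool_id];
}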


> > > If we also cut down m->pkt_len from 32 to 24 bits,
> > 
> > Who is using packets larger than 64k? Are 16 bits enough?
> 
> I personally consider 64k a reasonable packet size limit. Exotic applications with even larger packets would have to live with this constraint. But let's see if there are any objections. For reference, 64k corresponds to ca. 44 Ethernet (1500 byte) packets.
> 
> (The limit could be 65535 bytes, to avoid translation of the value 0 into 65536 bytes.)
> 
> This modification would go nicely hand in hand with the mbuf pool indirection modification.
> 
> ... after yet another round of ABI stability discussions, deprecation notices, and so on. :-)

After more thought, I'm afraid 64k is too small in some cases.
And 24-bit manipulation would probably hurt performance.
I'm afraid we are stuck with a 32-bit length.
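
For illustration only (this is not the existing rte_mbuf layout), the packing that was being discussed would look roughly like the bitfield below; the concern is that every update of a 24-bit field compiles to a load/mask/store of the containing 32-bit word, whereas the current 32-bit pkt_len is a single aligned store.

#include <stdint.h>

struct pkt_len_and_pool {
        uint32_t pkt_len:24;  /* updates become read-modify-write */
        uint32_t pool_id:8;   /* hypothetical 8-bit mbuf pool index */
};

struct pkt_len_plain {
        uint32_t pkt_len;     /* today: a single 32-bit store */
};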


> > > we can get the 8 bit mbuf pool index into the first cache line
> > > at no additional cost.
> > 
> > I like the idea.
> > It means we don't need to move the pool pointer now,
> > i.e. it does not have to replace the timestamp field.
> 
> Agreed! Don't move m->pool to the first cache line; it is not used for RX.
> 
> > 
> > > In other words: This would free up another 64 bit field in the mbuf
> > structure!
> > 
> > That would be great!
> > 
> > 
> > > And even though the m->next pointer for scattered packets resides
> > > in the second cache line, the libraries and applications know
> > > that m->next is NULL when m->nb_segs is 1.
> > > This proves that my suggestion would make touching
> > > the second cache line unnecessary (in simple cases),
> > > even for re-initializing the mbuf.
> > 
> > So you think the "next" pointer should stay in the second half of mbuf?
> > 
> > I feel you would like to move the Tx offloads to the first half
> > to improve the performance of very simple apps.
> 
> "Very simple apps" sounds like a minority of apps. I would rather say "very simple packet handling scenarios", e.g. forwarding of normal size non-segmented packets. I would guess that the vast majority of packets handled by DPDK applications actually match this scenario. So I'm proposing to optimize for what I think is the most common scenario.
> 
> If segmented packets are common, then m->next could be moved to the first cache line. But it will only improve the pure RX steps of the pipeline. When preparing the packet for TX, m->tx_offloads will need to be set, and the second cache line comes into play. So I'm wondering how big the benefit of having m->next in the first cache line really is - assuming that m->nb_segs will be checked before accessing m->next.
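
As a sketch of that assumption (an assumed helper, not an existing DPDK API): a single-segment fast path can re-initialize the mbuf using only first-cache-line fields, because m->next is known to be NULL when m->nb_segs is 1; only segmented packets fall back to a path that touches m->next in the second cache line.

#include <rte_branch_prediction.h>
#include <rte_mbuf.h>

static inline void
reinit_mbuf(struct rte_mbuf *m)
{
        if (likely(m->nb_segs == 1)) {
                /* First-cache-line fields only; m->next stays untouched. */
                m->data_off = RTE_PKTMBUF_HEADROOM;
                m->data_len = 0;
                m->pkt_len = 0;
                m->ol_flags &= EXT_ATTACHED_MBUF;
                m->packet_type = 0;
                return;
        }
        /* Segmented mbuf: free the trailing segments and do a full reset,
         * both of which read/write m->next in the second cache line. */
        rte_pktmbuf_free(m->next);
        rte_pktmbuf_reset(m);   /* also sets m->next = NULL, nb_segs = 1 */
}
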
> 
> > I am thinking the opposite: we could have some dynamic field space
> > in the first half to improve performance of complex Rx.
> > Note: we can add a flag hint for field registration in this first half.
> > 
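
A sketch of how such a registration could look with the existing dynamic field API; the flag in the comment is purely hypothetical (no such flag exists today), it stands for the proposed hint that the field should be placed in the first half of the mbuf when space is available.

#include <stdint.h>
#include <rte_mbuf_dyn.h>

static int flow_id_offset;  /* hypothetical example field */

static int
register_flow_id_dynfield(void)
{
        static const struct rte_mbuf_dynfield desc = {
                .name = "example_dynfield_flow_id",
                .size = sizeof(uint32_t),
                .align = __alignof__(uint32_t),
                .flags = 0, /* + proposed "prefer first cache line" hint */
        };

        flow_id_offset = rte_mbuf_dynfield_register(&desc);
        return flow_id_offset < 0 ? -1 : 0;
}

The field would then be accessed with RTE_MBUF_DYNFIELD(m, flow_id_offset, uint32_t *) as usual; only its placement would change.
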
> 
> I have had the same thoughts. However, I would prefer being able to forward ordinary packets without using the second mbuf cache line at all (although only in specific scenarios like my example above).
> 
> Furthermore, the application can abuse the 64 bit m->tx_offload field for private purposes until it is time to prepare the packet for TX and pass it on to the driver. This hack somewhat resembles a dynamic field in the first cache line, and will not be possible if the m->pool or m->next field is moved there.
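
For illustration, a sketch of the hack described above (the mbuf fields are real, the helper names are made up for the example): the application stores a private 64-bit value in m->tx_offload during RX/processing, and overwrites it with real offload values only when preparing the packet for TX. This obviously only works as long as no library or driver consumes tx_offload earlier in the pipeline.

#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>

static inline void
stash_private_token(struct rte_mbuf *m, uint64_t token)
{
        m->tx_offload = token;  /* scratch use while the packet is in the pipeline */
}

static inline void
prepare_ipv4_checksum_tx(struct rte_mbuf *m)
{
        m->tx_offload = 0;      /* give the field back to its real purpose */
        m->l2_len = sizeof(struct rte_ether_hdr);
        m->l3_len = sizeof(struct rte_ipv4_hdr);
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM;
}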




