[dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

Thomas Monjalon thomas at monjalon.net
Sun Nov 1 17:38:14 CET 2020


01/11/2020 10:12, Morten Brørup:
> One thing has always puzzled me:
> Why do we use 64 bits to indicate which memory pool
> an mbuf belongs to?
> The portid only uses 16 bits and an indirection index.
> Why don't we use the same kind of indirection index for mbuf pools?

I wonder what the cost of the indirection would be. Probably negligible.
I think it is a good proposal...
... for next year, after a deprecation notice.
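To make the indirection concrete, a minimal sketch of what such a pool-id table could look like, analogous to how port ids index into rte_eth_devices[]. All names here are illustrative, not existing DPDK API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indirection table: a small pool id maps to the pool
 * pointer, so the mbuf only needs to store the 8-bit id. */
#define MBUF_MAX_POOLS 256

struct mempool { int id; };          /* stand-in for struct rte_mempool */

static struct mempool *mbuf_pools[MBUF_MAX_POOLS];
static uint8_t mbuf_pool_count;

/* Register a pool, returning its 8-bit id. */
static uint8_t mbuf_pool_register(struct mempool *mp)
{
	uint8_t id = mbuf_pool_count++;
	mbuf_pools[id] = mp;
	return id;
}

/* Resolve an id back to the pool pointer: one extra load from a
 * small table that stays hot in cache, which is why the cost is
 * likely negligible. */
static inline struct mempool *mbuf_pool_get(uint8_t id)
{
	return mbuf_pools[id];
}
```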

> I can easily imagine using one mbuf pool (or perhaps a few pools)
> per CPU socket (or per physical memory bus closest to an attached NIC),
> but not more than 256 mbuf memory pools in total.
> So, let's introduce an mbufpoolid like the portid,
> and cut this mbuf field down from 64 to 8 bits.
> 
> If we also cut down m->pkt_len from 32 to 24 bits,

Who is using packets larger than 64k? Are 16 bits enough?

> we can get the 8 bit mbuf pool index into the first cache line
> at no additional cost.

I like the idea.
It means we don't need to move the pool pointer now,
i.e. it does not have to replace the timestamp field.
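For illustration, the 24-bit pkt_len and the 8-bit pool id can share one 32-bit word, so the first cache line grows by zero bytes. This is a sketch of the packing, not the actual rte_mbuf layout:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative packing: shrinking pkt_len from 32 to 24 bits frees
 * 8 bits for a pool id in the same 32-bit word. */
struct mbuf_len_pool {
	uint32_t pkt_len : 24;	/* max packet length becomes 16 MiB - 1 */
	uint32_t pool_id : 8;	/* up to 256 registered mbuf pools */
};
```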

> In other words: This would free up another 64 bit field in the mbuf structure!

That would be great!


> And even though the m->next pointer for scattered packets resides
> in the second cache line, the libraries and applications know
> that m->next is NULL when m->nb_segs is 1.
> This proves that my suggestion would make touching
> the second cache line unnecessary (in simple cases),
> even for re-initializing the mbuf.

So you think the "next" pointer should stay in the second half of mbuf?
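A minimal model of the argument being made: when nb_segs == 1, the free path can re-initialize the mbuf without ever reading m->next from the second cache line. Struct layout and function names below are illustrative, not the real rte_mbuf:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mbuf {
	/* --- first cache line --- */
	uint16_t nb_segs;
	uint32_t pkt_len;
	/* --- second cache line --- */
	struct mbuf *next;
};

static int second_line_touched;

static struct mbuf *mbuf_next(struct mbuf *m)
{
	second_line_touched = 1;	/* instrument the cold access */
	return m->next;
}

static void mbuf_free(struct mbuf *m)
{
	if (m->nb_segs == 1) {
		m->pkt_len = 0;		/* reset with first-line fields only */
		return;
	}
	while (m != NULL) {		/* chained case must walk m->next */
		struct mbuf *n = mbuf_next(m);
		m->nb_segs = 1;
		m->pkt_len = 0;
		m = n;
	}
}
```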

I feel you would like to move the Tx offloads into the first half
to improve the performance of very simple apps.
I am thinking the opposite: we could have some dynamic fields space
in the first half to improve performance of complex Rx.
Note: we can add a flag hint for field registration in this first half.
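A sketch of what such a flag hint could look like. It is modeled on rte_mbuf_dynfield_register(), but the flag, the offsets, and the allocator below are all hypothetical, just to show the idea of steering a dynamic field into the first cache line:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: request placement in the first cache line. */
#define DYNFIELD_FLAG_FIRST_HALF 0x1

#define FIRST_HALF_END	 64	/* assume bytes [32, 64) are free */
#define SECOND_HALF_END	128

static size_t first_free = 32;	/* next free byte in first half */
static size_t second_free = 64;	/* next free byte in second half */

/* Returns the byte offset of the new field, or -1 if it won't fit
 * in the requested half. */
static ptrdiff_t dynfield_register(size_t size, unsigned int flags)
{
	if (flags & DYNFIELD_FLAG_FIRST_HALF) {
		if (first_free + size > FIRST_HALF_END)
			return -1;
		first_free += size;
		return (ptrdiff_t)(first_free - size);
	}
	if (second_free + size > SECOND_HALF_END)
		return -1;
	second_free += size;
	return (ptrdiff_t)(second_free - size);
}
```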


> And now I will proceed out on a tangent with two more
> independent thoughts, so feel free to ignore.
> 
> Consider a multi-CPU-socket system with one mbuf pool
> per CPU socket, where the NICs attached to each socket
> use an RX mbuf pool backed by RAM on the same socket.
> I would imagine that (re-)initializing these mbufs could be faster
> if performed only on a CPU on the same socket.
> If this is the case, mbufs should be re-initialized
> as part of the RX preparation at ingress,
> not as part of the mbuf free at egress.
> 
> Perhaps some microarchitectures are faster to compare
> nb_segs==0 than nb_segs==1.
> If so, nb_segs could be redefined to mean number of
> additional segments, rather than number of segments.
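The nb_segs redefinition in the last paragraph can be sketched as follows. Storing "number of additional segments" lets the hot path compare against zero, which some microarchitectures handle more cheaply (the flags are often set by the load or decrement itself). Names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Convert once, when the field is written. */
static inline uint16_t nb_extra_segs(uint16_t nb_segs)
{
	return nb_segs - 1;	/* 1 segment -> 0 extra segments */
}

/* Hot-path check becomes a compare against zero instead of one. */
static inline int is_single_seg(uint16_t nb_extra)
{
	return nb_extra == 0;
}
```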
