[dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

Morten Brørup mb at smartsharesystems.com
Tue Nov 3 15:03:50 CET 2020


> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: Tuesday, November 3, 2020 2:50 PM
> 
> On Tue, Nov 03, 2020 at 02:46:17PM +0100, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > Sent: Tuesday, November 3, 2020 1:26 PM
> > >
> > > On Tue, Nov 03, 2020 at 01:10:05PM +0100, Morten Brørup wrote:
> > > > > From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > > > > Sent: Monday, November 2, 2020 4:58 PM
> > > > >
> > > > > +Cc techboard
> > > > >
> > > > > We need benchmark numbers in order to take a decision.
> > > > > Please all, prepare some arguments and numbers so we can
> > > > > discuss the mbuf layout in the next techboard meeting.
> > > >
> > > > I propose that the techboard considers this from two angles:
> > > >
> > > > 1. Long term goals and their relative priority. I.e. what can be
> > > > achieved with wide-ranging modifications, requiring yet another
> > > > ABI break and due notices.
> > > >
> > > > 2. Short term goals, i.e. what can be achieved for this release.
> > > >
> > > >
> > > > My suggestions follow...
> > > >
> > > > 1. Regarding long term goals:
> > > >
> > > > I have argued that simple forwarding of non-segmented packets
> > > > using only the first mbuf cache line can be achieved by making
> > > > three modifications:
> > > >
> > > > a) Move m->tx_offload to the first cache line.
> > > > b) Use an 8 bit pktmbuf mempool index in the first cache line,
> > > >    instead of the 64 bit m->pool pointer in the second cache
> > > >    line.
> > > > c) Do not access m->next when we know that it is NULL.
> > > >    We can use m->nb_segs == 1 or some other invariant as the
> > > >    gate.
> > > >    It can be implemented by adding an m->next accessor function:
> > > >    struct rte_mbuf * rte_mbuf_next(struct rte_mbuf * m)
> > > >    {
> > > >        return m->nb_segs == 1 ? NULL : m->next;
> > > >    }
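
To make suggestion (b) concrete, here is a minimal sketch of the
indexed lookup. All names (mock_mempool, pool_table, pool_idx) are made
up; no such table or field exists in the current ABI, and the table
size simply follows from the 8 bit index:

    #include <stdint.h>

    struct mock_mempool;                         /* stand-in for struct rte_mempool */
    extern struct mock_mempool *pool_table[256]; /* hypothetical per-process table */

    struct mock_mbuf {
        uint8_t pool_idx; /* hypothetical 8 bit index, first cache line */
        /* ... remaining mbuf fields ... */
    };

    static inline struct mock_mempool *
    mock_mbuf_pool(const struct mock_mbuf *m)
    {
        /* one table load instead of reading a 64 bit pointer from
         * the second cache line */
        return pool_table[m->pool_idx];
    }

The trade-off is one extra indirection when the pool is needed
(typically at free), in exchange for storing one byte in the first
cache line instead of eight bytes in the second.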
> > > >
> > > > Regarding the priority of this goal, I guess that simple
> > > > forwarding of non-segmented packets is probably the path taken
> > > > by the majority of packets handled by DPDK.
> > > >
> > > >
> > > > An alternative goal could be:
> > > > Do not touch the second cache line during RX.
> > > > A comment in the mbuf structure says so, but it is not true
> > > > anymore.
> > > >
> > >
> > > The comment should be true for non-scattered RX, I believe.
> >
> > You are correct.
> >
> > My suggestion was unclear: Extend this remark to include segmented
> > packets.
> >
> > This could be a priority if the techboard considers RX of segmented
> > packets more important than my suggestion for single cache line
> > forwarding of non-segmented packets.
> >
> >
> > > I'm not aware of any use of the second cacheline in the fast-path
> > > RX of many drivers.
> > > Am I missing something that has changed recently here?
> >
> > Check out eth_igb_recv_pkts() in the E1000 driver: rxm->next = NULL;
> > Or pmd_rx_burst() in the TAP driver: new_tail->next = seg->next;
> >
> > Perhaps the documentation should describe best practices for
> > implementing RX and TX functions in drivers, including
> > allocating/freeing mbufs. Or an example dummy Ethernet driver could
> > do it.
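
To illustrate how a driver can honor this, a sketch of an RX
finalization that relies on the invariant from (c) instead of writing
m->next. The function name rx_finish_single_seg is made up; the fields
are the real rte_mbuf ones:

    #include <rte_mbuf.h>

    /* Finalize a single-segment mbuf during RX without touching the
     * second cache line: set nb_segs instead of writing m->next = NULL.
     * Consumers would then use the proposed rte_mbuf_next() accessor,
     * which returns NULL when nb_segs == 1. */
    static inline void
    rx_finish_single_seg(struct rte_mbuf *m, uint16_t len)
    {
        m->nb_segs = 1;    /* first cache line */
        m->pkt_len = len;  /* first cache line */
        m->data_len = len; /* first cache line */
        /* deliberately no m->next = NULL; that field lives in the
         * second cache line */
    }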
> >
> 
> Yes, perhaps I should be clearer about the "fast-path", because I was
> thinking of the optimized RX/TX paths for NICs at 10G and above.
> Probably the documentation should indeed have an update clarifying
> things a bit, since using only the first cacheline is possible but
> not mandatory for simple RX.

I sometimes look at the source code of the simple drivers for reference, as they are easier to understand than the advanced vector drivers.
I suppose new PMD developers would too. :-)

Anyway, it is probably a good idea to add a clarifying note to the documentation, so it reflects reality.

Just make sure it says that the second cache line is supposed to remain untouched by the RX path of high-performance drivers, so application developers can still consider it cold.
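
An application path that does need the cold fields (m->pool, m->next)
can then prefetch them explicitly; rte_mbuf_prefetch_part2() exists for
this. A sketch, with process_pkt() as a hypothetical per-packet
handler:

    #include <stdint.h>
    #include <rte_mbuf.h>

    void process_pkt(struct rte_mbuf *m); /* hypothetical handler */

    static void
    handle_burst(struct rte_mbuf **pkts, uint16_t nb_rx)
    {
        uint16_t i;

        for (i = 0; i < nb_rx; i++) {
            /* prefetch the next mbuf's cold half to hide the miss */
            if (i + 1 < nb_rx)
                rte_mbuf_prefetch_part2(pkts[i + 1]);
            process_pkt(pkts[i]);
        }
    }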


