[PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
Morten Brørup
mb at smartsharesystems.com
Sun Oct 19 20:45:26 CEST 2025
> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> Sent: Sunday, 19 October 2025 18.59
>
> 09/10/2025 19:35, Morten Brørup:
> > > From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > > + m->pkt_len = 0;
> > > > + m->tx_offload = 0;
> > > > + m->vlan_tci = 0;
> > > > + m->vlan_tci_outer = 0;
> > > > + m->port = RTE_MBUF_PORT_INVALID;
> > >
> > > Have you considered doing all initialization using 64-bit stores? It's
> > > generally cheaper to do a single 64-bit store than e.g. a set of 16-bit
> > > ones.
> >
> > The code is basically copy-paste from rte_pktmbuf_reset().
> > I kept it the same way for readability.
> >
> > > This also means that we could remove the restriction on having refcnt
> > > and nb_segs already set. As in PMDs, a single store can init data_off,
> > > refcnt, nb_segs and port.
> >
> > Yes, I have given the concept a lot of thought already.
> > If we didn't require mbufs residing in the mempool to have any fields
> > initialized, specifically "next" and "nb_segs", it would improve
> > performance for drivers freeing mbufs back to the mempool, because
> > writing to the mbufs would no longer be required at that point; the
> > mbufs could simply be freed back to the mempool. Instead, we would
> > require the driver to initialize these fields - which it probably does
> > on RX anyway, if it supports segmented packets.
> > But I consider this concept a major API change, also affecting
> > applications assuming that these fields are initialized when allocating
> > raw mbufs from the mempool. So I haven't pursued it.
> >
> > >
> > > Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss
> > > fields etc. For max performance, the whole of the mbuf cleared here can
> > > be done in 40 bytes, or 5 64-bit stores. If we do the stores in order,
> > > possibly the compiler can even opportunistically coalesce more stores,
> > > so we could even end up getting 128-bit or larger stores depending on
> > > the ISA compiled for.
> > > [Maybe the compiler will do this even if they are not in order, but I'd
> > > like to maximize my chances here! :-)]
>
> Morten, you didn't reply to this.
> Can we optimize more with big stores?
>
I did reply:
https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35F654BB@smartserver.smartshare.dk/
Essentially, it's a different type of optimization, which should also be applied to rte_pktmbuf_reset().
I have postponed such optimization for later. I don't have time to do it properly now.
-Morten