[PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
Rahul Bhansali
rbhansali at marvell.com
Fri Mar 6 13:19:01 CET 2026
Please see inline.
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson at intel.com>
> Sent: Monday, October 20, 2025 2:17 PM
> To: Stephen Hemminger <stephen at networkplumber.org>
> Cc: Morten Brørup <mb at smartsharesystems.com>; dev at dpdk.org; Thomas Monjalon <thomas at monjalon.net>; Konstantin Ananyev
> <konstantin.ananyev at huawei.com>; Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>; Ivan Malov
> <ivan.malov at arknetworks.am>; Chengwen Feng <fengchengwen at huawei.com>
> Subject: Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
>
> On Sun, Oct 19, 2025 at 01: 45: 45PM -0700, Stephen Hemminger wrote: > On Thu, 9 Oct 2025 18: 15: 12 +0100 > Bruce Richardson
> <bruce. richardson@ intel. com> wrote: > > > On Sat, Aug 23, 2025 at 06: 30: 02AM +0000, Morten Brørup
> On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger wrote:
> > On Thu, 9 Oct 2025 18:15:12 +0100
> > Bruce Richardson <mailto:bruce.richardson at intel.com> wrote:
> >
> > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > > An optimized function for resetting a bulk of newly allocated
> > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > >
> > > > Compared to the normal packet mbuf reset function, it takes advantage of
> > > > the following two details:
> > > > 1. The 'next' and 'nb_segs' fields are already reset, so resetting them
> > > > has been omitted.
> > > > 2. When resetting the mbuf, the 'ol_flags' field must indicate whether the
> > > > mbuf uses an external buffer, and the 'data_off' field must not exceed the
> > > > data room size when resetting the data offset to include the default
> > > > headroom.
> > > > Unlike the normal packet mbuf reset function, which reads the mbuf itself
> > > > to get the information required for resetting these two fields, this
> > > > function gets the information from the mempool.
> > > >
> > > > This makes the function write-only of the mbuf, unlike the normal packet
> > > > mbuf reset function, which is read-modify-write of the mbuf.
> > > >
> > > > Signed-off-by: Morten Brørup <mailto:mb at smartsharesystems.com>
> > > > ---
> > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------------
> > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > index 49c93ab356..6f37a2e91e 100644
> > > > --- a/lib/mbuf/rte_mbuf.h
> > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > @@ -954,6 +954,50 @@ static inline void rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > (uint16_t)m->buf_len);
> > > > }
> > > >
> > > > +/**
> > > > + * Reset the fields of a bulk of packet mbufs to their default values.
> > > > + *
> > > > + * The caller must ensure that the mbufs come from the specified mempool,
> > > > + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1),
[Rahul] For Marvell's CNxx SoCs, mbuf pointers alloc and free are offloaded to HW for Rx/Tx, so these fields "next and nb_segs" will not be reset to default values by HW.
When packets are coming from wire, we reset these fields in Rx fastpath, but in case of SW allocated mbuf, we cannot do it in Marvell's mempool driver as that is unaware of mbuf.
Is it possible to reset these also in rte_mbuf_raw_reset_bulk() itself for mbuf alloc requests ?
> > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > + *
> > > > + * This function should be used with care, when optimization is required.
> > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > + *
> > > > + * @param mp
> > > > + * The mempool to which the mbuf belongs.
> > > > + * @param mbufs
> > > > + * Array of pointers to packet mbufs.
> > > > + * The array must not contain NULL pointers.
> > > > + * @param count
> > > > + * Array size.
> > > > + */
> > > > +static inline void
> > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> > > > +{
> > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM, rte_pktmbuf_data_room_size(mp),
> > > > + uint16_t);
> > > > +
> > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > + struct rte_mbuf *m = mbufs[idx];
> > > > +
> > > > + m->pkt_len = 0;
> > > > + m->tx_offload = 0;
> > > > + m->vlan_tci = 0;
> > > > + m->vlan_tci_outer = 0;
> > > > + m->port = RTE_MBUF_PORT_INVALID;
> > >
> > > Have you considered doing all initialization using 64-bit stores? It's
> > > generally cheaper to do a single 64-bit store than e.g. set of 16-bit ones.
> > > This also means that we could remove the restriction on having refcnt and
> > > nb_segs already set. As in PMDs, a single store can init data_off, ref_cnt,
> > > nb_segs and port.
> > >
> > > Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss fields
> > > etc. For max performance, the whole of the mbuf cleared here can be done in
> > > 40 bytes, or 5 64-bit stores. If we do the stores in order, possibly the
> > > compiler can even opportunistically coalesce more stores, so we could even
> > > end up getting 128-bit or larger stores depending on the ISA compiled for.
> > > [Maybe the compiler will do this even if they are not in order, but I'd
> > > like to maximize my chances here! :-)]
> > >
> > > /Bruce
> >
> > Although it is possible to use less CPU instructions, the performance
> > limiting factor is which fields are in cache.
>
> Yes, the cache presence of the target of the stores has a massive effect on
> how well the code will perform. However, the number of stores can make a
> difference too - especially if you are in store-heavy code. Consider the
> number of store operations which would be generated by storing
> field-by-field to a burst of 32 packets. With the previous work we have
> done on our PMDs, and vectorizing them, we got a noticible benefit from
> doing larger vector stores compared to smaller ones!
>
> /Bruce
More information about the dev
mailing list