[PATCH] net/null: Add fast mbuf release TX offload
Varghese, Vipin
Vipin.Varghese at amd.com
Mon Jul 28 10:22:20 CEST 2025
Snipped
> >
> > Hi Morten,
> >
> > We have tested the effect of the patch using func-latency and PPS
> > measurements via testpmd.
> > Please find our observations below:
> >
> > - DPDK tag: 25.07-rc1
> > - compiler: gcc 14.2
> > - platform: AMD EPYC 8534P, 64 cores, 2.3 GHz
> > - app cmd:
> > -- One port: `sudo build/app/dpdk-testpmd -l 15,16 --vdev=net_null1 --no-pci -- --nb-cores=1 --nb-ports=1 --txq=1 --rxq=1 --txd=2048 --rxd=2048 -a --forward-mode=io --stats-period=1`
> > -- Two port: `sudo build/app/dpdk-testpmd -l 15,16,17 --vdev=net_null1 --vdev=net_null2 --no-pci -- --nb-cores=2 --nb-ports=2 --txq=1 --rxq=1 --txd=2048 --rxd=2048 -a --forward-mode=io --stats-period=1`
> >
> > Result, 1 port:
> > - Before patch: TX 117.61 Mpps, RX 117.67 Mpps, func-latency TX: 1918 ns, func-latency free-bulk: 2667 ns
> > - After patch: TX 117.55 Mpps, RX 117.54 Mpps, func-latency TX: 1921 ns, func-latency free-bulk: 2660 ns
> >
> > Result, 2 ports:
> > - Before patch: TX 117.61 Mpps, RX 117.67 Mpps, func-latency TX: 1942 ns, func-latency free-bulk: 2557 ns
> > - After patch: TX 117.54 Mpps, RX 117.54 Mpps, func-latency TX: 1946 ns, func-latency free-bulk: 2740 ns
> >
> > Perf top diff, before vs. after: 13.84% vs. 13.79%
> >
> > Reviewed-by: Thiyagarajan P <Thiyagarajan.P at amd.com>
> > Tested-by: Vipin Varghese <Vipin.Varghese at amd.com>
>
> Thank you for reviewing and testing.
>
> >
> > Clarification request: `With fast-mbuf-free on a single port, we see
> > a free-bulk reduction of 7 ns, but null_tx increases by 3 ns and TX
> > throughput drops by 0.07 Mpps. Is this an anomaly of the net_null
> > PMD?`
>
> I have finally found the bug in my patch:
> It announces the FAST_FREE capability at the device level, but ignores
> the device-level FAST_FREE configuration and uses the queue-level
> FAST_FREE configuration instead.
>
> Due to this bug, your testing probably shows the performance of the
> non-FAST_FREE code path.
> The added comparison for FAST_FREE (a code path not taken) might
> explain the null_tx +3 ns increase.
>
> I will send a v2 patch.
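>
> For reference, a minimal sketch of the corrected check (using standard
> ethdev driver-side fields; the actual v2 code may differ): a queue must
> honor FAST_FREE if it is enabled at either the device level or the
> queue level.
>
> 	#include <stdbool.h>
> 	#include <stdint.h>
> 	#include <ethdev_driver.h>
>
> 	/* Device-level Tx offloads are set via rte_eth_dev_configure()
> 	 * (dev_conf.txmode.offloads); queue-level Tx offloads arrive in
> 	 * tx_queue_setup(). The v1 bug was checking only the latter. */
> 	static inline bool
> 	tx_fast_free_enabled(const struct rte_eth_dev *dev,
> 			const struct rte_eth_txconf *tx_conf)
> 	{
> 		uint64_t offloads = dev->data->dev_conf.txmode.offloads |
> 				tx_conf->offloads;
>
> 		return (offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) != 0;
> 	}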
Will check.
>
> >
> > > >
> > > > On Tue, 24 Jun 2025 18:14:16 +0000 Morten Brørup
> > > > <mb at smartsharesystems.com> wrote:
> > > >
> > > > > Added fast mbuf release, re-using the existing mbuf pool pointer
> > > > > in the queue structure.
> > > > >
> > > > > Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> > > >
> > > > Makes sense.
> > > >
> > > > > ---
> > > > >  drivers/net/null/rte_eth_null.c | 30 +++++++++++++++++++++++++++---
> > > > >  1 file changed, 27 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> > > > > index 8a9b74a03b..12c0d8d1ff 100644
> > > > > --- a/drivers/net/null/rte_eth_null.c
> > > > > +++ b/drivers/net/null/rte_eth_null.c
> > > > > @@ -34,6 +34,17 @@ struct pmd_internals;
> > > > >  struct null_queue {
> > > > >  	struct pmd_internals *internals;
> > > > >
> > > > > +	/**
> > > > > +	 * For RX queue:
> > > > > +	 * Mempool to allocate mbufs from.
> > > > > +	 *
> > > > > +	 * For TX queue:
> > > > > +	 * Mempool to free mbufs to, if fast release of mbufs is enabled.
> > > > > +	 * UINTPTR_MAX if the mempool for fast release of mbufs has not yet been detected.
> > > > > +	 * NULL if fast release of mbufs is not enabled.
> > > > > +	 *
> > > > > +	 * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > > > > +	 */
> > > > >  	struct rte_mempool *mb_pool;
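> > > > >
> > > > > The rest of the diff is snipped; for illustration, a sketch (not
> > > > > the literal patch code) of how the TX burst function can consume
> > > > > this tri-state pointer:
> > > > >
> > > > > 	static uint16_t
> > > > > 	eth_null_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> > > > > 	{
> > > > > 		struct null_queue *h = q;
> > > > >
> > > > > 		if (h->mb_pool != NULL) { /* fast release enabled */
> > > > > 			if (unlikely(h->mb_pool ==
> > > > > 					(struct rte_mempool *)UINTPTR_MAX)) {
> > > > > 				if (unlikely(nb_bufs == 0))
> > > > > 					return 0;
> > > > > 				/* Detect the mempool on first use. */
> > > > > 				h->mb_pool = bufs[0]->pool;
> > > > > 			}
> > > > > 			/* FAST_FREE contract: one mempool, refcnt == 1,
> > > > > 			 * so the burst can be returned in bulk. */
> > > > > 			rte_mempool_put_bulk(h->mb_pool, (void **)bufs,
> > > > > 					nb_bufs);
> > > > > 		} else {
> > > > > 			rte_pktmbuf_free_bulk(bufs, nb_bufs);
> > > > > 		}
> > > > > 		/* Statistics updates omitted from this sketch. */
> > > > > 		return nb_bufs;
> > > > > 	}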
> > > >
> > > > Do all drivers do it this way?
> > >
> > > No, I think most drivers have separate structures for rx and tx
> > > queues. This driver doesn't, so I'm reusing the existing mempool
> > > pointer.
> > > Also, they don't cache the mempool pointer, but look at mbuf[0].pool
> > > at every burst, so their tx queue structure doesn't have a mempool
> > > pointer field. And they check an offload flag (either the bit in the
> > > raw offload field, or a shadow variable for the relevant offload
> > > flag) instead of checking the mempool pointer, roughly as sketched
> > > below.
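> > >
> > > A sketch of that conventional per-burst pattern (names assumed;
> > > txq->offloads is a shadow copy of the queue's offload flags):
> > >
> > > 	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > > 		/* FAST_FREE: the whole burst comes from one mempool
> > > 		 * and has refcnt == 1, so take the pool from the first
> > > 		 * mbuf and free in bulk. */
> > > 		rte_mempool_put_bulk(bufs[0]->pool, (void **)bufs,
> > > 				nb_bufs);
> > > 	} else {
> > > 		rte_pktmbuf_free_bulk(bufs, nb_bufs);
> > > 	}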
> > >
> > > Other drivers can be improved, and I have submitted an optimization
> > > patch for the i40e driver with some of the things I do in this patch:
> > > https://inbox.dpdk.org/dev/20250624061238.89259-1-mb at smartsharesystems.com/
> > >
> > > > Is it documented in ethdev?
> > >
> > > The RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE flag is documented.
> > > How to implement it in the drivers is not.
> > >
> > > -Morten