[PATCH] net/i40e: Fast release optimizations
Konstantin Ananyev
konstantin.ananyev at huawei.com
Thu Jul 3 10:12:48 CEST 2025
> > I am talking about a different thing:
> > I think with some extra effort the driver can use (in some cases)
> > rte_mbuf_raw_free_bulk() even when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > is not specified.
> > Let's say we make txq->fast_free_mp[] an array with the same size as
> > txq->txep[].
> > At tx_burst(), when filling txep[], we can do pre_free() checks for that mbuf,
> > and in case of success store its mempool pointer in the corresponding
> > txq->fast_free_mp[] entry, otherwise put NULL there.
> > Then at tx_free() we can scan fast_free_mp[] and invoke raw_free() for
> > non-NULL entries.
> > Again, for now it is just an idea, probably worth thinking about.
>
> Yes, that seems like an excellent idea, certainly worth considering!
>
> At tx_free(), the mbufs might be cold, so not accessing them at this point improves performance. (Which is also the point of my
> patch.)
Yes.
>
> At tx_burst(), the mbufs are read anyway (their information is written into the tx descriptors), so the mbufs are hot in the cache at
> this point.
Yes.
> Best case with your suggestion, rte_pktmbuf_prefree_seg() doesn't write the mbuf, so the performance cost of doing it at tx_burst()
> is extremely low.
Yes.
> Worst case with your suggestion, rte_pktmbuf_prefree_seg() does write the mbuf, so the mbuf write operation simply moves from
> tx_free() to tx_burst().
> However, in tx_burst(), the mbuf is already hot in the cache, so per transmitted mbuf, we get one load+store at tx_burst() instead of
> one load at tx_burst() + one load+store at tx_free().
I suppose you plan to invoke full rte_pktmbuf_prefree_seg() here?
Unfortunately, I don't think it is possible: for cases when refcnt > 1, we must decrement refcnt only when we are ready to
release the mbuf. Otherwise we can end up with the NIC HW reading from an already released (and probably re-used) mbuf.
What we probably need is a lightweight version of rte_pktmbuf_prefree_seg() that returns a non-NULL value only when
refcnt == 1 and the segment is a direct mbuf (neither indirect nor with external memory attached).
Something like:
static __rte_always_inline struct rte_mbuf *
rte_pktmbuf_prefree_check(const struct rte_mbuf *m)
{
	if (rte_mbuf_refcnt_read(m) == 1 && RTE_MBUF_DIRECT(m))
		return m;
	return NULL;
}
So in the worst case (when such a check returns NULL), we still need to do a load+store at tx_free().
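
To make the whole scheme a bit more concrete, here is a rough standalone sketch of how the two halves could fit together. It is not i40e code and does not use the real DPDK types: every mock_* name, the RING_SIZE constant, and the nb_returned counter are made up for illustration, and the "raw free" is reduced to bumping a counter. The only thing it is meant to show is that the read-only check happens at enqueue time, while the free side consults fast_free_mp[] alone and touches the mbuf only when the entry is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8	/* hypothetical ring size, for illustration only */

/* Stand-in for a mempool: just counts the mbufs handed back to it. */
struct mock_mempool {
	unsigned int nb_returned;
};

/* Stand-in for an mbuf: only the fields this sketch needs. */
struct mock_mbuf {
	struct mock_mempool *pool;
	uint16_t refcnt;
	int direct;	/* 1: neither indirect nor external buffer attached */
};

struct mock_txq {
	struct mock_mbuf *txep[RING_SIZE];
	struct mock_mempool *fast_free_mp[RING_SIZE];	/* parallel to txep[] */
};

/* Read-only check: never writes the mbuf, so refcnt is left untouched. */
static inline struct mock_mempool *
mock_prefree_check(const struct mock_mbuf *m)
{
	return (m->refcnt == 1 && m->direct) ? m->pool : NULL;
}

/* tx_burst() side: the mbuf is hot in cache here, so the check is cheap. */
static void
mock_tx_enqueue(struct mock_txq *q, unsigned int idx, struct mock_mbuf *m)
{
	assert(idx < RING_SIZE);
	q->txep[idx] = m;
	q->fast_free_mp[idx] = mock_prefree_check(m);
}

/*
 * Slow path: simplified model of a full prefree. The refcnt is dropped
 * only now, after the (imaginary) HW is done with the buffer.
 */
static void
mock_slow_free(struct mock_mbuf *m)
{
	if (--m->refcnt == 0) {
		m->refcnt = 1;	/* reset before returning to the pool */
		m->pool->nb_returned++;
	}
}

/* tx_free() side: returns how many entries took the fast path. */
static unsigned int
mock_tx_free(struct mock_txq *q, unsigned int n)
{
	unsigned int i, fast = 0;

	assert(n <= RING_SIZE);
	for (i = 0; i < n; i++) {
		struct mock_mempool *mp = q->fast_free_mp[i];

		if (mp != NULL) {
			mp->nb_returned++;	/* raw free: mbuf is not read */
			fast++;
		} else {
			mock_slow_free(q->txep[i]);	/* load+store on the mbuf */
		}
		q->txep[i] = NULL;
		q->fast_free_mp[i] = NULL;
	}
	return fast;
}
```

Enqueueing one direct mbuf with refcnt 1, one with refcnt 2, and one indirect mbuf, then calling mock_tx_free() for the three entries, should take the fast path only for the first. A real driver would presumably also group consecutive entries that share a mempool so they can go through rte_mbuf_raw_free_bulk() in one call; the sketch frees them one by one to keep it short.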