[PATCH] net/i40e: Fast release optimizations
Morten Brørup
mb at smartsharesystems.com
Tue Jul 1 11:09:01 CEST 2025
> From: Konstantin Ananyev [mailto:konstantin.ananyev at huawei.com]
> Sent: Tuesday, 1 July 2025 10.16
[...]
> I am talking about a different thing:
> I think that, with some extra effort, the driver can (in some cases) use
> rte_mbuf_raw_free_bulk() even when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> is not specified.
> Let's say we make txq->fast_free_mp[] an array with the same size as
> txq->txep[].
> In tx_burst(), when filling txep[], we can do the pre_free() checks for
> that mbuf and, on success, store its mempool pointer in the corresponding
> txq->fast_free_mp[] entry; otherwise, put NULL there.
> Then, in tx_free(), we can scan fast_free_mp[] and invoke raw_free() for
> the non-NULL entries.
> Again, for now it is just an idea, probably worth thinking about.
Yes, that seems like an excellent idea, certainly worth considering!
At tx_free(), the mbufs might be cold in the cache, so not accessing them at this point improves performance (which is also the point of my patch).
At tx_burst(), the mbufs are read anyway (their information is written into the tx descriptors), so the mbufs are hot in the cache at this point.
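To make the idea concrete, here is a minimal C sketch of the proposed scheme. It does not use DPDK: struct mock_mbuf, struct mock_mempool, struct mock_txq, prefree_check(), txq_store(), txq_free() and raw_free_bulk() are all hypothetical stand-ins (raw_free_bulk() plays the role of rte_mbuf_raw_free_bulk()). The pre-free check is reduced to "sole owner, single segment", and ring wrap-around is ignored. What it shows is that the mempool pointer is recorded while the mbuf is hot in tx_burst(), so the fast path in tx_free() reads only fast_free_mp[] and never dereferences the mbufs.

```c
#include <assert.h>
#include <stddef.h>

#define TXQ_SIZE 8 /* hypothetical software ring size */

/* Hypothetical stand-in for struct rte_mempool. */
struct mock_mempool {
    const char *name;
    unsigned int n_returned;   /* mbufs returned via the bulk (raw) path */
    unsigned int n_bulk_calls; /* number of bulk free calls */
};

/* Hypothetical stand-in for the relevant fields of struct rte_mbuf. */
struct mock_mbuf {
    struct mock_mempool *pool;
    unsigned int refcnt;
    struct mock_mbuf *next;
    unsigned int nb_segs;
};

/* Hypothetical TX queue: one mempool slot per software ring entry. */
struct mock_txq {
    struct mock_mbuf *txep[TXQ_SIZE];
    struct mock_mempool *fast_free_mp[TXQ_SIZE];
    unsigned int n_slow_freed; /* entries that needed the generic free path */
};

/* Simplified pre-free check: only a sole-owner, single-segment mbuf qualifies. */
static struct mock_mempool *prefree_check(const struct mock_mbuf *m)
{
    if (m->refcnt == 1 && m->next == NULL && m->nb_segs == 1)
        return m->pool;
    return NULL;
}

/* tx_burst() side: the mbuf is hot here, so record its mempool (or NULL). */
static void txq_store(struct mock_txq *txq, unsigned int idx, struct mock_mbuf *m)
{
    assert(idx < TXQ_SIZE);
    txq->txep[idx] = m;
    txq->fast_free_mp[idx] = prefree_check(m);
}

/* Stand-in for rte_mbuf_raw_free_bulk(): return n mbufs to one mempool. */
static void raw_free_bulk(struct mock_mempool *mp, struct mock_mbuf **mbufs, unsigned int n)
{
    (void)mbufs; /* a real implementation would put these objects into mp */
    mp->n_returned += n;
    mp->n_bulk_calls++;
}

/* tx_free() side: the fast path reads only fast_free_mp[], never the mbufs. */
static void txq_free(struct mock_txq *txq, unsigned int n)
{
    unsigned int i = 0;

    assert(n <= TXQ_SIZE);
    while (i < n) {
        struct mock_mempool *mp = txq->fast_free_mp[i];

        if (mp == NULL) {
            /* Slow path: real code would call rte_pktmbuf_free_seg() here,
             * which does read the (possibly cold) mbuf. */
            txq->n_slow_freed++;
            i++;
            continue;
        }
        /* Batch the run of consecutive entries that share this mempool. */
        unsigned int j = i + 1;
        while (j < n && txq->fast_free_mp[j] == mp)
            j++;
        raw_free_bulk(mp, &txq->txep[i], j - i);
        i = j;
    }
}
```

Consecutive entries with the same mempool are batched into one bulk call, since a bulk put into a mempool takes objects belonging to a single pool; NULL entries fall back to the generic per-segment free, which does touch the mbuf.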
In the best case with your suggestion, rte_pktmbuf_prefree_seg() doesn't write to the mbuf, so the performance cost of doing it at tx_burst() is extremely low.
In the worst case, rte_pktmbuf_prefree_seg() does write to the mbuf, so the mbuf write operation simply moves from tx_free() to tx_burst().
However, in tx_burst() the mbuf is already hot in the cache, so, per transmitted mbuf, we get one load+store at tx_burst() instead of one load at tx_burst() plus one load+store at tx_free().
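For reference, a simplified model of when a per-segment pre-free writes to the mbuf, i.e. the best and worst cases above. It is loosely modeled on rte_pktmbuf_prefree_seg() but is not that function: struct model_mbuf, prefree_model() and its wrote flag are hypothetical, indirect and external buffer handling is omitted, and a plain decrement replaces the atomic one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct rte_mbuf that matter here. */
struct model_mbuf {
    unsigned int refcnt;
    struct model_mbuf *next;
    unsigned int nb_segs;
};

/* Returns m if the caller now owns the segment and may free it, else NULL.
 * *wrote is set to true if the segment was modified (the "worst case"). */
static struct model_mbuf *prefree_model(struct model_mbuf *m, bool *wrote)
{
    assert(m != NULL && wrote != NULL);
    *wrote = false;

    if (m->refcnt == 1) {
        /* Sole owner: only write fields that are not already in the reset state,
         * so a direct, single-segment mbuf is only read (the "best case"). */
        if (m->next != NULL) {
            m->next = NULL;
            *wrote = true;
        }
        if (m->nb_segs != 1) {
            m->nb_segs = 1;
            *wrote = true;
        }
        return m;
    }

    /* Shared mbuf: dropping our reference is always a write (an atomic
     * decrement in real code). The case where a concurrent release brings
     * the count to zero is not modeled here. */
    m->refcnt--;
    *wrote = true;
    return NULL;
}
```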
More information about the dev mailing list