|SUCCESS| [v3] net/intel: optimize for fast-free hint

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Tue May 19 19:45:58 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164134

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-19

Reviewing commit for DPDK coding standards and correctness.

git sha: 20260519110637.964965-1-bruce.richardson at intel.com
Author: Bruce Richardson <bruce.richardson at intel.com>
Subject: [PATCH v3] net/intel: optimize for fast-free hint

This patch optimizes TX buffer cleanup when the fast-free hint is enabled
by resetting mbuf fields during descriptor writes and using bulk-free
operations to return mbufs to their mempool without individual refcount
checks.

> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index 23a8c39cf2..5fe71aed12 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -329,13 +329,22 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
>  		return;
>  
>  	if (!txq->use_vec_entry) {
> -		/* Regular scalar path uses sw_ring with ci_tx_entry */
> -		for (uint16_t i = 0; i < txq->nb_tx_desc; i++) {
> -			if (txq->sw_ring[i].mbuf != NULL) {
> -				rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
> -				txq->sw_ring[i].mbuf = NULL;
> -			}
> +		/* Free mbufs from (last_desc_cleaned + 1) to (tx_tail - 1). */
> +		const uint16_t start = (txq->last_desc_cleaned + 1) % txq->nb_tx_desc;
> +		const uint16_t nb_desc = txq->nb_tx_desc;
> +		const uint16_t end = txq->tx_tail;
> +
> +		uint16_t i = start;
> +		if (end < i) {
> +			for (; i < nb_desc; i++)
> +				if (txq->sw_ring[i].mbuf != NULL)
> +					rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
> +			i = 0;
>  		}
> +		for (; i < end; i++)
> +			if (txq->sw_ring[i].mbuf != NULL)
> +				rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
> +		memset(txq->sw_ring, 0, sizeof(txq->sw_ring[0]) * nb_desc);

Does this code leak mbufs when start == end? In that case both loops
skip all iterations and nothing is freed, but the memset() still clears
every sw_ring pointer. Any entry outside [start, end) that still holds a
live mbuf, rather than a stale pointer already released by
ci_tx_xmit_cleanup(), would then be leaked.
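A minimal sketch of the two-loop range walk, using hypothetical toy
types (toy_entry, toy_free) in place of struct ci_tx_entry and
rte_pktmbuf_free_seg(). It shows that start == end is always treated as
an empty range. With only these two indices, a completely full ring
would also produce start == end, so whether that state is reachable
depends on the driver's free-descriptor accounting, which the excerpt
does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy slot standing in for struct ci_tx_entry. */
struct toy_entry {
	void *mbuf;
};

/* Hypothetical stand-in for rte_pktmbuf_free_seg(): only counts calls. */
static unsigned int toy_frees;

static void toy_free(void *m)
{
	(void)m;
	toy_frees++;
}

/*
 * Free every non-NULL slot in the half-open circular range [start, end)
 * of a ring with nb_desc slots, using the same two-loop shape as the patch.
 * start == end is treated as an empty range. Returns the number freed.
 */
static unsigned int toy_free_range(struct toy_entry *ring, uint16_t nb_desc,
		uint16_t start, uint16_t end)
{
	unsigned int freed = 0;
	uint16_t i = start;

	if (end < i) {
		/* Range wraps: free start .. nb_desc - 1 first. */
		for (; i < nb_desc; i++) {
			if (ring[i].mbuf != NULL) {
				toy_free(ring[i].mbuf);
				freed++;
			}
		}
		i = 0;
	}
	/* Then free i .. end - 1. */
	for (; i < end; i++) {
		if (ring[i].mbuf != NULL) {
			toy_free(ring[i].mbuf);
			freed++;
		}
	}
	return freed;
}
```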

>  		return;
>  	}

[ ... ]

> diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
> index 9fcd2e4733..d27df34dfa 100644
> --- a/drivers/net/intel/common/tx_scalar.h
> +++ b/drivers/net/intel/common/tx_scalar.h
> @@ -197,16 +197,64 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
>  	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
>  			0 :
>  			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
> -	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
> +	const uint16_t dd_idx = txq->rs_last_id[rs_idx];
> +	const uint16_t first_to_clean = rs_idx << txq->log2_rs_thresh;
>  
> -	/* Check if descriptor is done  */
> -	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> -				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
> +	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> +	if ((txd[dd_idx].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
> +		/* Descriptor not yet processed by hardware */
>  		return -1;
>  
> +	/* DD bit is set, descriptors are done. Now free the mbufs. */
> +	/* Note: nb_tx_desc is guaranteed to be a multiple of tx_rs_thresh,
> +	 * validated during queue setup. This means cleanup never wraps around
> +	 * the ring within a single burst (e.g., ring=256, rs_thresh=32 gives
> +	 * bursts of 0-31, 32-63, ..., 224-255).
> +	 */
> +	const uint16_t nb_to_clean = txq->tx_rs_thresh;
> +	struct ci_tx_entry *sw_ring = txq->sw_ring;
> +
> +	/* fast_free_mp is NULL only when the fast free is disabled*/
> +	if (txq->fast_free_mp != NULL) {
> +		/* FAST_FREE path: mbufs are already reset, just return to pool */
> +		struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
> +		uint16_t nb_free = 0;
> +
> +		/* Get cached mempool pointer, or cache it on first use */
> +		struct rte_mempool *mp =
> +			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
> +			txq->fast_free_mp :
> +			(txq->fast_free_mp = sw_ring[dd_idx].mbuf->pool);

Does this code crash when sw_ring[dd_idx].mbuf is NULL? The fast-free
path can encounter NULL mbuf pointers in context descriptor or TSO
segment slots, but this code dereferences ->pool unconditionally when
caching the mempool pointer for the first time.
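One way to make the first-use caching NULL-safe, sketched with
hypothetical toy types: keep the patch's tri-state encoding (NULL for
fast free disabled, (void *)UINTPTR_MAX for enabled but not yet cached),
but take the pool from the first non-NULL mbuf in the burst instead of
from sw_ring[dd_idx]. This is an alternative for illustration only, not
the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types standing in for rte_mempool, rte_mbuf, ci_tx_entry. */
struct toy_pool { int id; };
struct toy_mbuf { struct toy_pool *pool; };
struct toy_entry { struct toy_mbuf *mbuf; };

/* "Fast free enabled, pool not yet cached", mirroring the patch's
 * (void *)UINTPTR_MAX sentinel. NULL means fast free is disabled. */
#define TOY_POOL_UNSET ((struct toy_pool *)UINTPTR_MAX)

/*
 * Return the cached pool, or cache it from the first non-NULL mbuf in
 * ring[first .. first + n - 1]. If fast free is disabled (*cache == NULL),
 * NULL is returned unchanged. If the burst holds no mbufs, NULL is returned
 * and the cache stays unset so a later burst can try again.
 */
static struct toy_pool *toy_get_pool(struct toy_pool **cache,
		const struct toy_entry *ring, uint16_t first, uint16_t n)
{
	if (*cache != TOY_POOL_UNSET)
		return *cache;
	for (uint16_t i = 0; i < n; i++) {
		const struct toy_mbuf *m = ring[first + i].mbuf;
		if (m != NULL) {
			*cache = m->pool;
			return *cache;
		}
	}
	return NULL;
}
```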

> +
> +		/* Pack non-NULL mbufs in-place at start of sw_ring range.
> +		 * No modulo needed in loop since we're guaranteed not to wrap.
> +		 */
> +		for (uint16_t i = 0; i < nb_to_clean; i++) {
> +			struct rte_mbuf *m = sw_ring[first_to_clean + i].mbuf;
> +			if (m == NULL)
> +				continue;
> +			free[nb_free++] = m;
> +			if (unlikely(nb_free == CI_TX_MAX_FREE_BUF_SZ)) {
> +				rte_mbuf_raw_free_bulk(mp, free, nb_free);
> +				nb_free = 0;
> +			}
> +		}
> +
> +		/* Bulk return to mempool using packed sw_ring entries directly */
> +		if (nb_free > 0)
> +			rte_mbuf_raw_free_bulk(mp, free, nb_free);

Does this code lean on the fast-free mbuf contract without any
safeguard? The fast-free offload requires the application to guarantee
that all mbufs on the queue come from the same mempool, and this code
returns every mbuf to the cached mp pointer without checking that it
actually belongs there. If the first-use caching does not obtain a valid
pool pointer, mp could be wrong, and mbufs would then be returned to,
and corrupt, the wrong mempool.
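The cleanup burst can be sketched end to end with hypothetical toy
types: compute the first slot of the burst with the same formula as the
quoted patch, pack non-NULL entries into a fixed-size batch, and flush
whenever it fills. toy_raw_free_bulk() stands in for
rte_mbuf_raw_free_bulk(), and its per-mbuf pool assert is a hypothetical
debug check, not something the patch is claimed to do; it compiles away
when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pool { unsigned int returned; };
struct toy_mbuf { struct toy_pool *pool; };
struct toy_entry { struct toy_mbuf *mbuf; };

/* Hypothetical batch size standing in for CI_TX_MAX_FREE_BUF_SZ. */
#define TOY_MAX_FREE 4

/* Stand-in for a bulk return: optionally checks every mbuf's pool. */
static void toy_raw_free_bulk(struct toy_pool *mp, struct toy_mbuf **mbufs,
		unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		assert(mbufs[i]->pool == mp);	/* hypothetical debug check */
	mp->returned += n;
}

/* First slot of the RS burst after last_desc_cleaned, same formula as the
 * patch. Assumes nb_desc is a multiple of (1 << log2_thresh). */
static uint16_t toy_first_to_clean(uint16_t last_desc_cleaned, uint16_t nb_desc,
		uint8_t log2_thresh)
{
	uint16_t rs_idx = (last_desc_cleaned == nb_desc - 1) ?
			0 : (uint16_t)((last_desc_cleaned + 1) >> log2_thresh);
	return (uint16_t)(rs_idx << log2_thresh);
}

/* Free thresh slots starting at first, skipping NULL slots and flushing
 * in batches of TOY_MAX_FREE. Returns the number of mbufs freed. */
static unsigned int toy_clean_burst(struct toy_pool *mp,
		const struct toy_entry *ring, uint16_t first, uint16_t thresh)
{
	struct toy_mbuf *batch[TOY_MAX_FREE];
	unsigned int nb = 0, total = 0;

	for (uint16_t i = 0; i < thresh; i++) {
		struct toy_mbuf *m = ring[first + i].mbuf;
		if (m == NULL)
			continue;	/* context or TSO continuation slot */
		batch[nb++] = m;
		if (nb == TOY_MAX_FREE) {
			toy_raw_free_bulk(mp, batch, nb);
			total += nb;
			nb = 0;
		}
	}
	if (nb > 0) {
		toy_raw_free_bulk(mp, batch, nb);
		total += nb;
	}
	return total;
}
```

With a 16-slot ring and a threshold of 8, a burst after
last_desc_cleaned == 7 starts at slot 8, and the wrap case
last_desc_cleaned == 15 starts back at slot 0, which matches the
no-wrap claim in the quoted comment.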

[ ... ]

> @@ -450,8 +498,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  			txd = &ci_tx_ring[tx_id];
>  			tx_id = txe->next_id;
>  
> -			if (txe->mbuf)
> -				rte_pktmbuf_free_seg(txe->mbuf);
>  			txe->mbuf = tx_pkt;
>  			/* Setup TX Descriptor */
>  			td_cmd |= CI_TX_DESC_CMD_EOP;

Does this code leak mbufs in the non-fast-free path? The old code freed
any existing mbuf in txe->mbuf before overwriting it, but now it relies
entirely on ci_tx_xmit_cleanup() to handle freeing. If cleanup is
delayed or skipped, does this overwrite valid mbuf pointers without
freeing them?
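Whether dropping the free from the transmit path leaks anything reduces
to one invariant: transmit must never write to a slot that cleanup has
not yet released. A toy model of that invariant follows. All names are
hypothetical; a non-zero slot value stands for an mbuf, zero stands for
a context or TSO continuation slot, and freed slots are deliberately
left stale, since the visible part of the fast-free hunk does not clear
sw_ring entries after returning them.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING 8

/* Hypothetical ring model: each slot holds a buffer id, or 0 for none. */
struct toy_ring {
	int slot[TOY_RING];
	int outstanding;	/* buffers posted and not yet freed */
	uint16_t nb_free;	/* slots transmit may still fill */
	uint16_t tail;		/* next slot transmit writes */
	uint16_t clean;		/* next slot cleanup frees */
};

/* Post one slot; overwrites without freeing, as the patch now does. */
static int toy_xmit(struct toy_ring *r, int id)
{
	if (r->nb_free == 0)
		return -1;	/* ring full: cleanup must run first */
	r->slot[r->tail] = id;
	if (id != 0)
		r->outstanding++;
	r->tail = (uint16_t)((r->tail + 1) % TOY_RING);
	r->nb_free--;
	return 0;
}

/* Release n in-flight slots; the slot values are left stale. */
static void toy_cleanup(struct toy_ring *r, uint16_t n)
{
	assert(n <= TOY_RING - r->nb_free);	/* only in-flight slots */
	for (uint16_t k = 0; k < n; k++) {
		if (r->slot[r->clean] != 0)
			r->outstanding--;	/* "free" the buffer */
		r->clean = (uint16_t)((r->clean + 1) % TOY_RING);
	}
	r->nb_free = (uint16_t)(r->nb_free + n);
}
```

In the real driver the role of nb_free is played by the queue's
free-descriptor count. The excerpt does not show that accounting, so the
model only demonstrates the invariant, not that the patch upholds it.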

[ ... ]

> @@ -471,11 +517,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
>  
>  			txn = &sw_ring[txe->next_id];
> -			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
> -			if (txe->mbuf) {
> -				rte_pktmbuf_free_seg(txe->mbuf);
> -				txe->mbuf = NULL;
> -			}
> +			txe->mbuf = NULL;
>  
>  			write_txd(ctx_txd, cd_qw0, cd_qw1);

Does this code leak the mbuf in txe->mbuf when a context descriptor is
inserted? The original code freed the old mbuf pointer before setting it
to NULL, but now it just overwrites without freeing.

[ ... ]

> @@ -507,10 +545,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  			txd = &ci_tx_ring[tx_id];
>  			txn = &sw_ring[txe->next_id];
>  
> -			if (txe->mbuf)
> -				rte_pktmbuf_free_seg(txe->mbuf);
>  			txe->mbuf = m_seg;
>  
> +			/* For FAST_FREE: reset mbuf fields while we have it in cache.
> +			 * [Fast free is indicated by txq->fast_free_mp being non-NULL.]
> +			 * FAST_FREE guarantees refcnt=1 and direct mbufs, so we only
> +			 * need to reset nb_segs and next pointer as per rte_pktmbuf_prefree_seg.
> +			 * Save next pointer before resetting since we need it for loop iteration.
> +			 */
> +			struct rte_mbuf *next_seg = m_seg->next;
> +			if (txq->fast_free_mp != NULL) {
> +				if (m_seg->nb_segs != 1)
> +					m_seg->nb_segs = 1;
> +				if (next_seg != NULL)
> +					m_seg->next = NULL;
> +			}

Does this code corrupt mbuf state prematurely? If hardware has not yet
completed transmitting a previously submitted packet that references
this mbuf, resetting nb_segs and next here could break the hardware or
any other code path that inspects these fields before completion.
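The reset can be checked in isolation with a hypothetical toy segment
type standing in for struct rte_mbuf. Note that the reset only touches
m_seg, the segment being posted in the current call, and that next is
saved before it is cleared, so the walk still visits every segment. The
open question is whether any other code reads the original nb_segs or
next after this point; the patch's switch of the EOP check to next_seg
is consistent with that concern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy segment with the two fields the patch resets. */
struct toy_seg {
	struct toy_seg *next;
	uint16_t nb_segs;
	uint16_t len;
};

/*
 * Walk a segment chain, "posting" each segment and, when fast_free is set,
 * resetting nb_segs and next as the patch does. next is read before the
 * reset, so the loop still reaches every segment. Returns bytes posted.
 */
static unsigned int toy_post_chain(struct toy_seg *m_seg, int fast_free)
{
	unsigned int bytes = 0;

	while (m_seg != NULL) {
		struct toy_seg *next_seg = m_seg->next;	/* save before reset */

		if (fast_free) {
			if (m_seg->nb_segs != 1)
				m_seg->nb_segs = 1;
			if (next_seg != NULL)
				m_seg->next = NULL;
		}
		bytes += m_seg->len;	/* stand-in for writing a descriptor */
		/* EOP would be set here when next_seg == NULL. */
		m_seg = next_seg;
	}
	return bytes;
}
```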

> +
>  			/* Setup TX Descriptor */
>  			/* Calculate segment length, using IPsec callback if provided */
>  			if (ipsec_ops != NULL)
> @@ -528,18 +578,23 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
>  					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
>  				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
> +				/* txe for this slot has already been written (e.g. above outside
> +				 * loop), so we write the extra NULL mbuf pointer for this
> +				 * descriptor after we increment txe below.
> +				 */
>  
>  				buf_dma_addr += CI_MAX_DATA_PER_TXD;
>  				slen -= CI_MAX_DATA_PER_TXD;
>  
>  				tx_id = txe->next_id;
>  				txe = txn;
> +				txe->mbuf = NULL;

Does this code correctly track mbuf ownership? In the TSO large segment
path, txe->mbuf is set to NULL here but the original txe (before
incrementing) still holds the m_seg pointer. Is it correct that cleanup
will find this mbuf only in the first descriptor of a multi-descriptor
segment?
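A toy model of how one segment maps onto sw_ring slots, assuming (the
loop header is not shown in the excerpt) that the split loop runs while
slen exceeds the per-descriptor limit. TOY_MAX_PER_TXD is a hypothetical
stand-in for CI_MAX_DATA_PER_TXD. The segment pointer lands only in its
first slot and every continuation slot gets NULL, so a cleanup that
frees each non-NULL slot once frees the segment exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-descriptor limit standing in for CI_MAX_DATA_PER_TXD. */
#define TOY_MAX_PER_TXD 16u

/*
 * Record one segment of slen bytes starting at slot first: the segment
 * pointer goes into the first slot and each extra descriptor gets NULL.
 * Returns the number of slots (descriptors) used.
 */
static unsigned int toy_fill_segment(void **slots, unsigned int first,
		void *m_seg, unsigned int slen)
{
	unsigned int idx = first;

	slots[idx] = m_seg;		/* owner slot for this segment */
	while (slen > TOY_MAX_PER_TXD) {
		slen -= TOY_MAX_PER_TXD;	/* one full descriptor written */
		idx++;
		slots[idx] = NULL;	/* continuation descriptor: no owner */
	}
	return idx - first + 1;		/* last descriptor holds the remainder */
}
```

A side effect is that the last descriptor of a split segment holds NULL.
If rs_last_id records a packet's last descriptor, as its name suggests,
that is the slot the first-use pool caching reads from.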

>  				txd = &ci_tx_ring[tx_id];
>  				txn = &sw_ring[txe->next_id];
>  			}
>  
>  			/* fill the last descriptor with End of Packet (EOP) bit */
> -			if (m_seg->next == NULL)
> +			if (next_seg == NULL)
>  				td_cmd |= CI_TX_DESC_CMD_EOP;


More information about the test-report mailing list