|SUCCESS| [v5] mempool: improve cache behaviour and performance

dpdklab at iol.unh.edu
Sun Apr 19 15:36:54 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/163183

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-04-19

Reviewing patch: mempool: improve cache behaviour and performance

commit 20260419095526.39526-1-mb at smartsharesystems.com
Author: Morten Brorup <mb at smartsharesystems.com>

mempool: improve cache behaviour and performance

This patch refactors the mempool cache to eliminate some unexpected
behaviour and reduce the mempool cache miss rate.

```diff
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 35c9b4e06c..40760fffbb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst

[ ... ]

> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> index 9af275cd9d..dd2263b8d7 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> @@ -148,15 +148,31 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq)
>  	/* Can this be satisfied from the cache? */
>  	if (cache->len < IDPF_RXQ_REARM_THRESH) {
>  		/* No. Backfill the cache first, and then fill from it */
> -		uint32_t req = IDPF_RXQ_REARM_THRESH + (cache->size -
> -							cache->len);
>  
> -		/* How many do we require i.e. number to fill the cache + the request */
> +		/* Backfill would exceed the cache bounce buffer limit? */
> +		__rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> +		if (unlikely(cache->size / 2 < IDPF_RXQ_REARM_THRESH)) {
> +			idpf_singleq_rearm_common(rxq);
> +			return;
> +		}
> +
> +		/*
> +		 * Backfill the cache from the backend;
> +		 * move up the hot objects in the cache to the top half of the cache,
> +		 * and fetch (size / 2) objects to the bottom of the cache.
> +		 */
> +		__rte_assume(cache->len < cache->size / 2);
> +		rte_memcpy(&cache->objs[cache->size / 2], &cache->objs[0],
> +				sizeof(void *) * cache->len);
```

Does this code handle the case where cache->len is zero correctly? When
cache->len is zero, rte_memcpy() copies zero bytes, which is safe, but
do the subsequent dequeue into &cache->objs[0] and the cache->len update
still behave as intended when nothing was preserved from the old cache?
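
One way to check this is to walk the empty-cache case with concrete
numbers. The snippet below is a standalone model of this path, not the
driver code; the cache size of 256 and the rearm threshold of 64 are
illustrative stand-ins for cache->size and IDPF_RXQ_REARM_THRESH.

```c
/* Standalone model of the empty-cache backfill path; not the driver code. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE   256u  /* illustrative stand-in for cache->size */
#define REARM_THRESH 64u   /* illustrative stand-in for IDPF_RXQ_REARM_THRESH */

int main(void)
{
	void *objs[CACHE_SIZE];
	unsigned int len = 0;  /* the case in question: an empty cache */

	/* The new code only takes this path when size / 2 >= REARM_THRESH. */
	assert(CACHE_SIZE / 2 >= REARM_THRESH);

	/* Moving up len == 0 hot objects copies zero bytes: a defined no-op. */
	memcpy(&objs[CACHE_SIZE / 2], &objs[0], sizeof(void *) * len);

	/* A successful dequeue then fills objs[0 .. size/2) from the backend. */
	len += CACHE_SIZE / 2;

	/* The cache now holds size / 2 objects, enough to satisfy the rearm. */
	assert(len == CACHE_SIZE / 2 && len >= REARM_THRESH);
	printf("cache->len after backfill: %u\n", len);
	return 0;
}
```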

```diff
> +		int ret = rte_mempool_ops_dequeue_bulk
> +				(rxq->mp, &cache->objs[0], cache->size / 2);
>  		if (ret == 0) {
> -			cache->len += req;
> +			cache->len += cache->size / 2;
```

Does this arithmetic overflow when cache->len is near cache->size? The
__rte_assume above states cache->len < cache->size / 2, so cache->len +
cache->size / 2 should always be less than cache->size, but can we
verify this is safe?
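
The bound can be verified exhaustively outside the hot path. The
standalone check below (not DPDK code) assumes RTE_MEMPOOL_CACHE_MAX_SIZE
is 512, as in current DPDK, and confirms that under the stated assumption
cache->len < cache->size / 2 the sum never reaches cache->size.

```c
/* Exhaustive check that the backfill cannot push len past size. */
#include <assert.h>
#include <stdio.h>

#define CACHE_MAX 512u  /* RTE_MEMPOOL_CACHE_MAX_SIZE in current DPDK */

int main(void)
{
	for (unsigned int size = 2; size <= CACHE_MAX; size++) {
		/* Entry condition asserted by __rte_assume(): len < size / 2. */
		for (unsigned int len = 0; len < size / 2; len++) {
			unsigned int new_len = len + size / 2;

			/* Never reaches size, so no overflow of the objs[] array. */
			assert(new_len < size);
		}
	}
	printf("len + size / 2 stays below size for all valid inputs\n");
	return 0;
}
```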

```diff
>  		} else {
> +			/*
> +			 * No further action is required for roll back, as the objects moved
> +			 * in the cache were actually copied, and the cache remains intact.
> +			 */
```

Does this comment accurately reflect the rollback logic? When
rte_mempool_ops_dequeue_bulk() fails, the code falls through to the
error handling below. However, cache->objs[cache->size / 2] onwards now
contains a copy of the hot objects. Does the subsequent code in
idpf_singleq_rearm_common() handle this altered cache state correctly?
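
A small standalone model (illustrative sizes, not the driver code)
illustrates the state the comment relies on: the bottom cache->len
entries were copied rather than moved, so they remain valid after the
failed dequeue, while the duplicated upper half lies outside the region
described by cache->len. Whether idpf_singleq_rearm_common() then
ignores that stale upper half is exactly the question above.

```c
/* Model of the failed-dequeue path: the copy leaves objs[0..len) intact. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 256u  /* illustrative stand-in for cache->size */

int main(void)
{
	uintptr_t objs[CACHE_SIZE];
	unsigned int len = 40;  /* illustrative cache->len on entry */

	for (unsigned int i = 0; i < len; i++)
		objs[i] = 0x1000 + i;  /* stand-ins for the hot objects */

	/* The move-up that precedes the dequeue attempt. */
	memcpy(&objs[CACHE_SIZE / 2], &objs[0], sizeof(objs[0]) * len);

	/* Dequeue fails: no rollback is performed and len is not updated. */

	/* The valid region objs[0..len) still holds the original objects. */
	for (unsigned int i = 0; i < len; i++)
		assert(objs[i] == 0x1000 + i);

	/* objs[size/2 .. size/2 + len) now holds a stale copy, beyond len. */
	printf("bottom %u entries untouched; cache->len still %u\n", len, len);
	return 0;
}
```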

```diff
[ ... ]

> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 2e54fc4466..432c43ab15 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1390,24 +1397,32 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
>  
> -	__rte_assume(cache->flushthresh <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> -	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> -	__rte_assume(cache->len <= cache->flushthresh);
> -	if (likely(cache->len + n <= cache->flushthresh)) {
> +	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> +	__rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> +	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> +	__rte_assume(cache->len <= cache->size);
> +	if (likely(cache->len + n <= cache->size)) {
>  		/* Sufficient room in the cache for the objects. */
>  		cache_objs = &cache->objs[cache->len];
>  		cache->len += n;
> -	} else if (n <= cache->flushthresh) {
> +	} else if (n <= cache->size / 2) {
>  		/*
> -		 * The cache is big enough for the objects, but - as detected by
> -		 * the comparison above - has insufficient room for them.
> -		 * Flush the cache to make room for the objects.
> +		 * The number of objects is within the cache bounce buffer limit,
> +		 * but - as detected by the comparison above - the cache has
> +		 * insufficient room for them.
> +		 * Flush the cache to the backend to make room for the objects;
> +		 * flush (size / 2) objects from the bottom of the cache, where
> +		 * objects are less hot, and move down the remaining objects, which
> +		 * are more hot, from the upper half of the cache.
>  		 */
> -		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> -		cache->len = n;
> +		__rte_assume(cache->len > cache->size / 2);
> +		rte_mempool_ops_enqueue_bulk(mp, &cache->objs[0], cache->size / 2);
> +		rte_memcpy(&cache->objs[0], &cache->objs[cache->size / 2],
> +				sizeof(void *) * (cache->len - cache->size / 2));
```

Can this rte_memcpy() overflow cache->objs[]? When cache->len is at
cache->size and we copy from cache->size / 2, we would copy cache->size
/ 2 objects to index 0, which should fit. However, does the
documentation guarantee cache->objs[] has sufficient space for
RTE_MEMPOOL_CACHE_MAX_SIZE elements after this change?
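
The indices touched by the copy can be bounded from the branch
conditions alone: cache->len <= cache->size on entry, so the source
range ends at cache->len and the destination range ends even lower,
keeping every touched index below cache->size. Whether cache->objs[]
is still documented to hold at least that many entries remains the open
question. A standalone exhaustive check (not DPDK code), assuming
RTE_MEMPOOL_CACHE_MAX_SIZE is 512 as in current DPDK:

```c
/* Exhaustive index-bound check for the flush-and-compact copy (put path). */
#include <assert.h>
#include <stdio.h>

#define CACHE_MAX 512u  /* RTE_MEMPOOL_CACHE_MAX_SIZE in current DPDK */

int main(void)
{
	for (unsigned int size = 2; size <= CACHE_MAX; size++) {
		for (unsigned int len = 0; len <= size; len++) {
			for (unsigned int n = 1; n <= size / 2; n++) {
				if (len + n <= size)
					continue;  /* handled by the fast path */

				/* Branch entry implies len > size / 2, as assumed. */
				assert(len > size / 2);

				unsigned int copied = len - size / 2;

				/* Highest indices touched by the copy. */
				unsigned int max_src = len - 1;
				unsigned int max_dst = copied - 1;

				/* Both stay below size, i.e. within a size-entry objs[]. */
				assert(max_src < size && max_dst < size);
			}
		}
	}
	printf("all copy indices stay below cache->size for every valid input\n");
	return 0;
}
```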

```diff
> +		cache_objs = &cache->objs[cache->len - cache->size / 2];
> +		cache->len = cache->len - cache->size / 2 + n;
```

Does this arithmetic guarantee cache->len stays within bounds? Given
the entry condition cache->len + n > cache->size and n <= cache->size /
2, does cache->len - cache->size / 2 + n always stay within
cache->size?
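
The same entry conditions pin down the new length: cache->len + n >
cache->size together with n <= cache->size / 2 forces cache->len >
cache->size / 2, so the subtraction cannot wrap, and the result always
lands in the interval (cache->size / 2, cache->size]. A standalone
exhaustive check (not DPDK code), again assuming a 512-entry maximum:

```c
/* Exhaustive check that the updated cache->len stays in (size / 2, size]. */
#include <assert.h>
#include <stdio.h>

#define CACHE_MAX 512u  /* RTE_MEMPOOL_CACHE_MAX_SIZE in current DPDK */

int main(void)
{
	for (unsigned int size = 2; size <= CACHE_MAX; size++) {
		for (unsigned int len = 0; len <= size; len++) {
			for (unsigned int n = 1; n <= size / 2; n++) {
				if (len + n <= size)
					continue;  /* fast path, not this branch */

				unsigned int new_len = len - size / 2 + n;

				assert(new_len > size / 2);  /* no unsigned wrap-around */
				assert(new_len <= size);     /* still within the cache */
			}
		}
	}
	printf("new cache->len stays within (size / 2, size] for all inputs\n");
	return 0;
}
```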

```diff
[ ... ]

> @@ -1548,13 +1563,13 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  	for (index = 0; index < len; index++)
>  		*obj_table++ = *--cache_objs;
>  
> -	/* Dequeue below would overflow mem allocated for cache? */
> -	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
> +	/* Dequeue below would exceed the cache bounce buffer limit? */
> +	__rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> +	if (unlikely(remaining > cache->size / 2))
>  		goto driver_dequeue;
>  
> -	/* Fill the cache from the backend; fetch size + remaining objects. */
> -	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
> -			cache->size + remaining);
> +	/* Fill the cache from the backend; fetch (size / 2) objects. */
> +	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs, cache->size / 2);
>  	if (unlikely(ret < 0)) {
>  		/*
>  		 * We are buffer constrained, and not able to fetch all that.
> @@ -1568,10 +1583,11 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>  
> -	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> -	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> -	cache_objs = &cache->objs[cache->size + remaining];
> -	cache->len = cache->size;
> +	__rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> +	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> +	__rte_assume(remaining <= cache->size / 2);
> +	cache_objs = &cache->objs[cache->size / 2];
> +	cache->len = cache->size / 2 - remaining;
```

Does this code handle the case where remaining equals cache->size / 2
correctly? When remaining == cache->size / 2, cache->len becomes zero,
which seems correct, but does the for loop below correctly handle this
edge case?
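
Walking the edge case with concrete numbers suggests it is handled: with
an illustrative cache->size of 256, the refill leaves 128 objects in
objs[0..127], and with remaining == 128 the loop hands out objs[127]
down to objs[0] while cache->len correctly ends at 0. A standalone
model, not the library code:

```c
/* Standalone model of the get-path refill when remaining == size / 2. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE 256u  /* illustrative stand-in for cache->size */

int main(void)
{
	uintptr_t objs[CACHE_SIZE];
	uintptr_t out[CACHE_SIZE / 2];
	unsigned int remaining = CACHE_SIZE / 2;  /* the edge case */
	unsigned int len;

	/* Simulate the dequeue of size / 2 objects into the bottom of the cache. */
	for (unsigned int i = 0; i < CACHE_SIZE / 2; i++)
		objs[i] = 0x2000 + i;

	/* Mirrors: cache_objs = &cache->objs[size / 2]; len = size / 2 - remaining. */
	uintptr_t *cache_objs = &objs[CACHE_SIZE / 2];
	len = CACHE_SIZE / 2 - remaining;

	for (unsigned int index = 0; index < remaining; index++)
		out[index] = *--cache_objs;

	/* All freshly fetched objects were handed out, from the top downwards. */
	assert(len == 0);
	assert(out[0] == 0x2000 + CACHE_SIZE / 2 - 1);
	assert(out[remaining - 1] == 0x2000);
	printf("cache->len ends at %u after handing out %u objects\n", len, remaining);
	return 0;
}
```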

```diff
>  	for (index = 0; index < remaining; index++)
>  		*obj_table++ = *--cache_objs;
```

