[PATCH v2 1/1] mempool: implement index-based per core cache

Morten Brørup mb at smartsharesystems.com
Thu Jan 20 09:21:40 CET 2022


+CC Beilei as i40e maintainer

> From: Dharmik Thakkar [mailto:dharmik.thakkar at arm.com]
> Sent: Thursday, 13 January 2022 06.37
> 
> Current mempool per core cache implementation stores pointers to mbufs
> On 64b architectures, each pointer consumes 8B
> This patch replaces it with index-based implementation,
> where in each buffer is addressed by (pool base address + index)
> It reduces the amount of memory/cache required for per core cache
> 
> L3Fwd performance testing reveals minor improvements in the cache
> performance (L1 and L2 misses reduced by 0.60%)
> with no change in throughput
> 
> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> Signed-off-by: Dharmik Thakkar <dharmik.thakkar at arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> ---
>  lib/mempool/rte_mempool.h             | 150 +++++++++++++++++++++++++-
>  lib/mempool/rte_mempool_ops_default.c |   7 ++
>  2 files changed, 156 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c15273c..f2403fbc97a7 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -50,6 +50,10 @@
>  #include <rte_memcpy.h>
>  #include <rte_common.h>
> 
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +#include <rte_vect.h>
> +#endif
> +
>  #include "rte_mempool_trace_fp.h"
> 
>  #ifdef __cplusplus
> @@ -239,6 +243,9 @@ struct rte_mempool {
>  	int32_t ops_index;
> 
>  	struct rte_mempool_cache *local_cache; /**< Per-lcore local cache
> */
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +	void *pool_base_value; /**< Base value to calculate indices */
> +#endif
> 
>  	uint32_t populated_size;         /**< Number of populated
> objects. */
>  	struct rte_mempool_objhdr_list elt_list; /**< List of objects in
> pool */
> @@ -1314,7 +1321,22 @@ rte_mempool_cache_flush(struct rte_mempool_cache
> *cache,
>  	if (cache == NULL || cache->len == 0)
>  		return;
>  	rte_mempool_trace_cache_flush(cache, mp);
> +
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +	unsigned int i;
> +	unsigned int cache_len = cache->len;
> +	void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
> +	void *base_value = mp->pool_base_value;
> +	uint32_t *cache_objs = (uint32_t *) cache->objs;

Hi Dharmik and Honnappa,

The essence of this patch is based on recasting the type of the objs field in the rte_mempool_cache structure from an array of pointers to an array of uint32_t.

However, this effectively breaks the ABI, because the rte_mempool_cache structure is public and part of the API.

Some drivers [1] even bypass the mempool API and access the rte_mempool_cache structure directly, assuming that the objs array in the cache is an array of pointers. So you cannot recast the fields in the rte_mempool_cache structure the way this patch requires.

Although I do consider bypassing an API's accessor functions "spaghetti code", this driver's behavior is formally acceptable as long as the rte_mempool_cache structure is not marked as internal.

I really liked your idea of using indexes instead of pointers, so I'm very sorry to shoot it down. :-(

[1]: E.g. the Intel i40e PMD, http://code.dpdk.org/dpdk/latest/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L25

-Morten



More information about the dev mailing list