[PATCH v2] net/af_xdp: fix umem map size for zero copy

Ferruh Yigit ferruh.yigit at amd.com
Tue May 21 19:57:34 CEST 2024


On 5/11/2024 6:26 AM, Frank Du wrote:
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
> 
> Signed-off-by: Frank Du <frank.du at intel.com>
> 
> ---
> v2:
> * Add virtual contiguous detect for for multiple memhdrs.
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..7456108d6d 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>  }
>  
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t *align, size_t *len)
>  {
> -	struct rte_mempool_memhdr *memhdr;
> +	struct rte_mempool_memhdr *memhdr, *next;
>  	uintptr_t memhdr_addr, aligned_addr;
> +	size_t memhdr_len = 0;
>  
> +	/* get the mempool base addr and align */
>  	memhdr = STAILQ_FIRST(&mp->mem_list);
>  	memhdr_addr = (uintptr_t)memhdr->addr;
>  	aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>  	*align = memhdr_addr - aligned_addr;
>

I am aware this is not part of this patch, but as note, can't we use
'RTE_ALIGN_FLOOR' to calculate aligned address.


> +	memhdr_len += memhdr->len;
> +
> +	/* check if virtual contiguous memory for multiple memhdrs */
> +	next = STAILQ_NEXT(memhdr, next);
> +	while (next != NULL) {
> +		if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
> +			AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
> +					"next: %p, cur: %p(len: %" PRId64 " )\n",
> +					next->addr, memhdr->addr, memhdr->len);
> +			return 0;
> +		}
>

Isn't there a mempool flag that can help us figure out mempool is not
IOVA contiguous? Isn't it sufficient on its own?


> +		/* virtual contiguous */
> +		memhdr = next;
> +		memhdr_len += memhdr->len;
> +		next = STAILQ_NEXT(memhdr, next);
> +	}
>  
> +	*len = memhdr_len;
>  	return aligned_addr;
>  }
>

This function goes too much details of the mempool object, and any
change in mempool details has potential to break this code.

@Andrew, @Morten, do you think does it make sense to have
'rte_mempool_info_get()' kind of function, that provides at least
address and length of the mempool, and used here?

This helps to hide internal details and complexity of the mempool for users.


>  
> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>  	void *base_addr = NULL;
>  	struct rte_mempool *mb_pool = rxq->mb_pool;
>  	uint64_t umem_size, align = 0;
> +	size_t len = 0;
>  
>  	if (internals->shared_umem) {
>  		if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>  		}
>  
>  		umem->mb_pool = mb_pool;
> -		base_addr = (void *)get_base_addr(mb_pool, &align);
> -		umem_size = (uint64_t)mb_pool->populated_size *
> -				(uint64_t)usr_config.frame_size +
> -				align;
> +		base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>

Is this calculation correct if mempool is not already aligned to page size?

Like in an example page size is '0x1000', and "memhdr_addr = 0x000a1080"
returned aligned address is '0x000a1000', "base_addr = 0x000a1000"

Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?


> +		if (!base_addr) {
> +			AF_XDP_LOG(ERR, "Failed to parse memhdr info from pool\n");
>

Log message is not accurate, it is not parsing memhdr info failed, but
mempool was not satisfying expectation.

> +			goto err;
> +		}
> +		umem_size = (uint64_t)len + align;
>  
>  		ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
>  				&rxq->fq, &rxq->cq, &usr_config);



More information about the dev mailing list