[dpdk-dev] Aligned rte_mempool for storage applications

Howell, Seth seth.howell at intel.com
Tue Mar 26 19:34:15 CET 2019
Hi Vipin,

Thanks for your quick reply. I will respond to your queries in order.
1. Yes, in at least one case we have buffers of size 4096 bytes. Some of our other buffers are much larger (>64KiB).
2. These buffers are used in the I/O path, so performance is very important. Allocating and freeing a buffer each time we use it could be pretty costly.
3. Could you describe the idea of an indirect buffer in more detail? I don't think I quite understand that concept. I know we couldn't use mbufs because we often have buffers that are larger than 64KiB. I think there are more reasons we don't use the mbuf structure in our use case, but I am not familiar with all of them. Maybe Jim can explain those in more detail.

Thanks,

Seth
-----Original Message-----
From: Varghese, Vipin 
Sent: Monday, March 25, 2019 7:53 PM
To: Harris, James R <james.r.harris at intel.com>; Howell, Seth <seth.howell at intel.com>; dev at dpdk.org
Subject: RE: Aligned rte_mempool for storage applications

Hi Seth,

If I may, I would like to offer a suggestion and ask a few questions about the mempool alignment details. Please find them inline below.

Snipped
> 
>     In SPDK, we use the rte_mempool struct for many internal structure 
> collections. The per-thread cache and ease of allocation of mempools 
> are very useful features.
>     Some of the collections we store in SPDK are pools of I/O buffers. 
> Typically, these pools contain elements of at least 4096 bytes, and we 
> would like them to be aligned to 4k for performance reasons.
Query-1> Is the total memory required (data portion) only 4096 bytes?

> 
> [Jim] Just to clarify Seth's point - the performance reasons are 
> specifically to avoid wasteful memcopies.  The vast majority of NVMe 
> SSDs in the market today do not have full scatter/gather support - 
> rather they only support something called PRP (Physical Region Pages),
> which requires all scatter-gather elements except the first to be 4KB
> aligned.  There are other storage interfaces, such as Linux AIO, that
> also impose alignment restrictions.
> 
> -Jim
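
The constraint Jim describes can be sketched as a simple check. This is a minimal illustration only: struct sg_elem and sgl_is_prp_compatible are hypothetical names, not SPDK or DPDK APIs, and the 4 KiB page size is taken from the text. A real NVMe driver validates more than start addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRP_PAGE_SIZE 4096u  /* assumed 4 KiB page size, as in the text */

/* Hypothetical scatter-gather element: start address and length in bytes. */
struct sg_elem {
    uintptr_t addr;
    size_t    len;
};

/*
 * Returns true when every element except the first starts on a 4 KiB
 * boundary, which is the restriction described above. Element lengths
 * are not checked in this sketch.
 */
static bool sgl_is_prp_compatible(const struct sg_elem *sgl, size_t n)
{
    assert(sgl != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (sgl[i].addr % PRP_PAGE_SIZE != 0) {
            return false;
        }
    }
    return true;
}
```

A list that fails this check is the case that would force a copy into an aligned bounce buffer, which is the memcpy the aligned pool is meant to avoid.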
> 
> 
>     Currently, the rte_mempool API doesn't support aligned mempool 
> objects. This means that when we allocate a 4k buffer and want it 
> aligned to 4k, we actually need to allocate an 8k buffer and calculate 
> an offset into it each time we want to use it.
Query-2> Why not create contiguous 4K-aligned memory with rte_malloc?
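
For reference, the over-allocation workaround described in the quoted paragraph above (take an 8k element, then compute an offset to the 4k boundary inside it) might look roughly like this. It is a sketch in plain C, not SPDK code; align_up and aligned_window are hypothetical helper names, and the raw element is assumed to be 2 * BUF_SIZE bytes long.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_SIZE  4096u  /* usable buffer size, from the text */
#define BUF_ALIGN 4096u  /* required alignment, from the text */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Given a raw element of 2 * BUF_SIZE bytes, return the start of the
 * aligned BUF_SIZE window inside it. This offset has to be recomputed
 * (or cached somewhere) every time the element is used.
 */
static void *aligned_window(void *raw)
{
    return (void *)align_up((uintptr_t)raw, BUF_ALIGN);
}
```

The cost is visible in the sizes: every 4k of usable buffer occupies 8k of pool memory.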

>     We recently did a proof of concept using the rte_mempool_ops hook 
> where we allocated a mempool and populated it with aligned entries. 
> This allowed us to retrieve aligned addresses directly from 
> rte_mempool_get(), but didn't help with the allocation size.
>     Because the rte_mempool struct assumes that each element has a 
> header attached to it, we still need to live up to that assumption for 
> each object we create in a mempool. This means that the actual size of 
> a buffer becomes 4k + 24 bytes. In order to get to our next aligned 
> address, we need to add about 4k of padding to each element.
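
The arithmetic behind "about 4k of padding" can be made explicit. The sketch below uses only the numbers quoted in the text (a 4k object and a 24-byte header) and assumes the header sits directly in front of each object; element_stride and element_padding are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_SIZE  4096u  /* 4k payload, from the text */
#define HDR_SIZE  24u    /* per-object header size quoted in the text */
#define OBJ_ALIGN 4096u  /* required object alignment */

/*
 * Smallest element stride that keeps every object aligned while a header
 * precedes it: the stride must be a multiple of the alignment and large
 * enough to hold header plus payload.
 */
static size_t element_stride(size_t hdr, size_t obj, size_t align)
{
    assert(align != 0);
    size_t needed = hdr + obj;
    return ((needed + align - 1) / align) * align;
}

/* Bytes per element that carry neither header nor payload. */
static size_t element_padding(size_t hdr, size_t obj, size_t align)
{
    return element_stride(hdr, obj, align) - (hdr + obj);
}
```

With these inputs the stride is 8192 bytes and the padding is 8192 - 4120 = 4072 bytes per element, whereas a headerless element would need a stride of only 4096 bytes.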
>     Modifying the current rte_mempool struct to allow entries without
> headers seems impossible since it would break rte_mempool_obj_iter
> and rte_mempool_from_obj. However, I still think there is a lot of
> benefit to be gained from a mempool structure that supports aligned
> objects without headers.
>     I am wondering if DPDK would be open to us introducing an 
> rte_mempool_aligned structure. This structure would essentially be a 
> wrapper around a regular mempool struct. However, it would not require 
> headers or trailers for each object in the pool.
Query-3> Using a mempool with a zero-size data portion, we can either create an indirect buffer or use an external buffer to attach an mbuf to 4K-aligned rte_malloc areas.

Note: we did something similar in the prototype for AF_XDP_ZC_PMD (presented at the BLR summit 2019).

Advantage: no change in mempool library, mbuf library, or rte_malloc. Application works with zero change.

> 
>     This structure would only be applicable to a subset of mempools 
> with the following characteristics:
>     	1. mempools for which the following flags were set:
> MEMPOOL_F_NO_CACHE_ALIGN, MEMPOOL_F_NO_IOVA_CONTIG,
> MEMPOOL_F_NO_SPREAD
>     	2. mempools that do not require the use of the following
> functions: rte_mempool_from_obj (requires a pointer to the mp in the
> header of each obj), rte_mempool_obj_iter.
>     	3. Any attempt to create this object when
> RTE_LIBRTE_MEMPOOL_DEBUG is enabled would necessarily fail since we
> can't check the header cookies.
> 
>     My thought would be that we could implement this data structure in 
> a header and it would look something like this:
> 
>     struct rte_mempool_aligned {
>     	struct rte_mempool mp;
>     	size_t obj_alignment;
>     };
> 
>     The rest of the functions in the header would primarily be
> wrappers around the original functions. Most functions
> (rte_mempool_alloc, rte_mempool_free, rte_mempool_enqueue/dequeue,
> rte_mempool_get_count, etc.) could be implemented directly as
> wrappers, and others such as rte_mempool_create and the populate
> functions would have to be re-implemented to some degree in the new
> header. The remaining functions (check_cookies, obj_iter) would not
> be implemented in the rte_mempool_aligned.h file.
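
To make the idea of a pool of pre-allocated, aligned, headerless objects concrete, here is a toy sketch in plain C. It is NOT the DPDK API and not the proposed rte_mempool_aligned implementation: struct aligned_pool and the aligned_pool_* functions are hypothetical, the pool is a single-threaded pointer stack with no per-thread cache, and size overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy pool: objects carry no header or trailer. */
struct aligned_pool {
    void   *raw;           /* pointer returned by malloc, kept for free() */
    void  **free_objs;     /* stack of free object pointers */
    size_t  free_count;    /* number of entries currently on the stack */
    size_t  capacity;      /* total number of objects in the pool */
    size_t  obj_alignment; /* alignment of every object, in bytes */
};

static int aligned_pool_init(struct aligned_pool *p, size_t n,
                             size_t obj_size, size_t align)
{
    /* align must be a power of two and obj_size a non-zero multiple of it */
    if (n == 0 || obj_size == 0 || align == 0 ||
        (align & (align - 1)) != 0 || obj_size % align != 0)
        return -1;
    /* one extra 'align' bytes of slack for the whole pool, not per object */
    p->raw = malloc(n * obj_size + align);
    p->free_objs = malloc(n * sizeof(*p->free_objs));
    if (p->raw == NULL || p->free_objs == NULL) {
        free(p->raw);
        free(p->free_objs);
        return -1;
    }
    uintptr_t base = ((uintptr_t)p->raw + align - 1) & ~((uintptr_t)align - 1);
    for (size_t i = 0; i < n; i++)
        p->free_objs[i] = (void *)(base + i * obj_size);
    p->free_count = n;
    p->capacity = n;
    p->obj_alignment = align;
    return 0;
}

/* Pop one aligned object, or NULL when the pool is empty. */
static void *aligned_pool_get(struct aligned_pool *p)
{
    return p->free_count > 0 ? p->free_objs[--p->free_count] : NULL;
}

/* Return an object previously obtained from aligned_pool_get(). */
static void aligned_pool_put(struct aligned_pool *p, void *obj)
{
    assert(p->free_count < p->capacity);
    assert((uintptr_t)obj % p->obj_alignment == 0);
    p->free_objs[p->free_count++] = obj;
}

static void aligned_pool_destroy(struct aligned_pool *p)
{
    free(p->free_objs);
    free(p->raw);
}
```

Because objects are packed back to back from an aligned base, each 4k object costs exactly 4k of pool memory. The trade-off matches the restrictions listed earlier: with no header there is no way to go from an object back to its pool or to verify cookies.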
> 
>     Would the community be welcoming of a new rte_mempool_aligned 
> struct? If you don't feel like this would be the way to go, are there 
> other options in DPDK for creating a pool of pre-allocated aligned objects?
> 
>     Thank you,
> 
>     Seth Howell
> 
> 
>