[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

Stephen Hemminger stephen at networkplumber.org
Wed Apr 25 18:12:34 CEST 2018


On Wed, 25 Apr 2018 17:02:48 +0100
"Burakov, Anatoly" <anatoly.burakov at intel.com> wrote:

> On 14-Feb-18 10:07 AM, Burakov, Anatoly wrote:
> > On 14-Feb-18 8:04 AM, Thomas Monjalon wrote:  
> >> Hi Anatoly,
> >>
> >> 19/12/2017 12:14, Anatoly Burakov:  
> >>>   * Memory tagging. This is related to the previous item. Right now, we
> >>>     can only ask malloc to allocate memory by page size, but one could
> >>>     potentially have different memory regions backed by pages of similar
> >>>     sizes (for example, locked 1G pages, to completely avoid TLB misses,
> >>>     alongside regular 1G pages), and it would be good to have that kind
> >>>     of mechanism to distinguish between different memory types available
> >>>     to a DPDK application. One could, for example, tag memory by
> >>>     "purpose" (i.e. "fast", "slow"), or in other ways.  
> >>
> >> How do you imagine memory tagging?
> >> Should it be a parameter when requesting some memory from rte_malloc
> >> or rte_mempool?  
> > 
> > We can't make it a parameter for mempool without making it a parameter 
> > for rte_malloc, as every memory allocation in DPDK works through 
> > rte_malloc. So at the very least, rte_malloc will have it. And as long 
> > as rte_malloc has it, there's no reason why memzones and mempools 
> > couldn't - not much code to add.
> >   
> >> Could it be a bit-field allowing to combine some properties?
> >> Does it make sense to have "DMA" as one of the purpose?  
> > 
> > Something like a bitfield would be my preference, yes. That way we could 
> > classify memory in certain ways and allocate based on that. Which 
> > "certain ways" these are, I'm not sure. For example, in addition to 
> > tagging memory as "DMA-capable" (which I think is a given), one might 
> > tag certain memory as "non-default", as in, never allocate from this 
> > chunk of memory unless explicitly asked to do so - this could be useful 
> > for types of memory that are a precious resource.
> > 
> > Then again, it is likely that we won't have many types of memory in 
> > DPDK, and any other type would be implementation-specific, so maybe just 
> > stringly-typing it is OK (maybe we can finally make use of the "type" 
> > parameter in rte_malloc!).
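> > 
> > As a rough illustration (the flag and function names below are 
> > hypothetical, not an existing DPDK API), such a bitfield could look 
> > like this:
> > 
> > /* hypothetical memory property flags */
> > #define RTE_MEMTAG_DMA_CAPABLE (1u << 0) /* usable for device DMA */
> > #define RTE_MEMTAG_NON_DEFAULT (1u << 1) /* only use if explicitly requested */
> > #define RTE_MEMTAG_LOCKED      (1u << 2) /* pinned pages, never swapped out */
> > 
> > /* allocate 4K of DMA-capable memory, 64-byte aligned, on any socket */
> > void *buf = rte_malloc_tagged(NULL, 4096, 64, SOCKET_ID_ANY,
> >                               RTE_MEMTAG_DMA_CAPABLE);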
> >   
> >>
> >> How to transparently allocate the best memory for the NIC?
> >> You take care of the NUMA socket property, but there can be more
> >> requirements, like getting memory from the NIC itself.  
> > 
> > I would think that we can't make it generic enough to cover all cases, 
> > so it's best to expose some APIs and let PMDs handle this themselves.
> >   
> >>
> >> +Cc more people (6WIND, Cavium, Chelsio, Mellanox, Netronome, NXP, 
> >> Solarflare)
> >> in order to trigger a discussion about the ideal requirements.
> >>  
> >   
> 
> Hi all,
> 
> I would like to restart this discussion, again :) and would appreciate 
> some feedback on my thoughts below.
> 
> I've done some more thinking about it, and while I have lots of use cases 
> in mind, I suspect covering them all while keeping a sane API is 
> unrealistic.
> 
> So, first things first.
> 
> The main issue we have is the 1:1 correspondence between malloc heap and 
> socket ID. This has led to various attempts to hijack socket IDs to do 
> something else - I've seen this approach a few times before, most 
> recently in a patch by Srinath/Broadcom [1]. We need to break this 
> dependency somehow, and have a unique heap identifier.
> 
> Also, since memory allocators are expected to behave roughly like 
> drivers (e.g. have a driver API and provide hooks for init/alloc/free 
> functions, etc.), a request to allocate memory may not just go to the 
> heap itself (which is handled internally by rte_malloc), but also to 
> its respective allocator. This is roughly similar to what happens 
> currently, except that which allocator functions to call will then 
> depend on which driver allocated that heap.
> 
> So, we arrive at a dependency - heap => allocator. Each heap must know 
> which allocator it belongs to - so we also need some way to identify 
> not just the heap, but the allocator as well.
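> 
> To make that more concrete, here is a minimal sketch (all names below 
> are hypothetical, none of this is an existing DPDK API) of what the 
> heap-to-allocator link could look like:
> 
> struct rte_malloc_heap;
> 
> struct rte_malloc_driver_ops {
>     int   (*init)(struct rte_malloc_heap *heap);
>     void *(*alloc)(struct rte_malloc_heap *heap, size_t size, size_t align);
>     void  (*free)(struct rte_malloc_heap *heap, void *addr);
> };
> 
> struct rte_malloc_heap {
>     char name[32];                           /* unique heap identifier */
>     int socket_id;                           /* NUMA node, if applicable */
>     const struct rte_malloc_driver_ops *ops; /* allocator this heap belongs to */
> };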
> 
> In the above quotes from previous mails I suggested categorizing memory 
> by "types", but now that I think of it, the API would've been too 
> complex, as we would've ideally had to cover use cases such as "allocate 
> memory of this type, no matter which allocator it comes from", 
> "allocate memory from this particular heap", "allocate memory from this 
> particular allocator"... It gets complicated pretty fast.
> 
> What I propose instead is this. In 99% of cases, the user wants our 
> hugepage allocator. So, by default, all allocations will come through 
> that. In the event that the user needs memory from a specific heap, we 
> need to provide a new set of APIs to request memory from a specific heap.
> 
> Do we expect situations where the user might *not* want the default 
> allocator, but also *not* know which exact heap he wants? If the answer 
> is no (which I'm counting on :) ), then allocating from a specific malloc 
> driver becomes as simple as something like this:
> 
> mem = rte_malloc_from_heap("my_very_special_heap");
> 
> (stringly-typed heap ID is just an example)
> 
> So, the old APIs remain intact and are always passed through to the 
> default allocator, while the new APIs grant access to other allocators.
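> 
> For illustration, the new set could simply mirror the existing 
> rte_malloc() signature with an extra heap argument (hypothetical 
> prototypes, not an existing API):
> 
> void *rte_malloc_from_heap(const char *heap_name, const char *type,
>                            size_t size, unsigned int align);
> void *rte_zmalloc_from_heap(const char *heap_name, const char *type,
>                             size_t size, unsigned int align);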
> 
> Heap ID alone, however, may not provide enough flexibility. For example, 
> if a malloc driver allocates a specific kind of memory that is 
> NUMA-aware, it would perhaps be awkward to use different heap IDs when 
> the memory being allocated is arguably the same, just subdivided into 
> several blocks. Moreover, figuring out situations like this would likely 
> require some cooperation from the allocator itself (possibly some 
> allocator-specific APIs), but should we add malloc heap arguments, 
> those would have to be generic. I'm not sure if we want to go that far, 
> though.
> 
> Does that sound reasonable?
> 
> Another tangentially related issue, raised by Olivier [2], is that of 
> allocating memory in blocks, rather than using rte_malloc. The current 
> implementation has rte_malloc storing its metadata right in the memory - 
> this leads to unnecessary memory fragmentation in certain cases, such as 
> allocating memory page-by-page, and in general pollutes memory we might 
> not want to pollute with malloc metadata.
> 
> To fix this, the memory allocator would have to store malloc metadata 
> externally, which comes with a few caveats (reverse mapping of pointers 
> to malloc elements; storing, looking up and accounting for said 
> elements; etc.). There are no current plans to work on it, but it's 
> certainly something to think about :)
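> 
> A very rough sketch of what external metadata might involve (purely 
> illustrative, none of this exists today):
> 
> /* element header kept outside the memory it describes */
> struct malloc_elem_ext {
>     void *addr;    /* start of the user-visible allocation */
>     size_t size;   /* size of the allocation */
>     int heap_id;   /* owning heap */
> };
> 
> /* rte_free() would then need a reverse lookup (e.g. a hash keyed on addr)
>  * instead of reading a header located just before the returned pointer. */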
> 
> [1] http://dpdk.org/dev/patchwork/patch/36596/
> [2] http://dpdk.org/ml/archives/dev/2018-March/093212.html

Maybe the existing rte_malloc, which tries to always work like malloc, is not
the best API for applications? I have always thought the Samba talloc API was
less error-prone, since it supports reference counting and hierarchical
allocation.
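
Roughly, from memory (an untested sketch; struct conn is just a stand-in
example type), hierarchical allocation with talloc looks like this:

#include <talloc.h>

struct conn { int fd; };                         /* example payload type */

TALLOC_CTX *session = talloc_new(NULL);          /* parent context */
struct conn *c = talloc(session, struct conn);   /* child of session */
char *label = talloc_strdup(session, "flow-1");  /* another child */

talloc_free(session);  /* frees session and every child allocated under it */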

