[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Stephen Hemminger
stephen at networkplumber.org
Wed Apr 25 18:12:34 CEST 2018
On Wed, 25 Apr 2018 17:02:48 +0100
"Burakov, Anatoly" <anatoly.burakov at intel.com> wrote:
> On 14-Feb-18 10:07 AM, Burakov, Anatoly wrote:
> > On 14-Feb-18 8:04 AM, Thomas Monjalon wrote:
> >> Hi Anatoly,
> >>
> >> 19/12/2017 12:14, Anatoly Burakov:
> >>> * Memory tagging. This is related to previous item. Right now, we can
> >>> only ask malloc to allocate memory by page size, but one could
> >>> potentially have different memory regions backed by pages of similar
> >>> sizes (for example, locked 1G pages, to completely avoid TLB misses,
> >>> alongside regular 1G pages), and it would be good to have that kind of
> >>> mechanism to distinguish between different memory types available to a
> >>> DPDK application. One could, for example, tag memory by "purpose"
> >>> (i.e. "fast", "slow"), or in other ways.
> >>
> >> How do you imagine memory tagging?
> >> Should it be a parameter when requesting some memory from rte_malloc
> >> or rte_mempool?
> >
> > We can't make it a parameter for mempool without making it a parameter
> > for rte_malloc, as every memory allocation in DPDK works through
> > rte_malloc. So at the very least, rte_malloc will have it. And as long
> > as rte_malloc has it, there's no reason why memzones and mempools
> > couldn't - not much code to add.
> >
> >> Could it be a bit-field that allows combining some properties?
> >> Does it make sense to have "DMA" as one of the purposes?
> >
> > Something like a bitfield would be my preference, yes. That way we could
> > classify memory in certain ways and allocate based on that. Which
> > "certain ways" these are, i'm not sure. For example, in addition to
> > tagging memory as "DMA-capable" (which i think is a given), one might
> > tag certain memory as "non-default", as in, never allocate from this
> > chunk of memory unless explicitly asked to do so - this could be useful
> > for types of memory that are a precious resource.
> >
> > Then again, it is likely that we won't have many types of memory in
> > DPDK, and any other type would be implementation-specific, so maybe just
> > stringly-typing it is OK (maybe we can finally make use of "type"
> > parameter in rte_malloc!).
> >
> >>
> >> How to transparently allocate the best memory for the NIC?
> >> You take care of the NUMA socket property, but there can be more
> >> requirements, like getting memory from the NIC itself.
> >
> > I would think that we can't make it generic enough to cover all cases,
> > so it's best to expose some API's and let PMD's handle this themselves.
> >
> >>
> >> +Cc more people (6WIND, Cavium, Chelsio, Mellanox, Netronome, NXP,
> >> Solarflare) in order to trigger a discussion about the ideal
> >> requirements.
> >>
> >
>
> Hi all,
>
> I would like to restart this discussion, again :) I would like to hear
> some feedback on my thoughts below.
>
> I've had some more thinking about it, and while i have lots of use-cases
> in mind, i suspect covering them all while keeping a sane API is
> unrealistic.
>
> So, first things first.
>
> The main issue we have is the 1:1 correspondence between malloc heap and
> socket ID. This has led to various attempts to hijack socket id's to do
> something else - i've seen this approach a few times before, most
> recently in a patch by Srinath/Broadcom [1]. We need to break this
> dependency somehow, and have a unique heap identifier.
>
> Also, since memory allocators are expected to behave roughly similar to
> drivers (e.g. have a driver API and provide hooks for init/alloc/free
> functions, etc.), a request to allocate memory may not just go to the
> heap itself (which is handled internally by rte_malloc), but also go to
> its respective allocator. This is roughly similar to what is happening
> currently, except that which allocator functions to call will then
> depend on which driver allocated that heap.
>
> So, we arrive at a dependency - heap => allocator. Each heap must know
> to which allocator it belongs - so, we also need some kind of way to
> identify not just the heap, but the allocator as well.
>
> In the above quotes from previous mails i suggested categorizing memory
> by "types", but now that i think of it, the API would've been too
> complex, as we would've ideally had to cover use cases such as "allocate
> memory of this type, no matter which allocator it comes from",
> "allocate memory from this particular heap", "allocate memory from this
> particular allocator"... It gets complicated pretty fast.
>
> What i propose instead, is this. 99% of the time, the user wants our
> hugepage allocator. So, by default, all allocations will go through
> that. In the event that the user needs memory from a specific heap, we
> need to provide a new set of API's to request memory from a specific heap.
>
> Do we expect situations where user might *not* want default allocator,
> but also *not* know which exact heap he wants? If the answer is no
> (which i'm counting on :) ), then allocating from a specific malloc
> driver becomes as simple as something like this:
>
> mem = rte_malloc_from_heap("my_very_special_heap");
>
> (stringly-typed heap ID is just an example)
>
> So, old API's remain intact, and are always passed through to a default
> allocator, while new API's will grant access to other allocators.
>
> Heap ID alone, however, may not provide enough flexibility. For example,
> if a malloc driver allocates a specific kind of memory that is
> NUMA-aware, it would perhaps be awkward to call different heap ID's when
> the memory being allocated is arguably the same, just subdivided into
> several blocks. Moreover, figuring out situations like this would likely
> require some cooperation from the allocator itself (possibly some
> allocator-specific API's), but should we add malloc heap arguments,
> those would have to be generic. I'm not sure if we want to go that far,
> though.
>
> Does that sound reasonable?
>
> Another tangentially related issue raised by Olivier [2] is that of
> allocating memory in blocks, rather than using rte_malloc. Current
> implementation has rte_malloc storing its metadata right in the memory -
> this leads to unnecessary memory fragmentation in certain cases, such as
> allocating memory page-by-page, and in general polluting memory we might
> not want to pollute with malloc metadata.
>
> To fix this, memory allocator would have to store malloc data
> externally, which comes with a few caveats (reverse mapping of pointers
> to malloc elements, storing, looking up and accounting for said
> elements, etc.). It's not currently planned to work on it, but it's
> certainly something to think about :)
>
> [1] http://dpdk.org/dev/patchwork/patch/36596/
> [2] http://dpdk.org/ml/archives/dev/2018-March/093212.html
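
For illustration, the heap => allocator dependency proposed in the quoted
text could be sketched roughly as below. This is a minimal sketch, not DPDK
code: every name in it (struct mem_allocator, struct mem_heap,
sketch_malloc_from_heap, and so on) is hypothetical, libc malloc/calloc
stand in for real backing memory, and the string heap ID follows the
"stringly-typed" example from the mail. Old-style calls go to the default
heap, while new-style calls name a heap, which in turn selects the
allocator hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator "driver": a table of hooks, driver-style. */
struct mem_allocator {
    const char *name;
    void *(*alloc)(size_t size);
    void (*free)(void *ptr);
};

/* Hypothetical heap: identified by name rather than socket ID, and
 * bound to exactly one allocator (the heap => allocator dependency). */
struct mem_heap {
    const char *name;
    const struct mem_allocator *allocator;
};

/* Stand-ins only: a real allocator would hand out hugepage or device memory. */
static void *hugepage_alloc(size_t size) { return malloc(size); }
static void hugepage_free(void *ptr) { free(ptr); }
static void *special_alloc(size_t size) { return calloc(1, size); }
static void special_free(void *ptr) { free(ptr); }

static const struct mem_allocator hugepage_allocator = {
    "hugepage", hugepage_alloc, hugepage_free
};
static const struct mem_allocator special_allocator = {
    "special", special_alloc, special_free
};

static const struct mem_heap heaps[] = {
    { "default", &hugepage_allocator },
    { "my_very_special_heap", &special_allocator },
};

/* Find a heap by its string ID; returns NULL if no such heap exists. */
static const struct mem_heap *heap_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(heaps) / sizeof(heaps[0]); i++)
        if (strcmp(heaps[i].name, name) == 0)
            return &heaps[i];
    return NULL;
}

/* New-style call: the caller names the heap, the heap picks the allocator. */
static void *sketch_malloc_from_heap(const char *heap_name, size_t size)
{
    const struct mem_heap *heap = heap_lookup(heap_name);
    return heap != NULL ? heap->allocator->alloc(size) : NULL;
}

/* Old-style call: always routed to the default heap. */
static void *sketch_malloc(size_t size)
{
    return sketch_malloc_from_heap("default", size);
}

/* Return memory through the allocator that owns the named heap. */
static void sketch_free_to_heap(const char *heap_name, void *ptr)
{
    const struct mem_heap *heap = heap_lookup(heap_name);
    if (heap != NULL)
        heap->allocator->free(ptr);
}
```

The sketch makes the caller name the heap again on free. A real
implementation would more likely recover the owning heap from the pointer
itself, which is where the metadata questions raised in the quoted text
come in.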
Maybe the existing rte_malloc which tries to always work like malloc is not
the best API for applications? I always thought the Samba talloc API was less
error-prone since it supports reference counting and hierarchical allocation.
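
As a rough illustration of what hierarchical allocation with reference
counting means, here is a small self-contained sketch. It is not talloc
and does not use talloc's API or its exact reference semantics: the names
h_alloc, h_ref and h_free are hypothetical, and libc malloc stands in for
the backing memory. Every allocation has an optional parent; freeing a
parent releases all of its children, except those on which someone still
holds an extra reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation (hypothetical layout).
 * _Alignas keeps the user data that follows suitably aligned. */
struct chunk {
    _Alignas(max_align_t) struct chunk *parent;
    struct chunk *child;   /* first child */
    struct chunk *next;    /* next sibling */
    unsigned refs;         /* 1 for the owner, +1 per h_ref() */
};

#define CHUNK_OF(p) ((struct chunk *)(p) - 1)

static size_t live_chunks; /* bookkeeping so the behaviour can be observed */

/* Allocate size bytes as a child of parent (or top-level if parent is NULL). */
static void *h_alloc(void *parent, size_t size)
{
    struct chunk *c = malloc(sizeof(*c) + size);
    if (c == NULL)
        return NULL;
    c->parent = parent != NULL ? CHUNK_OF(parent) : NULL;
    c->child = NULL;
    c->next = NULL;
    c->refs = 1;
    if (c->parent != NULL) {
        c->next = c->parent->child;
        c->parent->child = c;
    }
    live_chunks++;
    return c + 1;
}

/* Take an extra reference so the chunk outlives its parent. */
static void h_ref(void *ptr)
{
    CHUNK_OF(ptr)->refs++;
}

/* Release a chunk whose count reached zero, dropping the hold on each child. */
static void destroy(struct chunk *c)
{
    while (c->child != NULL) {
        struct chunk *k = c->child;
        c->child = k->next;
        k->parent = NULL;
        k->next = NULL;
        if (--k->refs == 0)
            destroy(k);
    }
    live_chunks--;
    free(c);
}

/* Drop one reference; the last one frees the chunk and its subtree. */
static void h_free(void *ptr)
{
    struct chunk *c = CHUNK_OF(ptr);
    if (--c->refs > 0)
        return;
    if (c->parent != NULL) {
        struct chunk **pp = &c->parent->child;
        while (*pp != c)
            pp = &(*pp)->next;
        *pp = c->next;
    }
    destroy(c);
}
```

With this shape, an application can hang every per-connection or per-port
allocation off one context and release them all with a single h_free() on
that context, instead of tracking each pointer individually.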