[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
anatoly.burakov at intel.com
Sat Jan 13 15:13:53 CET 2018
On 19-Dec-17 11:14 AM, Anatoly Burakov wrote:
> This patchset introduces a prototype implementation of dynamic memory allocation
> for DPDK. It is intended to start a conversation and build consensus on the best
> way to implement this functionality. The patchset works well enough to pass all
> unit tests, and to work with traffic forwarding, provided the device drivers are
> adjusted to ensure contiguous memory allocation where it matters.
> The vast majority of changes are in the EAL and malloc, the external API
> disruption is minimal: a new set of API's is added for contiguous memory
> allocation (for rte_malloc and rte_memzone), and a few API additions in
> rte_memory. Every other API change is internal to EAL, and all of the memory
> allocation/freeing is handled through rte_malloc, with no externally visible
> API changes, aside from a call to get physmem layout, which no longer makes
> sense given that there are multiple memseg lists.
> Quick outline of all changes done as part of this patchset:
> * Malloc heap adjusted to handle holes in address space
> * Single memseg list replaced by multiple expandable memseg lists
> * VA space for hugepages is preallocated in advance
> * Added dynamic alloc/free for pages, happening as needed on malloc/free
> * Added contiguous memory allocation API's for rte_malloc and rte_memzone
> * Integrated Pawel Wodkowski's patch for registering/unregistering memory
> with VFIO
> The biggest difference is a "memseg" now represents a single page (as opposed to
> being a big contiguous block of pages). As a consequence, both memzones and
> malloc elements are no longer guaranteed to be physically contiguous, unless
> the user asks for it. To preserve whatever functionality that was dependent
> on previous behavior, a legacy memory option is also provided, however it is
> expected to be a temporary solution. The drivers weren't adjusted in this patchset,
> and it is expected that whoever shall test the drivers with this patchset will
> modify their relevant drivers to support the new set of API's. Basic testing
> with forwarding traffic was performed, both with UIO and VFIO, and no performance
> degradation was observed.
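Since a memseg now covers a single page, anything that needs a physically contiguous block must either use the new contiguous-allocation API's or verify contiguity itself. A minimal sketch of such a check, using illustrative types rather than the actual DPDK structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for per-page metadata. */
struct page {
    uint64_t iova;   /* physical/IO address of the page */
    size_t   len;    /* page size */
};

/* Return true if pages[0..n-1] form one IOVA-contiguous block, i.e.
 * each page's IOVA starts exactly where the previous one ends. */
static bool
pages_iova_contig(const struct page *pages, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (pages[i].iova != pages[i - 1].iova + pages[i - 1].len)
            return false;
    return true;
}
```

A driver that requires DMA-able contiguous memory would run this kind of check (or simply request contiguous memory up front) instead of assuming adjacency, as it could under the old single-block model.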
> Why multiple memseg lists instead of one? It makes things easier on a number of
> fronts. Since memseg is a single page now, the list will get quite big, and we
> need to locate pages somehow when we allocate and free them. We could of course
> just walk the list and allocate one contiguous chunk of VA space for memsegs,
> but I chose to use separate lists instead, to speed up many operations with the
> list.
> It would be great to see the following discussions within the community regarding
> both current implementation and future work:
> * Any suggestions to improve current implementation. The whole system with
> multiple memseg lists is kind of unwieldy, so maybe there are better ways to
> do the same thing. Maybe use a single list after all? We're not expecting
> malloc/free on the hot path, so maybe it doesn't matter that we have to walk
> the list of potentially thousands of pages?
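One concrete advantage of separate lists with preallocated VA space is that locating the page backing an address need not walk anything: each list covers one contiguous VA arena of fixed-size slots, so lookup is a bounds check plus index arithmetic. A sketch under assumed (illustrative, non-DPDK) structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative structures: each memseg list owns one contiguous,
 * preallocated VA arena split into page-size slots. */
struct memseg {
    uintptr_t va;          /* virtual address (0 if slot unused) */
    uint64_t  iova;        /* physical/IO address */
};

struct memseg_list {
    uintptr_t base;        /* start of the list's VA arena */
    size_t    page_sz;     /* page size backing this list */
    size_t    n_segs;      /* number of slots in the arena */
    struct memseg *segs;   /* n_segs entries */
};

/* O(1) address-to-memseg lookup: no walk over thousands of pages. */
static struct memseg *
msl_lookup(struct memseg_list *msl, uintptr_t addr)
{
    if (addr < msl->base ||
        addr >= msl->base + msl->page_sz * msl->n_segs)
        return NULL;
    return &msl->segs[(addr - msl->base) / msl->page_sz];
}
```

With a single list holding pages of mixed sizes and sockets, this index trick no longer works and lookup degrades to a walk, which is part of the tradeoff being asked about.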
> * Pluggable memory allocators. Right now, allocators are hardcoded, but down
> the line it would be great to have custom allocators (e.g. for externally
> allocated memory). I've tried to keep the memalloc API minimal and generic
> enough to be able to easily change it down the line, but suggestions are
> welcome. Memory drivers, with ops for alloc/free etc.?
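The "memory drivers" idea could follow the usual ops-table pattern: a struct of function pointers that an external allocator fills in, with the EAL dispatching through the registered ops instead of hardcoding one backend. Names below are purely hypothetical, not the patchset's memalloc API; plain malloc stands in for a real hugepage allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pluggable-allocator ops table. */
struct memalloc_ops {
    void *(*alloc)(size_t size, size_t page_sz, int socket);
    void  (*free)(void *addr, size_t size);
};

/* Default backend: plain heap allocation standing in for
 * hugepage-backed allocation. */
static void *
default_alloc(size_t size, size_t page_sz, int socket)
{
    (void)page_sz; (void)socket;
    return malloc(size);
}

static void
default_free(void *addr, size_t size)
{
    (void)size;
    free(addr);
}

static const struct memalloc_ops default_ops = {
    .alloc = default_alloc,
    .free  = default_free,
};

/* The EAL would call through this pointer; an external-memory driver
 * would swap in its own ops at registration time. */
static const struct memalloc_ops *memalloc = &default_ops;
```

This is the same shape already used elsewhere in DPDK (e.g. ethdev/cryptodev ops structs), which is why it seems a natural fit for memory drivers too.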
> * Memory tagging. This is related to previous item. Right now, we can only ask
> malloc to allocate memory by page size, but one could potentially have
> different memory regions backed by pages of similar sizes (for example,
> locked 1G pages, to completely avoid TLB misses, alongside regular 1G pages),
> and it would be good to have that kind of mechanism to distinguish between
> different memory types available to a DPDK application. One could, for example,
> tag memory by "purpose" (i.e. "fast", "slow"), or in other ways.
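At its simplest, tagging could mean matching a requested tag against per-heap tags at allocation time. A hedged sketch with made-up types, just to make the mechanism concrete:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: each heap carries a free-form tag ("fast", "slow",
 * "locked-1G", ...) describing the memory backing it. */
struct tagged_heap {
    const char *tag;
    size_t      avail;   /* bytes available; stand-in for a real heap */
};

/* Pick the first heap whose tag matches and that can satisfy the
 * request; a real implementation would then allocate from it. */
static struct tagged_heap *
pick_heap(struct tagged_heap *heaps, size_t n,
          const char *tag, size_t size)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(heaps[i].tag, tag) == 0 && heaps[i].avail >= size)
            return &heaps[i];
    return NULL;
}
```

Whether tags should be free-form strings, an enum of well-known purposes, or a bitmask of properties is exactly the kind of question the discussion point raises.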
> * Secondary process implementation, in particular when it comes to allocating/
> freeing new memory. Current plan is to make use of RPC mechanism proposed by
> Jianfeng to communicate between primary and secondary processes, however
> other suggestions are welcome.
> * Support for non-hugepage memory. This work is planned down the line. Aside
> from obvious concerns about physical addresses, 4K pages are small and will
> eat up enormous amounts of memseg list space, so my proposal would be to
> allocate 4K pages in bigger blocks (say, 2MB).
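The space concern is easy to quantify: tracking memory per 4K page needs 512x more memseg entries than tracking it per 2MB block. A trivial arithmetic sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Number of memseg entries needed to describe `mem` bytes when each
 * entry covers `granularity` bytes. */
static uint64_t
memseg_entries(uint64_t mem, uint64_t granularity)
{
    return mem / granularity;
}
```

For 1G of memory that is 262144 entries at 4K granularity versus 512 at 2MB granularity, which is why grouping 4K pages into bigger blocks looks attractive.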
> * 32-bit support. Current implementation lacks it, and I don't see a trivial
> way to make it work if we are to preallocate huge chunks of VA space in
> advance. We could limit it to 1G per page size, but even that, on multiple
> sockets, won't work that well, and we can't know in advance what kind of
> memory the user will try to allocate. Drop it? Leave it in legacy mode only?
> * Preallocation. Right now, malloc will free any and all memory that it can,
> which could lead to a (perhaps counterintuitive) situation where a user
> calls DPDK with --socket-mem=1024,1024, does a single "rte_free" and loses
> all of the preallocated memory in the process. Would preallocating memory
> *and keeping it no matter what* be a valid use case? E.g. if DPDK was run
> without any memory requirements specified, grow and shrink as needed, but
> DPDK was asked to preallocate memory, we can grow but we can't shrink
> past the preallocated amount?
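One way to express that "grow but never shrink past the preallocated amount" policy is a per-heap floor that page release must respect. A minimal sketch, with illustrative fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative per-heap accounting. */
struct heap {
    size_t total;     /* hugepage memory currently backing the heap */
    size_t prealloc;  /* amount requested via e.g. --socket-mem; 0 if
                       * no memory requirements were specified */
};

/* Decide whether `size` bytes of now-free pages may be returned to
 * the system: never shrink below the preallocated floor. */
static bool
may_release(const struct heap *h, size_t size)
{
    return h->total - size >= h->prealloc;
}
```

With prealloc set to 0 this degenerates to the fully dynamic grow/shrink behavior, so one mechanism could cover both modes described above.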
> Any other feedback about things I didn't think of or missed is greatly
> appreciated.
>  http://dpdk.org/dev/patchwork/patch/24484/
>  http://dpdk.org/dev/patchwork/patch/31838/
Could this proposal be discussed at the next tech board meeting?