[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

Anatoly Burakov anatoly.burakov at intel.com
Tue Dec 19 12:14:27 CET 2017

This patchset introduces a prototype implementation of dynamic memory allocation
for DPDK. It is intended to start a conversation and build consensus on the best
way to implement this functionality. The patchset works well enough to pass all
unit tests, and to work with traffic forwarding, provided the device drivers are
adjusted to ensure contiguous memory allocation where it matters.

The vast majority of changes are in the EAL and malloc, and the external API
disruption is minimal: a new set of APIs is added for contiguous memory
allocation (for rte_malloc and rte_memzone), along with a few API additions in
rte_memory. Every other API change is internal to EAL, and all of the memory
allocation/freeing is handled through rte_malloc, with no externally visible
API changes, aside from a call to get physmem layout, which no longer makes
sense given that there are multiple memseg lists.

Quick outline of all changes done as part of this patchset:

 * Malloc heap adjusted to handle holes in address space
 * Single memseg list replaced by multiple expandable memseg lists
 * VA space for hugepages is preallocated in advance
 * Added dynamic alloc/free for pages, happening as needed on malloc/free
 * Added contiguous memory allocation APIs for rte_malloc and rte_memzone
 * Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
   with VFIO

The biggest difference is that a "memseg" now represents a single page (as
opposed to being a big contiguous block of pages). As a consequence, both
memzones and malloc elements are no longer guaranteed to be physically
contiguous, unless the user asks for it. To preserve functionality that
depended on the previous behavior, a legacy memory option is also provided;
however, it is expected to be a temporary solution. The drivers weren't
adjusted in this patchset, and it is expected that whoever tests the drivers
with this patchset will modify the relevant drivers to support the new set of
APIs. Basic testing with forwarding traffic was performed, both with UIO and
VFIO, and no performance degradation was observed.
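
To illustrate what "physically contiguous" means once every memseg is a single
page, here is a minimal sketch. It is not code from the patchset: struct
page_seg and segs_phys_contiguous() are hypothetical names invented for this
example, and the only assumption is that each page records its physical
address and length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page-sized memseg; not the DPDK structure. */
struct page_seg {
	uint64_t phys_addr; /* physical address of the page */
	size_t len;         /* page size in bytes */
};

/*
 * Return 1 if segs[first] .. segs[first + n - 1] form one physically
 * contiguous run, i.e. every page starts exactly where the previous
 * page ends. Runs of zero or one page are trivially contiguous.
 */
static int
segs_phys_contiguous(const struct page_seg *segs, size_t first, size_t n)
{
	size_t i;

	assert(segs != NULL);
	for (i = first + 1; i < first + n; i++) {
		if (segs[i - 1].phys_addr + segs[i - 1].len !=
				segs[i].phys_addr)
			return 0;
	}
	return 1;
}
```

An area that is contiguous in virtual address space can fail this check once
its backing pages are allocated one at a time, which is why a driver that
needs DMA-able buffers would have to ask for contiguity explicitly.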

Why multiple memseg lists instead of one? It makes things easier on a number of
fronts. Since memseg is a single page now, the list will get quite big, and we
need to locate pages somehow when we allocate and free them. We could of course
just walk the list and allocate one contiguous chunk of VA space for memsegs,
but I chose to use separate lists instead, to speed up many operations with the
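
One way separate lists could make page lookup cheap is sketched below. This is
an illustration under stated assumptions, not the patchset's implementation:
it assumes each list covers one preallocated VA range with a single page size,
and struct seg_list and find_page() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list descriptor: one reserved VA range, one page size. */
struct seg_list {
	uintptr_t base_va; /* start of the reserved VA range */
	size_t page_sz;    /* size of every page in this list */
	size_t n_pages;    /* capacity of the list, in pages */
};

/*
 * Find the list whose VA range contains addr and compute the index of
 * the page within that list. Returns the list index, or -1 if addr does
 * not belong to any list. Only the lists are walked, never the pages.
 */
static int
find_page(const struct seg_list *lists, int n_lists, uintptr_t addr,
		size_t *page_idx)
{
	int i;

	assert(lists != NULL && page_idx != NULL);
	for (i = 0; i < n_lists; i++) {
		uintptr_t end = lists[i].base_va +
				lists[i].page_sz * lists[i].n_pages;

		if (addr >= lists[i].base_va && addr < end) {
			*page_idx = (addr - lists[i].base_va) /
					lists[i].page_sz;
			return i;
		}
	}
	return -1;
}
```

Under these assumptions the cost of locating a page depends on the number of
lists, not on the number of pages they hold.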

It would be great to see the following discussions within the community regarding
both current implementation and future work:

 * Any suggestions to improve current implementation. The whole system with
   multiple memseg lists is kind of unwieldy, so maybe there are better ways to
   do the same thing. Maybe use a single list after all? We're not expecting
   malloc/free on hot path, so maybe it doesn't matter that we have to walk
   the list of potentially thousands of pages?
 * Pluggable memory allocators. Right now, allocators are hardcoded, but down
   the line it would be great to have custom allocators (e.g. for externally
   allocated memory). I've tried to keep the memalloc API minimal and generic
   enough to be able to easily change it down the line, but suggestions are
   welcome. Memory drivers, with ops for alloc/free etc.?
 * Memory tagging. This is related to the previous item. Right now, we can only
   ask malloc to allocate memory by page size, but one could potentially have
   different memory regions backed by pages of the same size (for example,
   locked 1G pages, to completely avoid TLB misses, alongside regular 1G pages),
   and it would be good to have that kind of mechanism to distinguish between
   different memory types available to a DPDK application. One could, for example,
   tag memory by "purpose" (e.g. "fast", "slow"), or in other ways.
 * Secondary process implementation, in particular when it comes to allocating/
   freeing new memory. The current plan is to use the RPC mechanism proposed by
   Jianfeng [2] to communicate between primary and secondary processes; however,
   other suggestions are welcome.
 * Support for non-hugepage memory. This work is planned down the line. Aside
   from obvious concerns about physical addresses, 4K pages are small and will
   eat up enormous amounts of memseg list space, so my proposal would be to
   allocate 4K pages in bigger blocks (say, 2MB).
 * 32-bit support. The current implementation lacks it, and I don't see a trivial
   way to make it work if we are to preallocate huge chunks of VA space in
   advance. We could limit it to 1G per page size, but even that, on multiple
   sockets, won't work that well, and we can't know in advance what kind of
   memory the user will try to allocate. Drop it? Leave it in legacy mode only?
 * Preallocation. Right now, malloc will free any and all memory that it can,
   which could lead to a (perhaps counterintuitive) situation where a user
   calls DPDK with --socket-mem=1024,1024, does a single "rte_free" and loses
   all of the preallocated memory in the process. Would preallocating memory
   *and keeping it no matter what* be a valid use case? E.g. if DPDK was run
   without any memory requirements specified, we grow and shrink as needed,
   but if DPDK was asked to preallocate memory, we can grow but we can't
   shrink past the preallocated amount?
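
The "grow but never shrink past the preallocated amount" idea from the last
item could be expressed as a simple floor check. The following is only a
sketch of that policy, not something the patchset implements: struct
socket_mem and pages_releasable() are hypothetical names, and the accounting
is assumed to be in pages per socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket accounting; names are illustrative only. */
struct socket_mem {
	size_t prealloc_pages; /* floor requested at startup, 0 if none */
	size_t cur_pages;      /* pages currently held on this socket */
};

/*
 * Return how many of the `want` pages may be released back to the
 * system without dropping below the preallocated floor.
 */
static size_t
pages_releasable(const struct socket_mem *sm, size_t want)
{
	size_t spare;

	assert(sm != NULL);
	if (sm->cur_pages <= sm->prealloc_pages)
		return 0;
	spare = sm->cur_pages - sm->prealloc_pages;
	return want < spare ? want : spare;
}
```

For example, with --socket-mem=1024,1024 and 2MB pages the floor would be 512
pages per socket, so a single rte_free right after startup would release
nothing, while memory grown beyond the floor could still be returned.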

Any other feedback about things I didn't think of or missed is greatly
appreciated.

[1] http://dpdk.org/dev/patchwork/patch/24484/
[2] http://dpdk.org/dev/patchwork/patch/31838/

Anatoly Burakov (23):
  eal: move get_virtual_area out of linuxapp eal_memory.c
  eal: add function to report number of detected sockets
  eal: add rte_fbarray
  eal: move all locking to heap
  eal: protect malloc heap stats with a lock
  eal: make malloc a doubly-linked list
  eal: make malloc_elem_join_adjacent_free public
  eal: add "single file segments" command-line option
  eal: add "legacy memory" option
  eal: read hugepage counts from node-specific sysfs path
  eal: replace memseg with memseg lists
  eal: add support for dynamic memory allocation
  eal: make use of dynamic memory allocation for init
  eal: add support for dynamic unmapping of pages
  eal: add API to check if memory is physically contiguous
  eal: enable dynamic memory allocation/free on malloc/free
  eal: add backend support for contiguous memory allocation
  eal: add rte_malloc support for allocating contiguous memory
  eal: enable reserving physically contiguous memzones
  eal: make memzones use rte_fbarray
  mempool: add support for the new memory allocation methods
  vfio: allow to map other memory regions
  eal: map/unmap memory with VFIO when alloc/free pages

 config/common_base                                |   5 +-
 drivers/bus/pci/linux/pci.c                       |  29 +-
 drivers/net/ena/ena_ethdev.c                      |  10 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c     | 106 ++--
 lib/librte_eal/common/Makefile                    |   2 +-
 lib/librte_eal/common/eal_common_fbarray.c        | 585 ++++++++++++++++++++++
 lib/librte_eal/common/eal_common_lcore.c          |  11 +
 lib/librte_eal/common/eal_common_memalloc.c       |  79 +++
 lib/librte_eal/common/eal_common_memory.c         | 315 +++++++++++-
 lib/librte_eal/common/eal_common_memzone.c        | 250 ++++++---
 lib/librte_eal/common/eal_common_options.c        |   8 +
 lib/librte_eal/common/eal_filesystem.h            |  13 +
 lib/librte_eal/common/eal_hugepages.h             |   1 +
 lib/librte_eal/common/eal_internal_cfg.h          |   6 +
 lib/librte_eal/common/eal_memalloc.h              |  55 ++
 lib/librte_eal/common/eal_options.h               |   4 +
 lib/librte_eal/common/eal_private.h               |  29 ++
 lib/librte_eal/common/include/rte_eal.h           |   1 +
 lib/librte_eal/common/include/rte_eal_memconfig.h |  26 +-
 lib/librte_eal/common/include/rte_fbarray.h       |  98 ++++
 lib/librte_eal/common/include/rte_lcore.h         |   8 +
 lib/librte_eal/common/include/rte_malloc.h        | 181 +++++++
 lib/librte_eal/common/include/rte_malloc_heap.h   |   6 +
 lib/librte_eal/common/include/rte_memory.h        |  16 +
 lib/librte_eal/common/include/rte_memzone.h       | 158 ++++++
 lib/librte_eal/common/malloc_elem.c               | 411 ++++++++++++---
 lib/librte_eal/common/malloc_elem.h               |  30 +-
 lib/librte_eal/common/malloc_heap.c               | 433 ++++++++++++++--
 lib/librte_eal/common/malloc_heap.h               |  14 +-
 lib/librte_eal/common/rte_malloc.c                | 139 +++--
 lib/librte_eal/linuxapp/eal/Makefile              |   4 +
 lib/librte_eal/linuxapp/eal/eal.c                 |  23 +-
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c   |  73 ++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 556 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 452 ++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_vfio.c            | 280 ++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.h            |  11 +
 lib/librte_mempool/rte_mempool.c                  |  84 +++-
 test/test/test_malloc.c                           |  29 +-
 test/test/test_memory.c                           |  44 +-
 test/test/test_memzone.c                          |  17 +-
 41 files changed, 3999 insertions(+), 603 deletions(-)
 create mode 100755 lib/librte_eal/common/eal_common_fbarray.c
 create mode 100755 lib/librte_eal/common/eal_common_memalloc.c
 create mode 100755 lib/librte_eal/common/eal_memalloc.h
 create mode 100755 lib/librte_eal/common/include/rte_fbarray.h
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_memalloc.c

