[dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK

Olivier Matz olivier.matz at 6wind.com
Mon Mar 19 18:30:53 CET 2018


Hi Anatoly,

On Sat, Mar 03, 2018 at 01:45:48PM +0000, Anatoly Burakov wrote:
> This patchset introduces dynamic memory allocation for DPDK (aka memory
> hotplug). It is based on the RFC submitted in December [1].
> 
> Dependencies (to be applied in specified order):
> - IPC bugfixes patchset [2]
> - IPC improvements patchset [3]
> - IPC asynchronous request API patch [4]
> - Function to return number of sockets [5]
> 
> Deprecation notices relevant to this patchset:
> - General outline of memory hotplug changes [6]
> - EAL NUMA node count changes [7]
> 
> The vast majority of changes are in the EAL and malloc, and the external API
> disruption is minimal: a new set of APIs is added for contiguous memory
> allocation for rte_memzone, and a few APIs are added to rte_memory due to the
> switch to memseg_lists as opposed to memsegs. Every other API change is
> internal to EAL, and all of the memory allocation/freeing is handled
> through rte_malloc, with no externally visible API changes.
> 
> Quick outline of all changes done as part of this patchset:
> 
>  * Malloc heap adjusted to handle holes in address space
>  * Single memseg list replaced by multiple memseg lists
>  * VA space for hugepages is preallocated in advance
>  * Added alloc/free for pages happening as needed on rte_malloc/rte_free
>  * Added contiguous memory allocation APIs for rte_memzone
>  * Integrated Pawel Wodkowski's patch for registering/unregistering memory
>    with VFIO [8]
>  * Callbacks for registering memory allocations
>  * Multiprocess support done via DPDK IPC introduced in 18.02
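
The items about preallocating VA space and allocating pages as needed on rte_malloc()/rte_free() rest on a standard POSIX pattern: reserve address space first, then back pieces of it with memory on demand. A minimal sketch, assuming Linux and ordinary anonymous pages in place of the hugepage-backed mappings the patchset uses; va_page_size(), va_reserve(), va_commit_page() and va_release_page() are hypothetical helpers, not DPDK functions. Releasing a page remaps it as PROT_NONE instead of unmapping it, so the freed slot stays reserved and becomes a hole in an otherwise contiguous VA region.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Regular page size (hypothetical stand-in for a hugepage size). */
static size_t va_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Reserve len bytes of address space with no memory behind it:
 * PROT_NONE makes any access fault, MAP_NORESERVE skips swap accounting. */
static void *va_reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Back one page at addr (inside the reservation) with real memory.
 * MAP_FIXED replaces the PROT_NONE placeholder at exactly that address. */
static int va_commit_page(void *addr, size_t pg_sz)
{
    assert(addr != NULL && pg_sz > 0);
    void *p = mmap(addr, pg_sz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* Return the page's memory to the OS but keep the VA slot reserved,
 * so no unrelated mapping can land in the hole. */
static int va_release_page(void *addr, size_t pg_sz)
{
    assert(addr != NULL && pg_sz > 0);
    void *p = mmap(addr, pg_sz, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

Usage: reserve a few pages with va_reserve(4 * va_page_size()), commit the second one, write to it, release it again, and finally munmap() the whole reservation.
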
> 
> The biggest difference is a "memseg" now represents a single page (as opposed to
> being a big contiguous block of pages). As a consequence, both memzones and
> malloc elements are no longer guaranteed to be physically contiguous, unless
> the user asks for it at reserve time. To preserve any functionality that
> depended on the previous behavior, a legacy memory option is also provided;
> however, it is expected (or perhaps vainly hoped) to be a temporary solution.
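
Because each memseg now describes a single page, asking whether a buffer is physically contiguous becomes a walk over per-page entries. A minimal sketch, assuming a hypothetical struct page_seg holding a page's IOVA and length (not the actual struct rte_memseg layout): a run of pages is IOVA-contiguous only if each page's IOVA starts exactly where the previous page's range ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page descriptor; not the real struct rte_memseg. */
struct page_seg {
    uint64_t iova; /* I/O (physical) address of this page */
    size_t len;    /* page size in bytes */
};

/* Return true if pages [first, first + n) are IOVA-contiguous, i.e. each
 * page's IOVA starts exactly where the previous page's IOVA range ends. */
static bool pages_iova_contig(const struct page_seg *segs, size_t first,
                              size_t n)
{
    assert(segs != NULL && n > 0);
    for (size_t i = first + 1; i < first + n; i++) {
        const struct page_seg *prev = &segs[i - 1];
        if (segs[i].iova != prev->iova + prev->len)
            return false;
    }
    return true;
}
```

A reservation made without requesting contiguity may span pages for which this check fails, which is why callers that need physically contiguous memory have to ask for it at reserve time.
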
> 
> Why multiple memseg lists instead of one? Since memseg is a single page now,
> the list of memsegs will get quite big, and we need to locate pages somehow
> when we allocate and free them. We could of course just walk the list and
> allocate one contiguous chunk of VA space for memsegs, but this
> implementation uses separate lists instead in order to speed up many
> operations with memseg lists.
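
Splitting memsegs into several lists keeps page lookup cheap when, as assumed here, each list covers one preallocated VA region with a single page size: find the list whose range contains the address, then compute the page index by division instead of scanning every page. A minimal sketch with a hypothetical struct seg_list and find_seg() helper (not the patchset's actual memseg list structures).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memseg list: a contiguous VA region split into
 * equally sized page slots. */
struct seg_list {
    uintptr_t base; /* start of the preallocated VA region */
    size_t page_sz; /* page size shared by every slot in this list */
    size_t n_segs;  /* number of page slots in the region */
};

/* Find the list and page slot containing addr. The scan is linear in the
 * (small) number of lists; within a list the lookup is O(1). */
static bool find_seg(const struct seg_list *lists, size_t n_lists,
                     uintptr_t addr, size_t *list_idx, size_t *seg_idx)
{
    assert(list_idx != NULL && seg_idx != NULL);
    for (size_t i = 0; i < n_lists; i++) {
        const struct seg_list *l = &lists[i];
        uintptr_t end = l->base + l->page_sz * l->n_segs;
        if (addr >= l->base && addr < end) {
            *list_idx = i;
            *seg_idx = (addr - l->base) / l->page_sz;
            return true;
        }
    }
    return false;
}
```

With one slot per page, freeing a page only needs this index to mark the slot empty, rather than a search through one global list.
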
> 
> For v1, the following limitations are present:
> - FreeBSD does not even compile, let alone run
> - No 32-bit support
> - There are some minor quality-of-life improvements planned that aren't
>   ready yet and will be part of v2
> - VFIO support is only smoke-tested (but is expected to work), VFIO support
>   with secondary processes is not tested; work is ongoing to validate VFIO
>   for all use cases
> - Dynamic mapping/unmapping of memory with VFIO is not supported in sPAPR
>   IOMMU mode - help from sPAPR maintainers requested
> 
> Nevertheless, this patchset should be testable under 64-bit Linux, and
> should work for all use cases bar those mentioned above.
> 
> [1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
> [2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/
> [3] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Improvements/
> [4] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Async_Request/
> [5] http://dpdk.org/dev/patchwork/bundle/aburakov/Num_Sockets/
> [6] http://dpdk.org/dev/patchwork/patch/34002/
> [7] http://dpdk.org/dev/patchwork/patch/33853/
> [8] http://dpdk.org/dev/patchwork/patch/24484/

I did a quick pass on your patches (unfortunately, I don't have
the time to really dive into it).

I have a few questions/comments:

- This is really a big patchset. Thank you for working on this topic.
  I'll try to test our application with it as soon as possible.

- I see from patch 17 that rte_malloc() may expand the heap by
  requesting more memory from the OS. Did I understand correctly?
  Today, a good property of rte_malloc() compared to malloc() is that
  it won't interrupt the process (the worst case is a spinlock). This
  is valuable on a dataplane core. Will it change?

- It's not a big issue, but I have the feeling that the "const" qualifier
  is often forgotten in the patchset. I think it helps with optimization
  and documentation, and it helps detect bugs that modify or free
  something that should not be touched.
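
The rte_malloc() concern can be made concrete with a toy allocator: the fast path only takes a spinlock and carves memory from what the heap already holds, while a miss falls through to a growth path which, per the cover letter, now allocates pages as needed (system calls that may block). A minimal sketch, assuming a hypothetical bump-pointer struct toy_heap and a hypothetical heap_grow() in which malloc() stands in for the page-allocation step and a counter records how often it runs; this is not how rte_malloc() is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CHUNK_SZ 4096 /* hypothetical growth unit */

struct toy_heap {
    atomic_flag lock;    /* spinlock guarding the fields below */
    char *cur;           /* next free byte in the current chunk */
    size_t left;         /* bytes left in the current chunk */
    unsigned grow_calls; /* how many times the slow path ran */
};

static void toy_heap_init(struct toy_heap *h)
{
    atomic_flag_clear(&h->lock);
    h->cur = NULL;
    h->left = 0;
    h->grow_calls = 0;
}

/* Slow path. In a hotplug-capable heap this is where new pages would be
 * requested from the OS; malloc() stands in for that step here. The old
 * chunk's tail is abandoned (the toy never frees anything). */
static int heap_grow(struct toy_heap *h)
{
    char *chunk = malloc(TOY_CHUNK_SZ);
    if (chunk == NULL)
        return -1;
    h->cur = chunk;
    h->left = TOY_CHUNK_SZ;
    h->grow_calls++;
    return 0;
}

/* Bump allocation under a spinlock; no alignment handling. */
static void *toy_alloc(struct toy_heap *h, size_t sz)
{
    void *p = NULL;

    assert(h != NULL);
    if (sz == 0 || sz > TOY_CHUNK_SZ)
        return NULL;
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
        ; /* spinning is the only wait on the fast path */
    /* A miss runs the slow path while the lock is held, so other cores
     * contending for the heap spin for as long as the growth takes. */
    if (h->left >= sz || heap_grow(h) == 0) {
        p = h->cur;
        h->cur += sz;
        h->left -= sz;
    }
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
    return p;
}
```

When the heap is fixed at startup, the equivalent miss simply returns NULL instead of entering the growth path.
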
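
On the "const" point, a small illustration of what the qualifier buys: a function that only inspects a table can take a pointer to const, which documents that it does not modify the table and lets the compiler catch an accidental write. The struct seg_info type and count_free_segs() helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical segment descriptor. */
struct seg_info {
    size_t len;
    bool in_use;
};

/* Read-only walk. Because segs points to const, a stray write such as
 * segs[i].in_use = false is rejected at compile time, and passing segs
 * to free() draws a warning about discarding the qualifier. */
static size_t count_free_segs(const struct seg_info *segs, size_t n)
{
    size_t free_cnt = 0;

    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!segs[i].in_use)
            free_cnt++;
    return free_cnt;
}
```
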

I'm sending some other minor comments as replies to individual patches.

Thanks,
Olivier

