[dpdk-dev] [PATCH v2 28/41] eal: add support for multiprocess memory hotplug

Tan, Jianfeng jianfeng.tan at intel.com
Fri Mar 23 16:44:43 CET 2018



On 3/8/2018 12:56 AM, Anatoly Burakov wrote:
> This enables multiprocess synchronization for memory hotplug
> requests at runtime (as opposed to initialization).
>
> Basic workflow is the following. Primary process always does initial
> mapping and unmapping, and secondary processes always follow primary
> page map. Only one allocation request can be active at any one time.
>
> When primary allocates memory, it ensures that all other processes
> have allocated the same set of hugepages successfully; otherwise,
> any allocations made are rolled back and the heap memory is freed.
> Heap is locked throughout the process, so no race conditions can
> happen.
>
> When primary frees memory, it frees the heap, deallocates affected
> pages, and notifies other processes of deallocations. Since heap is
> freed from that memory chunk, the area basically becomes invisible
> to other processes even if they happen to fail to unmap that
> specific set of pages, so it's completely safe to ignore results of
> sync requests.
>
> When secondary allocates memory, it does not do so by itself.
> Instead, it sends a request to primary process to try and allocate
> pages of specified size and on specified socket, such that a
> specified heap allocation request could complete. Primary process
> then sends all secondaries (including the requestor) a separate
> notification of allocated pages, and expects all secondary
> processes to report success before considering pages as "allocated".
>
> Only after primary process ensures that all memory has been
> successfully allocated in all secondary processes will it respond
> positively to the initial request, and let secondary proceed with
> the allocation. Since the heap now has memory that can satisfy
> allocation request, and it was locked all this time (so no other
> allocations could take place), secondary process will be able to
> allocate memory from the heap.
>
> When secondary frees memory, it hides pages to be deallocated from
> the heap. Then, it sends a deallocation request to primary process,
> so that it deallocates pages itself, and then sends a separate sync
> request to all other processes (including the requestor) to unmap
> the same pages. This way, even if secondary fails to notify other
> processes of this deallocation, that memory will become invisible
> to other processes, and will not be allocated from again.
>
> So, to summarize: address space will only become part of the heap
> if primary process can ensure that all other processes have
> allocated this memory successfully. If anything goes wrong, the
> worst thing that could happen is that a page will "leak" and will
> be available to neither DPDK nor the system, as some process
> will still hold onto it. It's not an actual leak, as we can account
> for the page - it's just that none of the processes will be able
> to use this page for anything useful, until it gets allocated from
> by the primary.
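
A minimal sketch of the all-or-nothing rule summarized above, as a
self-contained model rather than the code in this patch. The names
struct proc, proc_map(), proc_unmap() and primary_alloc() are
hypothetical, and each process is reduced to a flag saying whether
its map attempt succeeds; no real IPC or hugepage mapping is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process's view of a set of pages. */
struct proc {
	bool can_map;	/* whether a map attempt succeeds in this process */
	bool mapped;	/* whether the pages are currently mapped */
};

static bool
proc_map(struct proc *p)
{
	if (!p->can_map)
		return false;
	p->mapped = true;
	return true;
}

static void
proc_unmap(struct proc *p)
{
	p->mapped = false;
}

/*
 * Primary-side allocation: map locally, then require every secondary
 * to map the same pages. The pages join the heap only if all of them
 * succeed; otherwise every mapping made so far is undone.
 */
static bool
primary_alloc(struct proc *primary, struct proc *sec, size_t n_sec,
		bool *in_heap)
{
	size_t i;

	assert(primary != NULL && in_heap != NULL);

	if (!proc_map(primary))
		return false;

	for (i = 0; i < n_sec; i++) {
		if (!proc_map(&sec[i]))
			goto rollback;
	}
	*in_heap = true;
	return true;

rollback:
	/* undo only the secondaries that mapped before the failure */
	while (i-- > 0)
		proc_unmap(&sec[i]);
	proc_unmap(primary);
	return false;
}
```

In this model a failed unmap during rollback is not represented; per
the description above, that case leaves the page held by some process
but never visible in the heap.
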
>
> Due to underlying DPDK IPC implementation being single-threaded,
> some asynchronous magic had to be done, as we need to complete
> several requests before we can definitively allow secondary process
> to use allocated memory (namely, it has to be present in all other
> secondary processes before it can be used). Additionally, only
> one allocation request is allowed to be submitted at once.
>
> Memory allocation requests are only allowed when there are no
> secondary processes currently initializing. To enforce that,
> a shared rwlock is used, which is read-locked during init (so that
> several secondaries can initialize concurrently) and write-locked
> while making allocation requests (so that either secondary init has
> to wait, or the allocation request has to wait until all processes
> have initialized).
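
A minimal sketch of the locking rule in the paragraph above, assuming
a simple counter-based rwlock built on C11 atomics. This is not the
lock used in the patch; struct hotplug_lock and the four functions
are hypothetical names, and only non-blocking "try" variants are
shown to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: cnt > 0 counts readers, cnt == -1 is one writer. */
struct hotplug_lock {
	atomic_int cnt;
};

/* Secondary init takes a read lock; many can hold it at once. */
static bool
init_try_begin(struct hotplug_lock *l)
{
	int cur = atomic_load(&l->cnt);

	while (cur >= 0) {
		/* on failure, cur is reloaded with the current value */
		if (atomic_compare_exchange_weak(&l->cnt, &cur, cur + 1))
			return true;
	}
	return false;	/* an allocation request holds the write lock */
}

static void
init_end(struct hotplug_lock *l)
{
	int prev = atomic_fetch_sub(&l->cnt, 1);

	assert(prev > 0);
	(void)prev;
}

/* An allocation request takes the write lock; it needs zero readers. */
static bool
alloc_request_try_begin(struct hotplug_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->cnt, &expected, -1);
}

static void
alloc_request_end(struct hotplug_lock *l)
{
	int expected = -1;
	bool ok = atomic_compare_exchange_strong(&l->cnt, &expected, 0);

	assert(ok);
	(void)ok;
}
```

With this shape, an allocation request cannot start while any
secondary is still initializing, and a secondary cannot start
initializing while an allocation request is in flight.
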
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> ---
>
> Notes:
>      v2: - fixed deadlocking on init problem
>          - reverted rte_panic changes (fixed by changes in IPC instead)
>      
>      This problem is evidently complex to solve without multithreaded
>      IPC implementation. An alternative approach would be to process
>      each individual message in its own thread (or at least spawn a
>      thread per incoming request) - that way, we can send requests
>      while responding to another request, and this problem becomes
>      trivial to solve (and in fact it was solved that way initially,
>      before my aversion to certain other programming languages kicked
>      in).
>      
>      Is the added complexity worth saving a couple of thread spin-ups
>      here and there?
>
>   lib/librte_eal/bsdapp/eal/Makefile                |   1 +
>   lib/librte_eal/common/eal_common_memory.c         |  16 +-
>   lib/librte_eal/common/include/rte_eal_memconfig.h |   3 +
>   lib/librte_eal/common/malloc_heap.c               | 255 ++++++--
>   lib/librte_eal/common/malloc_mp.c                 | 723 ++++++++++++++++++++++
>   lib/librte_eal/common/malloc_mp.h                 |  86 +++
>   lib/librte_eal/common/meson.build                 |   1 +
>   lib/librte_eal/linuxapp/eal/Makefile              |   1 +
>   8 files changed, 1040 insertions(+), 46 deletions(-)
>   create mode 100644 lib/librte_eal/common/malloc_mp.c
>   create mode 100644 lib/librte_eal/common/malloc_mp.h
...
> +/* callback for asynchronous sync requests for primary. this will either do a
> + * sendmsg with results, or trigger rollback request.
> + */
> +static int
> +handle_sync_response(const struct rte_mp_msg *request,

Rename to handle_async_response()?