[dpdk-dev] [PATCH v2 28/41] eal: add support for multiprocess memory hotplug
Tan, Jianfeng
jianfeng.tan at intel.com
Fri Mar 23 16:44:43 CET 2018
On 3/8/2018 12:56 AM, Anatoly Burakov wrote:
> This enables multiprocess synchronization for memory hotplug
> requests at runtime (as opposed to at initialization time).
>
> The basic workflow is as follows. The primary process always does the
> initial mapping and unmapping, and secondary processes always follow
> the primary's page map. Only one allocation request can be active at
> any one time.
>
> When the primary allocates memory, it ensures that all other processes
> have mapped the same set of hugepages successfully; otherwise, any
> mappings made are rolled back and the heap space is freed again. The
> heap is locked throughout the operation, so no race conditions can
> occur.
>
> When the primary frees memory, it removes the chunk from the heap,
> deallocates the affected pages, and notifies other processes of the
> deallocation. Since the chunk is already gone from the heap, the area
> effectively becomes invisible to other processes even if some of them
> fail to unmap that specific set of pages, so it is completely safe to
> ignore the results of the sync requests.
>
> When a secondary allocates memory, it does not do so by itself.
> Instead, it sends a request to the primary process to allocate pages
> of the specified size on the specified socket, such that the pending
> heap allocation request can complete. The primary process then sends
> all secondaries (including the requestor) a separate notification of
> the allocated pages, and expects every secondary process to report
> success before considering the pages "allocated".
>
> Only after the primary process ensures that the memory has been
> successfully mapped in all secondary processes does it respond
> positively to the initial request and let the secondary proceed with
> its allocation. Since the heap now has memory that can satisfy the
> allocation request, and it was locked the whole time (so no other
> allocations could take place), the secondary process will be able to
> allocate memory from the heap.
>
> When a secondary frees memory, it hides the pages to be deallocated
> from the heap. It then sends a deallocation request to the primary
> process, which unmaps the pages itself and sends a separate sync
> request to all other processes (including the requestor) to unmap the
> same pages. This way, even if the secondary fails to notify other
> processes of the deallocation, that memory becomes invisible to them
> and will not be allocated from again.
>
> So, to summarize: address space only becomes part of the heap once the
> primary process has ensured that all other processes have mapped this
> memory successfully. If anything goes wrong, the worst that can happen
> is that a page will "leak" and become unavailable to both DPDK and the
> system, because some process still holds a mapping to it. It is not an
> actual leak, as the page is still accounted for - it is just that none
> of the processes can use it for anything useful until it gets
> allocated from again by the primary.
>
> Because the underlying DPDK IPC implementation is single-threaded,
> some asynchronous machinery was needed: several requests must complete
> before we can definitively allow a secondary process to use the
> allocated memory (namely, it has to be present in all other secondary
> processes before it can be used). Additionally, only one allocation
> request may be submitted at a time.
>
> Memory allocation requests are only allowed when no secondary
> processes are currently initializing. To enforce this, a shared rwlock
> is used: it is read-locked on init (so that several secondaries can
> initialize concurrently) and write-locked when making an allocation
> request (so that either secondary init has to wait, or the allocation
> request has to wait until all processes have initialized).
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> ---
>
> Notes:
> v2: - fixed deadlocking on init problem
> - reverted rte_panic changes (fixed by changes in IPC instead)
>
> This problem is evidently complex to solve without a multithreaded
> IPC implementation. An alternative approach would be to process
> each individual message in its own thread (or at least spawn a
> thread per incoming request) - that way, we can send requests
> while responding to another request, and this problem becomes
> trivial to solve (and in fact it was solved that way initially,
> before my aversion to certain other programming languages kicked
> in).
>
> Is the added complexity worth saving a couple of thread spin-ups
> here and there?
>
> lib/librte_eal/bsdapp/eal/Makefile | 1 +
> lib/librte_eal/common/eal_common_memory.c | 16 +-
> lib/librte_eal/common/include/rte_eal_memconfig.h | 3 +
> lib/librte_eal/common/malloc_heap.c | 255 ++++++--
> lib/librte_eal/common/malloc_mp.c | 723 ++++++++++++++++++++++
> lib/librte_eal/common/malloc_mp.h | 86 +++
> lib/librte_eal/common/meson.build | 1 +
> lib/librte_eal/linuxapp/eal/Makefile | 1 +
> 8 files changed, 1040 insertions(+), 46 deletions(-)
> create mode 100644 lib/librte_eal/common/malloc_mp.c
> create mode 100644 lib/librte_eal/common/malloc_mp.h
...
> +/* callback for asynchronous sync requests for primary. this will either do a
> + * sendmsg with results, or trigger rollback request.
> + */
> +static int
> +handle_sync_response(const struct rte_mp_msg *request,
Rename to handle_async_response()?