[dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas

Burakov, Anatoly anatoly.burakov at intel.com
Fri Dec 14 12:03:14 CET 2018


On 14-Dec-18 9:55 AM, Yongseok Koh wrote:
> On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
>> The general use-case of using external memory is well covered by
>> existing external memory API's. However, certain use cases require
>> manual management of externally allocated memory areas, so this
>> memory should not be added to the heap. It should, however, be
>> added to DPDK's internal structures, so that API's like
>> ``rte_virt2memseg`` would work on such external memory segments.
>>
>> This commit adds such an API to DPDK. The new functions allow
>> registering and unregistering externally allocated memory areas;
>> documentation for them is added as well.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>   .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>>   lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>>   lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>>   lib/librte_eal/rte_eal_version.map            |  2 +
>>   4 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
>> index 8b5d050c7..d7799b626 100644
>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
>> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>>   Support for Externally Allocated Memory
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   
>> -It is possible to use externally allocated memory in DPDK, using a set of malloc
>> -heap API's. Support for externally allocated memory is implemented through
>> -overloading the socket ID - externally allocated heaps will have socket ID's
>> -that would be considered invalid under normal circumstances. Requesting an
>> -allocation to take place from a specified externally allocated memory is a
>> -matter of supplying the correct socket ID to DPDK allocator, either directly
>> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
>> -structure-specific allocation API's such as ``rte_ring_create``).
>> +It is possible to use externally allocated memory in DPDK. There are two ways in
>> +which using externally allocated memory can work: the malloc heap API's, and
>> +manual memory management.
>>   
>> -Since there is no way DPDK can verify whether memory are is available or valid,
>> -this responsibility falls on the shoulders of the user. All multiprocess
>> ++ Using heap API's for externally allocated memory
>> +
>> +Using a set of malloc heap API's is the recommended way to use externally
>> +allocated memory in DPDK. In this way, support for externally allocated memory
>> +is implemented through overloading the socket ID - externally allocated heaps
>> +will have socket ID's that would be considered invalid under normal
>> +circumstances. Requesting an allocation to take place from a specified
>> +externally allocated memory is a matter of supplying the correct socket ID to
>> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
>> +indirectly (through data structure-specific allocation API's such as
>> +``rte_ring_create``). Using these API's also ensures that mapping of externally
>> +allocated memory for DMA is also performed on any memory segment that is added
>> +to a DPDK malloc heap.
>> +
>> +Since there is no way DPDK can verify whether memory is available or valid, this
>> +responsibility falls on the shoulders of the user. All multiprocess
>>   synchronization is also user's responsibility, as well as ensuring  that all
>>   calls to add/attach/detach/remove memory are done in the correct order. It is
>>   not required to attach to a memory area in all processes - only attach to memory
>> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>>   For more information, please refer to ``rte_malloc`` API documentation,
>>   specifically the ``rte_malloc_heap_*`` family of function calls.
>>   
>> ++ Using externally allocated memory without DPDK API's
>> +
>> +While using heap API's is the recommended method of using externally allocated
>> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
>> +is undesirable - for example, when manual memory management is performed on an
>> +externally allocated area. To support use cases where externally allocated
>> +memory will not be used as part of normal DPDK workflow, there is also another
>> +set of API's under the ``rte_extmem_*`` namespace.
>> +
>> +These API's are (as their name implies) intended to allow registering or
>> +unregistering externally allocated memory to/from DPDK's internal page table, to
>> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
>> +memory. Memory added this way will not be available for any regular DPDK
>> +allocators; DPDK will leave this memory for the user application to manage.
>> +
>> +The expected workflow is as follows:
>> +
>> +* Get a pointer to memory area
>> +* Register memory within DPDK
>> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
>> +      unavailable
>> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
>> +* Use the memory area in your application
>> +* If memory area is no longer needed, it can be unregistered
>> +    - If the area was mapped for DMA, unmapping must be performed before
>> +      unregistering memory
>> +
>> +Since these externally allocated memory areas will not be managed by DPDK, it is
>> +therefore up to the user application to decide how to use them and what to do
>> +with them once they're registered.
>> +
>>   Per-lcore and Shared Variables
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index d47ea4938..a2e085ae8 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -24,6 +24,7 @@
>>   #include "eal_memalloc.h"
>>   #include "eal_private.h"
>>   #include "eal_internal_cfg.h"
>> +#include "malloc_heap.h"
>>   
>>   /*
>>    * Try to mmap *size bytes in /dev/zero. If it is successful, return the
>> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>>   	return ret;
>>   }
>>   
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	unsigned int socket_id;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
>> +			!rte_is_power_of_2(page_sz) ||
>> +			RTE_ALIGN(len, page_sz) != len) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
> 
> Isn't it better to have more sanity checks? E.g., (len / page_sz == n_pages) like
> rte_malloc_heap_memory_add(). And what about the alignment of va_addr? Shouldn't
> it be page-aligned if I'm not mistaken? rte_malloc_heap_memory_add() doesn't
> have it either... Also you might want to add it to documentation that
> granularity of these registrations is a page.
> 

Hi Yongseok,

Thanks for your review.

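n_pages is allowed to be 0 if iova_addrs[] is NULL, so a check like
(len / page_sz == n_pages) can only be applied when an IOVA table is
supplied. However, you're correct in that more sanity checking and
documentation re: page alignment would be beneficial. I'll submit a v2.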
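
For reference, the extra checks under discussion could look roughly like
the sketch below. This is not the v2 code, just an illustration of the
idea: extmem_check_args() and the iova_t typedef are made-up names so
that it builds standalone without DPDK headers, and it sets plain errno
instead of rte_errno for the same reason.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_iova_t so the sketch builds without DPDK headers. */
typedef uint64_t iova_t;

/*
 * Hypothetical helper, not part of the patch: returns 0 if the arguments
 * look sane, or -1 with errno set to EINVAL otherwise.
 */
static int
extmem_check_args(const void *va_addr, size_t len, const iova_t iova_addrs[],
		unsigned int n_pages, size_t page_sz)
{
	/* equivalent to the checks already present in the patch */
	if (va_addr == NULL || page_sz == 0 || len == 0 ||
			(page_sz & (page_sz - 1)) != 0 ||
			len % page_sz != 0)
		goto fail;

	/* start address must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		goto fail;

	/* n_pages only matters when an IOVA table is supplied */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		goto fail;

	return 0;
fail:
	errno = EINVAL;
	return -1;
}
```

With this, a two-page area with a two-entry IOVA table passes, as does
the same area with a NULL table and n_pages set to 0, while a mismatched
n_pages or an unaligned start address is rejected.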


> Otherwise,
> 
> Acked-by: Yongseok Koh <yskoh at mellanox.com>
> Thanks
> 
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* make sure the segment doesn't already exist */
>> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
>> +		rte_errno = EEXIST;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* get next available socket ID */
>> +	socket_id = mcfg->next_socket_id;
>> +	if (socket_id > INT32_MAX) {
>> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
>> +		rte_errno = ENOSPC;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* we can create a new memseg */
>> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
>> +			page_sz, "extmem", socket_id) == NULL) {
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* memseg list successfully created - increment next socket ID */
>> +	mcfg->next_socket_id++;
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	struct rte_memseg_list *msl;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || len == 0) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* find our segment */
>> +	msl = malloc_heap_find_external_seg(va_addr, len);
>> +	if (msl == NULL) {
>> +		rte_errno = ENOENT;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	ret = malloc_heap_destroy_external_seg(msl);
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>>   /* init memory subsystem */
>>   int
>>   rte_eal_memory_init(void)
>> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
>> index d970825df..4a43c1a9e 100644
>> --- a/lib/librte_eal/common/include/rte_memory.h
>> +++ b/lib/librte_eal/common/include/rte_memory.h
>> @@ -423,6 +423,69 @@ int __rte_experimental
>>   rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>>   		size_t *offset);
>>   
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Register external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA mapping. It is expected that user
>> + *   will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to register
>> + * @param len
>> + *   Length of virtual area to register
>> + * @param iova_addrs
>> + *   Array of page IOVA addresses corresponding to each page in this memory
>> + *   area. Can be NULL, in which case page IOVA addresses will be set to
>> + *   RTE_BAD_IOVA.
>> + * @param n_pages
>> + *   Number of elements in the iova_addrs array. Ignored if  ``iova_addrs``
>> + *   is NULL.
>> + * @param page_sz
>> + *   Page size of the underlying memory
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     EEXIST - memory chunk is already registered
>> + *     ENOSPC - no more space in internal config to store a new memory chunk
>> + */
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Unregister external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA unmapping. It is expected that user
>> + *   will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to unregister
>> + * @param len
>> + *   Length of virtual area to unregister
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     ENOENT - memory chunk was not found
>> + */
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len);
>> +
>>   /**
>>    * Dump the physical memory layout to a file.
>>    *
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index 3fe78260d..593691a14 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>>   	rte_devargs_remove;
>>   	rte_devargs_type_count;
>>   	rte_eal_cleanup;
>> +	rte_extmem_register;
>> +	rte_extmem_unregister;
>>   	rte_fbarray_attach;
>>   	rte_fbarray_destroy;
>>   	rte_fbarray_detach;
>> -- 
>> 2.17.1
> 


-- 
Thanks,
Anatoly

