[dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas

Yongseok Koh yskoh at mellanox.com
Fri Dec 14 10:55:40 CET 2018


On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
> The general use-case of using external memory is well covered by
> existing external memory API's. However, certain use cases require
> manual management of externally allocated memory areas, so this
> memory should not be added to the heap. It should, however, be
> added to DPDK's internal structures, so that API's like
> ``rte_virt2memseg`` would work on such external memory segments.
> 
> This commit adds such an API to DPDK. The new functions will allow
> to register and unregister externally allocated memory areas, as
> well as documentation for them.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> ---
>  .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>  lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>  lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>  lib/librte_eal/rte_eal_version.map            |  2 +
>  4 files changed, 189 insertions(+), 10 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 8b5d050c7..d7799b626 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> -It is possible to use externally allocated memory in DPDK, using a set of malloc
> -heap API's. Support for externally allocated memory is implemented through
> -overloading the socket ID - externally allocated heaps will have socket ID's
> -that would be considered invalid under normal circumstances. Requesting an
> -allocation to take place from a specified externally allocated memory is a
> -matter of supplying the correct socket ID to DPDK allocator, either directly
> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
> -structure-specific allocation API's such as ``rte_ring_create``).
> +It is possible to use externally allocated memory in DPDK. There are two ways in
> +which using externally allocated memory can work: the malloc heap API's, and
> +manual memory management.
>  
> -Since there is no way DPDK can verify whether memory are is available or valid,
> -this responsibility falls on the shoulders of the user. All multiprocess
> ++ Using heap API's for externally allocated memory
> +
> +Using a set of malloc heap API's is the recommended way to use externally
> +allocated memory in DPDK. In this way, support for externally allocated memory
> +is implemented through overloading the socket ID - externally allocated heaps
> +will have socket ID's that would be considered invalid under normal
> +circumstances. Requesting an allocation to take place from a specified
> +externally allocated memory is a matter of supplying the correct socket ID to
> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
> +indirectly (through data structure-specific allocation API's such as
> +``rte_ring_create``). Using these API's also ensures that mapping of externally
> +allocated memory for DMA is also performed on any memory segment that is added
> +to a DPDK malloc heap.
> +
> +Since there is no way DPDK can verify whether memory is available or valid, this
> +responsibility falls on the shoulders of the user. All multiprocess
>  synchronization is also user's responsibility, as well as ensuring  that all
>  calls to add/attach/detach/remove memory are done in the correct order. It is
>  not required to attach to a memory area in all processes - only attach to memory
> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>  For more information, please refer to ``rte_malloc`` API documentation,
>  specifically the ``rte_malloc_heap_*`` family of function calls.
>  
> ++ Using externally allocated memory without DPDK API's
> +
> +While using heap API's is the recommended method of using externally allocated
> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
> +is undesirable - for example, when manual memory management is performed on an
> +externally allocated area. To support use cases where externally allocated
> +memory will not be used as part of normal DPDK workflow, there is also another
> +set of API's under the ``rte_extmem_*`` namespace.
> +
> +These API's are (as their name implies) intended to allow registering or
> +unregistering externally allocated memory to/from DPDK's internal page table, to
> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
> +memory. Memory added this way will not be available for any regular DPDK
> +allocators; DPDK will leave this memory for the user application to manage.
> +
> +The expected workflow is as follows:
> +
> +* Get a pointer to memory area
> +* Register memory within DPDK
> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
> +      unavailable
> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Use the memory area in your application
> +* If memory area is no longer needed, it can be unregistered
> +    - If the area was mapped for DMA, unmapping must be performed before
> +      unregistering memory
> +
> +Since these externally allocated memory areas will not be managed by DPDK, it is
> +therefore up to the user application to decide how to use them and what to do
> +with them once they're registered.
> +
>  Per-lcore and Shared Variables
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index d47ea4938..a2e085ae8 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -24,6 +24,7 @@
>  #include "eal_memalloc.h"
>  #include "eal_private.h"
>  #include "eal_internal_cfg.h"
> +#include "malloc_heap.h"
>  
>  /*
>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>  	return ret;
>  }
>  
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	unsigned int socket_id;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
> +			!rte_is_power_of_2(page_sz) ||
> +			RTE_ALIGN(len, page_sz) != len) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}

Wouldn't it be better to have more sanity checks here? E.g. (len / page_sz ==
n_pages), like in rte_malloc_heap_memory_add(). And what about the alignment of
va_addr? Shouldn't it be page-aligned, if I'm not mistaken?
rte_malloc_heap_memory_add() doesn't check that either... Also, you might want
to add to the documentation that the granularity of these registrations is a
page.
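
Something along these lines, as an untested sketch rather than a proposed
implementation. extmem_args_valid() is a made-up standalone helper, plain C
bit tests stand in for rte_is_power_of_2()/RTE_ALIGN(), and uint64_t stands in
for rte_iova_t. The n_pages check is skipped when no IOVA table is passed,
on the assumption that n_pages is ignored in that case, as the doc comment in
this patch says.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): the argument checks already
 * done by rte_extmem_register(), plus the two extra checks suggested above.
 */
static bool
extmem_args_valid(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	/* existing checks: non-NULL area, non-zero sizes, power-of-2 page */
	if (va_addr == NULL || len == 0 || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return false;

	/* existing check: length is a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;

	/* suggested check: start address is page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;

	/* suggested check: IOVA table has exactly one entry per page */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;

	return true;
}
```

With 4K pages, a two-page area at 0x200000 passes with n_pages == 2, while the
same area starting at 0x200010, or with n_pages == 3, is rejected.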

Otherwise,

Acked-by: Yongseok Koh <yskoh at mellanox.com>
Thanks

> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* make sure the segment doesn't already exist */
> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
> +		rte_errno = EEXIST;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* get next available socket ID */
> +	socket_id = mcfg->next_socket_id;
> +	if (socket_id > INT32_MAX) {
> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
> +		rte_errno = ENOSPC;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* we can create a new memseg */
> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
> +			page_sz, "extmem", socket_id) == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* memseg list successfully created - increment next socket ID */
> +	mcfg->next_socket_id++;
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct rte_memseg_list *msl;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* find our segment */
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		rte_errno = ENOENT;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	ret = malloc_heap_destroy_external_seg(msl);
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index d970825df..4a43c1a9e 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -423,6 +423,69 @@ int __rte_experimental
>  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>  		size_t *offset);
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Register external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA mapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to register
> + * @param len
> + *   Length of virtual area to register
> + * @param iova_addrs
> + *   Array of page IOVA addresses corresponding to each page in this memory
> + *   area. Can be NULL, in which case page IOVA addresses will be set to
> + *   RTE_BAD_IOVA.
> + * @param n_pages
> + *   Number of elements in the iova_addrs array. Ignored if  ``iova_addrs``
> + *   is NULL.
> + * @param page_sz
> + *   Page size of the underlying memory
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     EEXIST - memory chunk is already registered
> + *     ENOSPC - no more space in internal config to store a new memory chunk
> + */
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA unmapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to unregister
> + * @param len
> + *   Length of virtual area to unregister
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len);
> +
>  /**
>   * Dump the physical memory layout to a file.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 3fe78260d..593691a14 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_eal_cleanup;
> +	rte_extmem_register;
> +	rte_extmem_unregister;
>  	rte_fbarray_attach;
>  	rte_fbarray_destroy;
>  	rte_fbarray_detach;
> -- 
> 2.17.1

