[dpdk-dev] [PATCH v3 06/10] eal: introduce memory management wrappers

Burakov, Anatoly anatoly.burakov at intel.com
Fri Apr 17 14:43:10 CEST 2020


On 14-Apr-20 8:44 PM, Dmitry Kozlyuk wrote:
> System memory management is implemented differently for POSIX and
> Windows. Introduce wrapper functions for operations used across DPDK:
> 
> * rte_mem_map()
>    Create memory mapping for a regular file or a page file (swap).
>    This supports mapping to a reserved memory region even on Windows.
> 
> * rte_mem_unmap()
>    Remove mapping created with rte_mem_map().
> 
> * rte_get_page_size()
>    Obtain default system page size.
> 
> * rte_mem_lock()
>    Make arbitrary-sized memory region non-swappable.
> 
> Wrappers follow POSIX semantics limited to DPDK tasks, but their
> signatures deliberately differ from POSIX ones to be safer and more
> expressive.
> 
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk at gmail.com>
> ---

<snip>

> +/**
> + * Memory reservation flags.
> + */
> +enum eal_mem_reserve_flags {
> +	/**< Reserve hugepages (support may be limited or missing). */
> +	EAL_RESERVE_HUGEPAGES = 1 << 0,
> +	/**< Fail if requested address is not available. */
> +	EAL_RESERVE_EXACT_ADDRESS = 1 << 1

I *really* don't like this terminology.

In Linux et al., MAP_FIXED does not just mean "reserve at this exact 
address". MAP_FIXED is actually fairly dangerous if you don't know what 
you're doing, because it will unconditionally replace any mapping that 
already exists in the requested range. Also, to my knowledge, an mmap() 
call with MAP_FIXED cannot fail unless something went very wrong - it 
will *not* "fail if requested address is not available". We basically 
use MAP_FIXED because we have already mapped that area with 
MAP_ANONYMOUS previously, so we can guarantee that it's safe to call 
mmap() with MAP_FIXED there.

I would greatly prefer if this was named to better reflect the above. 
EAL_FORCE_RESERVE perhaps? The comment also needs to be adjusted.
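
To illustrate the behaviour described above, here is a minimal sketch 
(not taken from the patch; the function name is hypothetical, and it 
assumes Linux or FreeBSD). It maps one anonymous page, writes to it, 
then maps the same address again with MAP_FIXED. The second call is 
expected to succeed and silently replace the first mapping rather than 
fail:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if MAP_FIXED replaced an existing mapping at the same
 * address, 0 if it did not, -1 if one of the mmap() calls failed.
 */
static int example_fixed_replaces_mapping(void)
{
	size_t sz = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *first, *second;
	int replaced;

	/* First mapping: ordinary anonymous memory chosen by the kernel. */
	first = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;
	first[0] = 0xAB;

	/* Second mapping: same address, MAP_FIXED. The address is "taken",
	 * yet the call does not fail; the old page is discarded. */
	second = mmap(first, sz, PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (second == MAP_FAILED) {
		munmap(first, sz);
		return -1;
	}

	/* Same address, but the byte written earlier is gone (zero page). */
	replaced = (second == first) && (second[0] == 0);
	munmap(second, sz);
	return replaced;
}
```

This is why MAP_FIXED is only safe over a range the caller already 
owns, such as an area reserved earlier with an anonymous mapping.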

> +};
> +
>   /**
>    * Get virtual area of specified size from the OS.
>    *
> @@ -232,8 +243,8 @@ int rte_eal_check_module(const char *module_name);
>   #define EAL_VIRTUAL_AREA_UNMAP (1 << 2)
>   /**< immediately unmap reserved virtual area. */
>   void *
> -eal_get_virtual_area(void *requested_addr, size_t *size,
> -		size_t page_sz, int flags, int mmap_flags);
> +eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz,
> +	int flags, int mmap_flags);
>   
>   /**

<snip>

>   
> +/**
> + * Reserve a region of virtual memory.
> + *
> + * Use eal_mem_free() to free reserved memory.
> + *
> + * @param requested_addr
> + *  A desired reservation address. The system may not respect it.
> + *  NULL means the address will be chosen by the system.
> + * @param size
> + *  Reservation size. Must be a multiple of system page size.
> + * @param flags
> + *  Reservation options.
> + * @returns
> + *  Starting address of the reserved area on success, NULL on failure.
> + *  Callers must not access this memory until remapping it.
> + */
> +void *eal_mem_reserve(void *requested_addr, size_t size,
> +	enum eal_mem_reserve_flags flags);

This seems fairly suspect to me. I know that technically an enum is an 
int, but semantically, IIRC, an enum variable should always hold exactly 
one of its enumerators - you can't use an enum value like a set of flags.
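
One possible way around this, sketched here with hypothetical names 
that are not part of the patch, is to keep the named constants in an 
enum but pass the combined flags as a plain unsigned integer, so that 
an OR-ed value never has to pretend to be a single enumerator:

```c
#include <stdbool.h>

/* Hypothetical constants, for illustration only. */
enum {
	EXAMPLE_RESERVE_HUGEPAGES = 1 << 0,
	EXAMPLE_RESERVE_FORCE = 1 << 1
};

/* Flags travel as an unsigned int, so any OR-combination is valid. */
static bool example_has_flag(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}
```

A caller would then write something like 
example_has_flag(EXAMPLE_RESERVE_HUGEPAGES | EXAMPLE_RESERVE_FORCE, 
EXAMPLE_RESERVE_FORCE) without any enum-typed variable holding a 
combined value.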

> +
> +/**
> + * Free memory obtained by eal_mem_reserve() or eal_mem_alloc().
> + *
> + * If @code virt @endcode and @code size @endcode describe a part of the
> + * reserved region, only this part of the region is freed (accurately
> + * up to the system page size). If @code virt @endcode points to allocated
> + * memory, @code size @endcode must match the one specified on allocation.
> + * The behavior is undefined if the memory pointed by @code virt @endcode
> + * is obtained from another source than listed above.
> + *
> + * @param virt

<snip>

> +/**
> + * Memory mapping additional flags.
> + *
> + * In Linux and FreeBSD, each flag is semantically equivalent
> + * to OS-specific mmap(3) flag with the same or similar name.
> + * In Windows, POSIX and MAP_ANONYMOUS semantics are followed.
> + */
> +enum rte_map_flags {
> +	/** Changes of mapped memory are visible to other processes. */
> +	RTE_MAP_SHARED = 1 << 0,
> +	/** Mapping is not backed by a regular file. */
> +	RTE_MAP_ANONYMOUS = 1 << 1,
> +	/** Copy-on-write mapping, changes are invisible to other processes. */
> +	RTE_MAP_PRIVATE = 1 << 2,
> +	/** Fail if requested address cannot be taken. */
> +	RTE_MAP_FIXED = 1 << 3

Again, MAP_FIXED does not behave the way you describe. See above comments.

> +};
> +
> +/**
> + * OS-independent implementation of POSIX mmap(3)
> + * with MAP_ANONYMOUS Linux/FreeBSD extension.
> + */
> +__rte_experimental
> +void *rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot,
> +	enum rte_map_flags flags, int fd, size_t offset);
> +
> +/**
> + * OS-independent implementation of POSIX munmap(3).
> + */
> +__rte_experimental
> +int rte_mem_unmap(void *virt, size_t size);
> +
> +/**
> + * Get system page size. This function never fails.
> + *
> + * @return
> + *   Positive page size in bytes.
> + */
> +__rte_experimental
> +int rte_get_page_size(void);

Should this return uint32_t, or maybe uint64_t?
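
For reference, a rough sketch of what an unsigned return type could 
look like on the POSIX side. The function name is hypothetical, and 
size_t is just one possible choice alongside the fixed-width types 
mentioned above; sysconf() itself returns a long:

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper, for illustration only. */
static size_t example_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	/* sysconf() reports errors as -1; map that to 0 here. */
	return sz > 0 ? (size_t)sz : 0;
}
```

An unsigned type avoids sign conversions when the result is used in 
size arithmetic together with size_t arguments such as those of 
rte_mem_map() and rte_mem_lock().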

> +
> +/**
> + * Lock region in physical memory and prevent it from swapping.
> + *
> + * @param virt
> + *   The virtual address.
> + * @param size
> + *   Size of the region.
> + * @return
> + *   0 on success, negative on error.
> + *
> + * @note Implementations may require @p virt and @p size to be multiples
> + *       of system page size.
> + * @see rte_get_page_size()
> + * @see rte_mem_lock_page()
> + */
> +__rte_experimental
> +int rte_mem_lock(const void *virt, size_t size);
> +
>   /**
-- 
Thanks,
Anatoly

