[dpdk-dev] [PATCH v3 06/10] eal: introduce memory management wrappers

Thomas Monjalon thomas at monjalon.net
Thu Apr 16 00:17:33 CEST 2020


14/04/2020 21:44, Dmitry Kozlyuk:
> System meory management is implemented differently for POSIX and

meory -> memory

> Windows. Introduce wrapper functions for operations used across DPDK:
> 
> * rte_mem_map()
>   Create memory mapping for a regular file or a page file (swap).
>   This supports mapping to a reserved memory region even on Windows.
> 
> * rte_mem_unmap()
>   Remove mapping created with rte_mem_map().
> 
> * rte_get_page_size()
>   Obtain default system page size.
> 
> * rte_mem_lock()
>   Make arbitrary-sized memory region non-swappable.
> 
> Wrappers follow POSIX semantics limited to DPDK tasks, but their
> signatures deliberately differ from POSIX ones to be more safe and
> expressive.
> 
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk at gmail.com>
[...]
> +/**
> + * Memory reservation flags.
> + */
> +enum eal_mem_reserve_flags {
> +	/**< Reserve hugepages (support may be limited or missing). */
> +	EAL_RESERVE_HUGEPAGES = 1 << 0,
> +	/**< Fail if requested address is not available. */
> +	EAL_RESERVE_EXACT_ADDRESS = 1 << 1
> +};

Maybe more context is needed to understand the meaning of these flags.
[...]
> -eal_get_virtual_area(void *requested_addr, size_t *size,
> -		size_t page_sz, int flags, int mmap_flags);
> +eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz,
> +	int flags, int mmap_flags);

Is there any change here?

[...]
> + * If @code virt @endcode and @code size @endcode describe a part of the

I am not sure about using @code.
It makes reading from source harder.
Is there a real benefit?

[...]
> +/**
> + * Memory protection flags.
> + */
> +enum rte_mem_prot {
> +	RTE_PROT_READ = 1 << 0,   /**< Read access. */
> +	RTE_PROT_WRITE = 1 << 1,   /**< Write access. */
> +	RTE_PROT_EXECUTE = 1 << 2 /**< Code execution. */
> +};

Alignment of comments would look better :-)

> +
> +/**
> + * Memory mapping additional flags.
> + *
> + * In Linux and FreeBSD, each flag is semantically equivalent
> + * to OS-specific mmap(3) flag with the same or similar name.
> + * In Windows, POSIX and MAP_ANONYMOUS semantics are followed.
> + */

I don't understand this comment.
The flags and meanings are the same no matter the OS, right?

> +enum rte_map_flags {
> +	/** Changes of mapped memory are visible to other processes. */
> +	RTE_MAP_SHARED = 1 << 0,
> +	/** Mapping is not backed by a regular file. */
> +	RTE_MAP_ANONYMOUS = 1 << 1,
> +	/** Copy-on-write mapping, changes are invisible to other processes. */
> +	RTE_MAP_PRIVATE = 1 << 2,
> +	/** Fail if requested address cannot be taken. */
> +	RTE_MAP_FIXED = 1 << 3
> +};
> +
> +/**
> + * OS-independent implementation of POSIX mmap(3)
> + * with MAP_ANONYMOUS Linux/FreeBSD extension.
> + */
> +__rte_experimental
> +void *rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot,
> +	enum rte_map_flags flags, int fd, size_t offset);
> +
> +/**
> + * OS-independent implementation of POSIX munmap(3).
> + */
> +__rte_experimental
> +int rte_mem_unmap(void *virt, size_t size);
> +
> +/**
> + * Get system page size. This function never failes.

failes -> fails

> + *
> + * @return
> + *   Positive page size in bytes.
> + */
> +__rte_experimental
> +int rte_get_page_size(void);
> +
> +/**
> + * Lock region in physical memory and prevent it from swapping.
> + *
> + * @param virt
> + *   The virtual address.
> + * @param size
> + *   Size of the region.
> + * @return
> + *   0 on success, negative on error.
> + *
> + * @note Implementations may require @p virt and @p size to be multiples
> + *       of system page size.
> + * @see rte_get_page_size()
> + * @see rte_mem_lock_page()
> + */
> +__rte_experimental
> +int rte_mem_lock(const void *virt, size_t size);

[...]
> --- /dev/null
> +++ b/lib/librte_eal/unix/eal_memory.c

License and copyright missing.

> @@ -0,0 +1,113 @@
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +
> +#include <rte_errno.h>
> +#include <rte_log.h>
> +#include <rte_memory.h>
> +
> +#include "eal_private.h"
> +
> +static void *
> +mem_map(void *requested_addr, size_t size, int prot, int flags,
> +	int fd, size_t offset)
> +{
> +	void *virt = mmap(requested_addr, size, prot, flags, fd, offset);
> +	if (virt == MAP_FAILED) {
> +		RTE_LOG(ERR, EAL,

Not sure it should be a log level so high.
We could imagine checking a memory map.
What about INFO level?
The real error log will be made by the caller.

> +			"Cannot mmap(%p, 0x%zx, 0x%x, 0x%x, %d, 0x%zx): %s\n",
> +			requested_addr, size, prot, flags, fd, offset,
> +			strerror(errno));
> +		rte_errno = errno;
> +		return NULL;
[...]
> +void *
> +eal_mem_reserve(void *requested_addr, size_t size,
> +	enum eal_mem_reserve_flags flags)
> +{
> +	int sys_flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> +#ifdef MAP_HUGETLB
> +	if (flags & EAL_RESERVE_HUGEPAGES)
> +		sys_flags |= MAP_HUGETLB;
> +#endif

If MAP_HUGETLB is false, and flags contain EAL_RESERVE_HUGEPAGES,
I think an error should be returned.

> +	if (flags & EAL_RESERVE_EXACT_ADDRESS)
> +		sys_flags |= MAP_FIXED;
> +
> +	return mem_map(requested_addr, size, PROT_NONE, sys_flags, -1, 0);
> +}
[...]
> +int
> +rte_get_page_size(void)
> +{
> +	return getpagesize();
> +}
> +
> +int
> +rte_mem_lock(const void *virt, size_t size)
> +{
> +	return mlock(virt, size);
> +}

Why don't you replace existing code with these new functions?




More information about the dev mailing list