Failure while allocating 1GB hugepages
Dmitry Kozlyuk
dmitry.kozliuk at gmail.com
Fri May 10 17:07:43 CEST 2024
2024-05-10 11:33 (UTC+0200), Antonio Di Bacco:
> I have 16 hugepages available per NUMA node on a 4-node NUMA system:
>
> [user@node-1 hugepages]$ cat /sys/devices/system/node/*/hugepages/hugepages-1048576kB/free_hugepages
> 16
> 16
> 16
> 16
>
> Using the following program with DPDK 21.11, I can sometimes allocate
> a few pages, but most of the time I cannot. I also tried removing
> rtemap_* under /dev/hugepages.
> Is rte_memzone_reserve_aligned always supposed to use a new page?
>
> #include <inttypes.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #include <rte_eal.h>
> #include <rte_errno.h>
> #include <rte_memzone.h>
>
> int main(int argc, char **argv)
> {
>     const struct rte_memzone *mz;
>     int ret;
>
>     printf("pid: %d\n", (int)getpid());
>     // Initialize EAL
>     ret = rte_eal_init(argc, argv);
>     if (ret < 0) {
>         fprintf(stderr, "Error with EAL initialization\n");
>         return -1;
>     }
>
>     for (int socket = 0; socket < 4; socket++)
>     {
>         for (int i = 0; i < 16; i++)
>         {
>             // Allocate memory using rte_memzone_reserve_aligned
>             char name[32];
>             // snprintf cannot overflow the name buffer
>             snprintf(name, sizeof(name), "my_memzone%d-%d", i, socket);
>             mz = rte_memzone_reserve_aligned(name, 1ULL << 30, socket,
>                     RTE_MEMZONE_IOVA_CONTIG, 1ULL << 30);
>
>             if (mz == NULL) {
>                 printf("errno %s\n", rte_strerror(rte_errno));
>                 fprintf(stderr, "Memory allocation failed\n");
>                 rte_eal_cleanup();
>                 return -1;
>             }
>
>             // iova and addr_64 are 64-bit integers, so print them with PRIx64
>             printf("Memory allocated with name %s at socket %d physical "
>                    "address: 0x%" PRIx64 ", addr %p addr64 %" PRIx64
>                    " size: %zu\n", name, mz->socket_id,
>                    mz->iova, mz->addr, mz->addr_64, mz->len);
>         }
>     }
>
>     // Clean up EAL
>     rte_eal_cleanup();
>     return 0;
> }
Hi Antonio,
Does it succeed without RTE_MEMZONE_IOVA_CONTIG?
If so, does your system/app have ASLR enabled?
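
For the ASLR question, a minimal sketch for checking the system-wide
setting on Linux, assuming the usual procfs file
/proc/sys/kernel/randomize_va_space is available. The helper names
read_aslr_mode() and aslr_mode_name() are made up for illustration:

```c
#include <stdio.h>

/* Read the system-wide ASLR mode from a procfs-style file.
 * Returns the parsed integer, or -1 if the file cannot be read or parsed. */
static int read_aslr_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Human-readable meaning of the kernel.randomize_va_space values. */
static const char *aslr_mode_name(int mode)
{
    switch (mode) {
    case 0: return "disabled";
    case 1: return "enabled (stack, mmap, VDSO)";
    case 2: return "enabled (also heap)";
    default: return "unknown";
    }
}
```

Calling aslr_mode_name(read_aslr_mode("/proc/sys/kernel/randomize_va_space"))
gives the system-wide state. A single process can also be started
without randomization using setarch with -R (--addr-no-randomize).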
When the memzone size is 1G and the hugepage size is 1G,
two hugepages are required: one for the requested amount of memory
and one for the memory allocator element header,
which obviously cannot fit into the same page.
I suspect that the two allocated hugepages get non-contiguous IOVAs
and that's why the function fails.
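
The two-hugepage requirement can be illustrated with a simplified
model, not the real allocator logic: a header sits immediately before
the data, and the data start is aligned as requested. The function name
pages_spanned() is hypothetical, and the 128-byte header length used in
the note below is only an assumption for the sake of the arithmetic:

```c
#include <stdint.h>

/* Number of pages touched by one allocator element in a simplified model:
 * a header of header_len bytes sits immediately before the data, and the
 * data start is aligned to `align`. All arguments are in bytes, size > 0. */
static uint64_t pages_spanned(uint64_t size, uint64_t align,
                              uint64_t header_len, uint64_t page_size)
{
    /* Smallest aligned data offset that leaves room for the header. */
    uint64_t data_start = (header_len + align - 1) / align * align;
    /* Page index holding the first header byte. */
    uint64_t first = (data_start - header_len) / page_size;
    /* Page index holding the last data byte. */
    uint64_t last = (data_start + size - 1) / page_size;

    return last - first + 1;
}
```

With size = 1G, align = 1G, header_len = 128 and page_size = 1G this
returns 2: the data fills one whole page and the header lands at the
end of the page before it. With size = 1G - 128 and align = 64 the same
model returns 1.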
EAL has no useful logs to confirm this suspicion,
but you can hack elem_check_phys_contig() in malloc_elem.c.
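
A sketch of the kind of check such a hack could log, written as
standalone C rather than as a patch to malloc_elem.c. The function name
first_iova_gap() is hypothetical; in a DPDK application the per-page
addresses could be collected with rte_mem_virt2iova():

```c
#include <stddef.h>
#include <stdint.h>

/* Given the IOVA of each page backing a virtually contiguous area,
 * return the index of the first page whose IOVA does not directly follow
 * the previous page, or 0 if all n pages are IOVA-contiguous. */
static size_t first_iova_gap(const uint64_t *iova, size_t n,
                             uint64_t page_size)
{
    for (size_t i = 1; i < n; i++) {
        if (iova[i] != iova[i - 1] + page_size)
            return i;
    }
    return 0;
}
```

Printing the two addresses around the returned index would show whether
the header page and the data page ended up apart in IOVA space.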