Failure while allocating 1GB hugepages
Dmitry Kozlyuk
dmitry.kozliuk at gmail.com
Wed Jun 5 00:50:09 CEST 2024
2024-06-03 14:39 (UTC+0200), Antonio Di Bacco:
> Hi,
> I have the same behaviour with the code in this message.
>
> The first rte_memzone_reserve_aligned() call requesting 1.5GB
> contiguous memory always fails, while the second one is always
> successful.
Hi,
I can't explain the "always" part, but unstable behavior comes from
unpredictable IOVA (physical address) that DPDK gets from the kernel.
On the first try:
1. DPDK has no 1G hugepages mapped, so it needs two 1G hugepages.
alloc_pages_on_heap() -> eal_memalloc_alloc_seg_bulk()
2. DPDK asks the kernel for one 1G hugepage,
kernel maps the hugepage with IOVA = 0xFC0000000,
DPDK stores it in memseg_arr[0].
eal_memalloc_alloc_seg_bulk() -> alloc_seg()
3. Same for another hugepage and memseg_arr[1]->iova = 0xF80000000.
4. DPDK checks if the pages are continuous.
alloc_pages_on_heap() -> eal_memalloc_is_contig() = false
5. Since it's a failure, DPDK frees newly allocated pages.
alloc_pages_on_heap() -> rollback_expand_heap()
On the second try:
6. Steps 1 and 2 repeat, but now memseg_arr[0]->iova = 0xF80000000.
7. Step 3 repeats, but now memseg_arr[1]->iova = 0xFC0000000.
8. IOVAs are continuous, success.
Just a wild guess why the second try may be likely to succeed:
memseg_arr[1] with IOVA = 0xF80000000 is freed last at step 5,
so maybe this is why the kernel is likely to reuse this page at step 6.
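To illustrate why the order of the two pages matters, here is a minimal sketch of an IOVA contiguity check in plain C. It is not the actual eal_memalloc_is_contig() code: iovas_are_contiguous() and PAGE_SZ_1G are names made up for this example, and the addresses used with it are the ones from the quoted report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one 1G hugepage: 0x40000000 bytes. */
#define PAGE_SZ_1G (UINT64_C(1) << 30)

/*
 * Hypothetical helper, not a DPDK function.
 * iova[i] is the IOVA of the i-th page in VA order (like memseg_arr[i]).
 * Returns true only if every page starts exactly page_sz bytes
 * after the previous one.
 */
bool
iovas_are_contiguous(const uint64_t *iova, size_t n_pages, uint64_t page_sz)
{
	assert(page_sz != 0);
	for (size_t i = 1; i < n_pages; i++) {
		if (iova[i] != iova[i - 1] + page_sz)
			return false;
	}
	return true;
}
```

With pages at { 0xFC0000000, 0xF80000000 } the check fails, while { 0xF80000000, 0xFC0000000 } passes, because 0xF80000000 + 0x40000000 = 0xFC0000000.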
I'm afraid the simplest way to get PA-continuous 1.5G reliably
is indeed to try several times.
The preferred way is to use IOMMU and IOVA-as-VA if HW permits.
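A rough sketch of the "try several times" approach, kept independent of DPDK so it compiles on its own. reserve_fn, reserve_with_retries(), and the fake allocator are hypothetical names for this example. In a real application the callback would presumably wrap rte_memzone_reserve_aligned(); the flags shown in the comment are an assumption about what the original code passes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback: returns non-NULL on success, NULL on failure.
 * In a DPDK application it could wrap something like:
 *   rte_memzone_reserve_aligned(name, len, socket_id,
 *           RTE_MEMZONE_1GB | RTE_MEMZONE_IOVA_CONTIG, align);
 * (flags are an assumption, not taken from the original code).
 */
typedef const void *(*reserve_fn)(void *ctx);

/* Call reserve() up to max_tries times, return the first non-NULL result. */
const void *
reserve_with_retries(reserve_fn reserve, void *ctx, unsigned int max_tries)
{
	assert(reserve != NULL);
	for (unsigned int i = 0; i < max_tries; i++) {
		const void *mz = reserve(ctx);
		if (mz != NULL)
			return mz;
	}
	return NULL;
}

/* Stand-in allocator for trying the helper without DPDK:
 * fails failures_left times, then succeeds. */
struct fake_allocator {
	unsigned int failures_left;
	unsigned int calls;
};

const void *
fake_reserve(void *ctx)
{
	struct fake_allocator *a = ctx;

	a->calls++;
	if (a->failures_left > 0) {
		a->failures_left--;
		return NULL;
	}
	return a; /* any non-NULL pointer means success here */
}
```

With a fake allocator that fails once, reserve_with_retries(fake_reserve, &a, 3) succeeds on the second call, which mirrors the behavior described above. The number of tries is a tuning choice; nothing guarantees success within any fixed count.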
> It seems in eal_memalloc_is_contig() the 'msl->memseg_arr' items are inverted:
> when there is the sequence FC0000000, F80000000 the allocation fails,
> while the segments sequence F80000000, FC0000000 is fine.
> From my understanding 'msl->memseg_arr' comes from
> 'rte_eal_get_configuration()->mem_config;' which is rte_config
> declared in eal_common_config.c
Not quite, msl->memseg_arr content is dynamic, see above.
P.S. One may say, DPDK could do better.
It does have N hugepages occupying a continuous range of IOVA.
DPDK could make them VA-continuous by remapping.
But this would be more work, it still wouldn't be 100% reliable,
and it would still be insecure and inflexible compared to IOMMU.
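To make the P.S. concrete, here is a sketch of the first half of that idea only: deciding whether a set of pages covers one continuous IOVA range regardless of the order in which they were mapped. The remapping itself is not shown. pages_cover_contiguous_range() and cmp_u64() are hypothetical names, and nothing like this is claimed to exist in DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for uint64_t values, ascending. */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper, not a DPDK function.
 * Sorts iova[] in place, then checks that consecutive pages are exactly
 * page_sz apart. On success, the sorted order is the order in which the
 * pages would have to appear in VA to be both VA- and IOVA-continuous.
 */
bool
pages_cover_contiguous_range(uint64_t *iova, size_t n_pages, uint64_t page_sz)
{
	assert(page_sz != 0);
	qsort(iova, n_pages, sizeof(iova[0]), cmp_u64);
	for (size_t i = 1; i < n_pages; i++) {
		if (iova[i] != iova[i - 1] + page_sz)
			return false;
	}
	return true;
}
```

For the first-try layout { 0xFC0000000, 0xF80000000 } this returns true and leaves the array as { 0xF80000000, 0xFC0000000 }, so the pages are usable in principle, just mapped in the wrong order.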