[dpdk-dev] [PATCH v6 2/3] eal: add memory pre-allocation from existing files

David Marchand david.marchand at redhat.com
Tue Oct 12 19:32:49 CEST 2021


On Tue, Oct 12, 2021 at 5:55 PM Dmitry Kozlyuk <dkozlyuk at nvidia.com> wrote:
> > I have some trouble figuring the need for the list of files.
> > Why not use a global knob --mem-clear-on-alloc for this behavior change?
>
> Moving memset() doesn't speed anything up, it's a forced step for the reasons below.
> Currently, memory is cleared by the kernel when a page is mapped during an allocation.
> This cannot be turned off in stock kernels. The issue is that initial allocations are longer
> by the time needed to clear the pages, which is >90%. For the memory intended for DMA
> this time is just wasted. If allocations are large, application startup and restart
> take long. The only way to get hugepages mapped without the kernel clearing them is
> to map existing files in hugetlbfs. However, rte_zmalloc() needs to return clean
> memory, that's why we move memset() there. Memory intended for DMA is just never
> cleared this way. But memory freed and allocated again will be cleared again,
> unfortunately.

Writing my limited understanding, please correct me.

The --mem-file that is proposed does:
- preallocates files, which is something close to --socket-mem with the
following differences:
  - --mem-file lets the user decide on dpdk hugepage file names, which I
think conflicts with --huge-dir and --file-prefix,
  - --mem-file lets the user decide on hugepage size, which I think could
be achieved with some --huge-dir option,
- bypasses unlink() of existing hugepage files, which I had overlooked
but is the main pain point,
- enforces "clear on alloc" in rte_malloc/rte_free.
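
To make the last two points concrete, here is a minimal sketch of how I
picture "reuse the file as is" combined with "clear on alloc". It is not
code from the patch: it uses a regular file instead of hugetlbfs so it runs
anywhere, and map_backing_file(), struct toy_heap, toy_malloc(),
toy_zmalloc() and demo_restart() are hypothetical names made up for
illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map a backing file, creating it only if missing.
 * No O_TRUNC and no unlink(), so an existing file keeps its old contents. */
static void *map_backing_file(const char *path, size_t len)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);
	void *va;

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}
	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after close() */
	return va == MAP_FAILED ? NULL : va;
}

/* Toy bump allocator standing in for the real heap (assumption). */
struct toy_heap {
	unsigned char *base;
	size_t len;
	size_t used;
};

/* Plain allocation: memory is handed out as is, possibly dirty. */
static void *toy_malloc(struct toy_heap *h, size_t n)
{
	void *p;

	if (n > h->len - h->used)
		return NULL;
	p = h->base + h->used;
	h->used += n;
	return p;
}

/* Zeroing allocation: the allocator clears the memory, not the kernel. */
static void *toy_zmalloc(struct toy_heap *h, size_t n)
{
	void *p = toy_malloc(h, n);

	if (p != NULL)
		memset(p, 0, n);
	return p;
}

/* Simulates one run followed by a restart on the same file.
 * Returns 0 if the reused mapping is dirty and the zmalloc'd area clean. */
static int demo_restart(const char *path)
{
	const size_t len = 4096;
	struct toy_heap heap;
	unsigned char *dirty, *clean;
	unsigned char *va = map_backing_file(path, len);
	int ret = -1;

	if (va == NULL)
		goto out;
	memset(va, 0xAB, len); /* first run leaves data behind */
	munmap(va, len);       /* "exit" without unlink() */

	va = map_backing_file(path, len); /* "restart": same file reused */
	if (va == NULL)
		goto out;
	heap = (struct toy_heap){ .base = va, .len = len, .used = 0 };
	dirty = toy_malloc(&heap, 64);
	clean = toy_zmalloc(&heap, 64);
	assert(heap.used == 128);
	if (dirty[0] == 0xAB && clean[0] == 0 && clean[63] == 0)
		ret = 0;
	munmap(va, len);
out:
	unlink(path); /* cleanup for the demo only */
	return ret;
}
```

Calling demo_restart() with any writable path should return 0: the
remapped file still holds the old bytes, and only the area obtained through
toy_zmalloc() was cleared.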


From this, I see two parts in this patch:
- faster restart, reusing hugepage files as is (combination of not
calling unlink() and doing "clear on alloc"),
  This part is interesting, and I think a single knob for this would be enough.
- fine-grained control of hugepage files, but it has the drawback of
requiring primary and secondary processes to run with the same options.
  The second part seems complex to configure. I see conflicts with
existing options, so it seems a good way to get caught up in the
carpet (sorry if it translates badly from French :p).


-- 
David Marchand