[PATCH v2 1/6] eal: add static per-lcore memory allocation facility

Mattias Rönnblom hofors at lysator.liu.se
Tue Oct 15 08:29:19 CEST 2024


On 2024-10-14 09:56, Morten Brørup wrote:
>> From: Jerin Jacob [mailto:jerinjacobk at gmail.com]
>> Sent: Wednesday, 18 September 2024 12.12
>>
>> On Thu, Sep 12, 2024 at 8:52 PM Jerin Jacob <jerinjacobk at gmail.com>
>> wrote:
>>>
>>> On Thu, Sep 12, 2024 at 7:11 PM Morten Brørup
>>> <mb at smartsharesystems.com> wrote:
>>>>
>>>>> From: Jerin Jacob [mailto:jerinjacobk at gmail.com]
>>>>> Sent: Thursday, 12 September 2024 15.17
>>>>>
>>>>> On Thu, Sep 12, 2024 at 2:40 PM Morten Brørup
>>>>> <mb at smartsharesystems.com> wrote:
>>>>>>
>>>>>>> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
>>>>>>
>>>>>> Considering hugepages...
>>>>>>
>>>>>> Lcore variables may be allocated before DPDK's memory allocator
>>>>>> (rte_malloc()) is ready, so rte_malloc() cannot be used for lcore
>>>>>> variables.
>>>>>>
>>>>>> And lcore variables are not usable (shared) for DPDK multi-process,
>>>>>> so the lcore_buffer could be allocated through the O/S APIs as
>>>>>> anonymous hugepages, instead of using rte_malloc().
>>>>>>
>>>>>> The alternative, using rte_malloc(), would disallow allocating lcore
>>>>>> variables before DPDK's memory allocator has been initialized, which
>>>>>> I think is too late.
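Allocating the lcore buffer through the O/S APIs as anonymous hugepages, as suggested above, might look roughly like the following minimal sketch. It assumes Linux mmap() with MAP_HUGETLB, a default hugepage size of 2 MB, and a fallback to ordinary pages when no hugepages are reserved; the helper name lcore_buffer_map() is hypothetical and not part of DPDK or the patch.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: the default hugepage size is 2 MB. */
#define HUGEPAGE_SIZE ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: map a private, anonymous buffer for lcore
 * variables, preferring hugepages. The length is rounded up to whole
 * hugepages and reported through *mapped_len (needed for munmap()).
 * Returns NULL if both mapping attempts fail.
 */
static void *lcore_buffer_map(size_t size, size_t *mapped_len,
                              int *used_hugepages)
{
    size_t len = (size + HUGEPAGE_SIZE - 1) & ~(HUGEPAGE_SIZE - 1);
    /* MAP_PRIVATE suffices: lcore variables are not shared between
     * DPDK primary and secondary processes. */
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p;

    assert(mapped_len != NULL && used_hugepages != NULL);

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
    *used_hugepages = (p != MAP_FAILED);

    /* No (free) hugepages reserved: fall back to regular base pages. */
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *mapped_len = len;
    return p;
}
```

Since the mapping never goes through rte_malloc(), it can be made from an RTE_INIT constructor, before EAL memory initialization.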
>>>>>
>>>>> I thought it is not. A lot of the subsystems are initialized after the
>>>>> memory subsystem is initialized.
>>>>> [1] example given in documentation. I thought, RTE_INIT needs to be
>>>>> replaced if the subsystem is called after memory is initialized (which
>>>>> is the case for most of the libraries)
>>>>
>>>> The list of RTE_INIT functions is called before main(). It is not very
>>>> useful.
>>>>
>>>> Yes, it would be good to replace (or supplement) RTE_INIT_PRIO by
>>>> something similar, which calls the list of "INIT" functions at the
>>>> appropriate time during EAL initialization.
>>>>
>>>> DPDK should then use this "INIT" list for all its initialization, so
>>>> the init function of new features (such as this, and trace) can be
>>>> inserted at the correct location in the list.
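The "INIT" list idea above could be sketched as a table of stage-tagged callbacks: constructors (such as those RTE_INIT generates) only register a function, and EAL invokes each stage explicitly at the right point of its initialization. This is a minimal sketch under that assumption; every name here (init_stage, init_register(), init_run_stage(), INIT_REGISTER) is hypothetical and not existing DPDK API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical initialization stages, in the order EAL would run them. */
enum init_stage {
    INIT_STAGE_EARLY,    /* before rte_malloc() is usable */
    INIT_STAGE_POST_MEM, /* after the memory subsystem is initialized */
};

typedef void (*init_fn)(void);

#define INIT_MAX_FNS 64

struct init_entry {
    enum init_stage stage;
    init_fn fn;
};

static struct init_entry init_list[INIT_MAX_FNS];
static size_t init_count;

/* Called from constructors: only records the function, runs nothing. */
static void init_register(enum init_stage stage, init_fn fn)
{
    assert(init_count < INIT_MAX_FNS);
    init_list[init_count++] = (struct init_entry){ stage, fn };
}

/* Called by EAL at the appropriate point of its initialization. Within a
 * stage, functions run in registration (constructor) order. */
static void init_run_stage(enum init_stage stage)
{
    for (size_t i = 0; i < init_count; i++)
        if (init_list[i].stage == stage)
            init_list[i].fn();
}

/* Register fn for a stage from a constructor, in the spirit of RTE_INIT. */
#define INIT_REGISTER(stage, fn)                                       \
    static void __attribute__((constructor, used)) fn##_register(void) \
    {                                                                  \
        init_register(stage, fn);                                      \
    }

/* Example users: an early lcore variable setup and a later trace setup. */
static int lcore_var_ready, trace_ready;
static void lcore_var_init(void) { lcore_var_ready = 1; }
static void trace_init(void) { trace_ready = 1; }

INIT_REGISTER(INIT_STAGE_EARLY, lcore_var_init)
INIT_REGISTER(INIT_STAGE_POST_MEM, trace_init)
```

Both registrations happen before main(), but nothing runs until EAL calls init_run_stage(INIT_STAGE_EARLY) and, once memory is up, init_run_stage(INIT_STAGE_POST_MEM).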
>>>>
>>>>> The trace library had a similar situation. It is managed like [2]
>>>>
>>>> Yes, if we insist on using rte_malloc() for lcore variables, the
>>>> alternative is to prohibit establishing lcore variables in functions
>>>> called through RTE_INIT.
>>>
>>> I was not insisting on using ONLY rte_malloc(). Since rte_malloc() can
>>> be called before rte_eal_init() (it will return NULL), the alloc routine
>>> can first check whether rte_malloc() is available and, if not, switch
>>> over to glibc.
>>
>>
>> @Mattias Rönnblom This comment is not addressed in v7. Could you check?
> 
> Mattias, following up on Jerin's suggestion:
> 
> When an lcore variable is allocated and the buffer holding lcore variables is out of space (or was never allocated), a new buffer is allocated.
> 
> Here's the twist I think Jerin is asking for:
> You could check if rte_malloc() is available, and use that (instead of the heap) when allocating a new buffer holding lcore variables.
> This check can be performed (aggressively) when allocating a new lcore variable, or (conservatively) only when allocating a new buffer.
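The scheme described above (lcore variables carved out of a shared buffer, a new buffer allocated only when the current one is exhausted, and the allocator choice made conservatively at buffer-allocation time) can be modeled in a few lines. This is a simplified sketch, not the patch's implementation: heap_alloc is a hypothetical stand-in for rte_malloc(), assumed to be unavailable before EAL memory initialization as Jerin describes, aligned_alloc() plays the role of the libc fallback, and the sizes are shrunk to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Shrunk, illustrative values; the real defaults are much larger. */
#define MAX_LCORE 4
#define MAX_LCORE_VAR 256 /* max size of one lcore variable instance */
#define LCORE_BUFFER_SIZE (MAX_LCORE_VAR * MAX_LCORE)
#define BUFFER_ALIGN 64

/* Hypothetical stand-in for rte_malloc(); NULL until the heap is ready. */
typedef void *(*heap_alloc_fn)(size_t size, size_t align);
static heap_alloc_fn heap_alloc;

static char *lcore_buffer; /* current buffer */
static size_t offset;      /* bytes used in each lcore's slice */

static void *buffer_alloc(void)
{
    void *p = NULL;

    /* Conservative variant: the allocator check is only made here. */
    if (heap_alloc != NULL)
        p = heap_alloc(LCORE_BUFFER_SIZE, BUFFER_ALIGN);
    if (p == NULL) /* heap not ready (or out of memory): use libc */
        p = aligned_alloc(BUFFER_ALIGN, LCORE_BUFFER_SIZE);
    if (p != NULL)
        memset(p, 0, LCORE_BUFFER_SIZE);
    return p;
}

/* Returns a handle; lcore i's instance is at handle + i * MAX_LCORE_VAR. */
static void *lcore_var_alloc(size_t size, size_t align)
{
    assert(size > 0 && size <= MAX_LCORE_VAR);
    assert(align > 0 && align <= BUFFER_ALIGN && (align & (align - 1)) == 0);

    offset = (offset + align - 1) & ~(align - 1);
    if (lcore_buffer == NULL || offset + size > MAX_LCORE_VAR) {
        /* Out of space, or never allocated: start a new buffer. The old
         * one is kept, since existing handles still point into it. */
        char *buf = buffer_alloc();
        if (buf == NULL)
            return NULL;
        lcore_buffer = buf;
        offset = 0;
    }

    void *handle = lcore_buffer + offset;
    offset += size;
    return handle;
}

static void *lcore_var_get(void *handle, unsigned int lcore_id)
{
    assert(lcore_id < MAX_LCORE);
    return (char *)handle + (size_t)lcore_id * MAX_LCORE_VAR;
}
```

The aggressive variant would instead consult heap_alloc on every lcore_var_alloc() call, for example to abandon a libc-backed buffer as soon as the DPDK heap comes up.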
> 
> 
> Now, if using hugepages, the value of RTE_MAX_LCORE_VAR (the maximum size of one lcore variable instance) becomes more important.
> 
> Let's consider systems with 2 MB hugepages:
> 
> If it supports two lcores (RTE_MAX_LCORE is 2), the current RTE_MAX_LCORE_VAR default of 1 MB is a perfect match; it will use 2 MB of RAM as one 2 MB hugepage.
> 
> If it supports 128 lcores, the current RTE_MAX_LCORE_VAR default of 1 MB will use 128 MB of RAM.
> 
> If we scale it back, so it only uses one 2 MB hugepage, RTE_MAX_LCORE_VAR will have to be 2 MB / 128 lcores = 16 KB.
> 16 KB might be too small. E.g., a mempool cache uses 2 * 512 * sizeof(void *) = 8 KB (with 64-bit pointers), plus a few bytes for the information about the cache. So I can easily point at one example where 16 KB is already getting close to the edge.
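The sizing arithmetic above can be checked with a few lines of C. This sketch only restates Morten's numbers (2 MB hugepages, a 1 MB RTE_MAX_LCORE_VAR default, 2 or 128 lcores, 8-byte pointers on a 64-bit build); the helper names are illustrative, and the mempool cache figure counts only the object pointer array, not the cache's bookkeeping fields.

```c
#include <assert.h>
#include <stddef.h>

#define KB ((size_t)1024)
#define MB (1024 * KB)
#define HUGEPAGE_SIZE (2 * MB) /* assumes 2 MB hugepages */

/* LCORE_BUFFER_SIZE: one RTE_MAX_LCORE_VAR-sized slice per lcore. */
static size_t lcore_buffer_size(size_t max_lcore_var, size_t max_lcore)
{
    return max_lcore_var * max_lcore;
}

/* Hugepages needed to back a buffer of the given size (rounded up). */
static size_t hugepages_needed(size_t bytes)
{
    return (bytes + HUGEPAGE_SIZE - 1) / HUGEPAGE_SIZE;
}

/* Largest RTE_MAX_LCORE_VAR that keeps the buffer within one hugepage. */
static size_t max_lcore_var_for_one_hugepage(size_t max_lcore)
{
    assert(max_lcore > 0);
    return HUGEPAGE_SIZE / max_lcore;
}

/* Object pointer array of a mempool cache (2 * 512 pointers); the
 * cache's bookkeeping fields come on top of this. */
static size_t mempool_cache_objs_size(void)
{
    return 2 * 512 * sizeof(void *);
}
```

With 2 lcores and a 1 MB slice, the buffer is exactly one hugepage. With 128 lcores it is 128 MB, i.e. 64 hugepages, and squeezing it into one hugepage leaves 16 KB per lcore, half of which a 64-bit mempool cache's pointer array alone would take.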
> 
> So, as you already asked, what is a reasonable default minimum value of RTE_MAX_LCORE_VAR?
> 
> Maybe we should just stick with your initial suggestion (1 MB) and see how it goes.
> 

Sure. Let's stick with 1 MB.

I'm guessing that if/when someone takes a closer look at how to do
per-lcore *dynamic* allocations, this API and its implementation will be
revisited as well.

> 
> <roadmap>
> At the recent DPDK Summit, we discussed memory consumption in one of the workshops.
> One of the possible means for reducing memory consumption is making RTE_MAX_LCORE dynamic, so an application using only a few cores will scale its per-lcore tables to the actual number of lcores, instead of scaling to some hardcoded maximum.
> 
> With this in mind, I'm less worried about the RTE_MAX_LCORE multiplier.
> </roadmap>
> 
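Making RTE_MAX_LCORE dynamic would, in its simplest form, mean sizing per-lcore tables from the number of lcores actually in use at runtime rather than from a compile-time constant. A minimal sketch contrasting the two; struct lcore_stats, stats_table_create(), the 64-byte cache line, and passing the lcore count explicitly are all illustrative assumptions, not a proposed DPDK API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COMPILE_TIME_MAX_LCORE 128 /* stands in for RTE_MAX_LCORE */
#define CACHE_LINE 64              /* assumed cache line size */

/* Hypothetical per-lcore statistics, padded to one cache line each. */
struct lcore_stats {
    uint64_t rx_pkts;
    uint64_t tx_pkts;
} __attribute__((aligned(CACHE_LINE)));

/* Today's pattern: sized for the hardcoded maximum (128 * 64 B = 8 KB),
 * however few lcores the application actually uses. */
static struct lcore_stats static_stats[COMPILE_TIME_MAX_LCORE];

/* Dynamic alternative: sized for the lcores in use. Caller frees. */
static struct lcore_stats *stats_table_create(unsigned int n_lcores)
{
    size_t size = (size_t)n_lcores * sizeof(struct lcore_stats);
    struct lcore_stats *t;

    assert(n_lcores > 0 && n_lcores <= COMPILE_TIME_MAX_LCORE);
    /* aligned_alloc() rather than calloc(): the type is over-aligned. */
    t = aligned_alloc(CACHE_LINE, size);
    if (t != NULL)
        memset(t, 0, size);
    return t;
}
```

For a 4-lcore application, the dynamic table is 256 bytes instead of 8 KB; across the many per-lcore tables in DPDK, that is where the savings discussed at the summit would come from.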

An interesting hack would be to disable huge page usage, set up swap on a
zram device, and then MADV_PAGEOUT the DPDK process after startup.

I wonder how much smaller the DPDK process's RSS would be once it had
paged back in all the pages that were actually required.
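The experiment relies on MADV_PAGEOUT (Linux 5.4 and later), which asks the kernel to reclaim the given pages right away; for anonymous memory, that can only take effect when swap (here, zram) is configured. madvise() acts on the caller's own address space, so this minimal sketch pages out one region of the current process and reads RSS from /proc/self/statm; the helper names are hypothetical, and paging out a different, already-running process would need process_madvise() instead.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* asm-generic value, Linux 5.4+ */
#endif

/* Resident set size in pages (second field of /proc/self/statm);
 * returns -1 if it cannot be read. */
static long rss_pages(void)
{
    long size, resident;
    FILE *f = fopen("/proc/self/statm", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &size, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Ask the kernel to reclaim [addr, addr + len) now. Returns madvise()'s
 * result: 0 on success, -1 with errno set (EINVAL on pre-5.4 kernels).
 * Without swap, anonymous pages simply stay resident. */
static int page_out(void *addr, size_t len)
{
    assert(addr != NULL);
    return madvise(addr, len, MADV_PAGEOUT);
}
```

Comparing rss_pages() before and after page_out(), and again after the application has run for a while, would give the kind of number asked about above.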


