[RFC v3 1/6] eal: add static per-lcore memory allocation facility

Mattias Rönnblom hofors at lysator.liu.se
Tue Feb 20 17:26:17 CET 2024


On 2024-02-20 12:39, Bruce Richardson wrote:
> On Tue, Feb 20, 2024 at 11:47:14AM +0100, Mattias Rönnblom wrote:
>> On 2024-02-20 10:11, Bruce Richardson wrote:
>>> On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
>>>> Introduce DPDK per-lcore id variables, or lcore variables for short.
>>>>
>>>> An lcore variable has one value for every current and future lcore
>>>> id-equipped thread.
>>>>
>>>> The primary <rte_lcore_var.h> use case is for statically allocating
>>>> small chunks of often-used data, which is related logically, but where
>>>> there are performance benefits to reap from having updates being local
>>>> to an lcore.
>>>>
>>>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>>>> _Thread_local), but decoupling the values' life time with that of the
>>>> threads.
> 
> <snip>
> 
>>>> +/*
>>>> + * Avoid using offset zero, since it would result in a NULL-value
>>>> + * "handle" (offset) pointer, which in principle and per the API
>>>> + * definition shouldn't be an issue, but may confuse some tools and
>>>> + * users.
>>>> + */
>>>> +#define INITIAL_OFFSET 1
>>>> +
>>>> +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR] __rte_cache_aligned;
>>>> +
>>>
>>> While I like the idea of improved handling for per-core variables, my main
>>> concern with this set is this definition here, which adds yet another
>>> dependency on the compile-time defined RTE_MAX_LCORE value.
>>>
>>
>> lcore variables replaces one RTE_MAX_LCORE-dependent pattern with another.
>>
>> You could even argue the dependency on RTE_MAX_LCORE is reduced with lcore
>> variables, if you look at where/in how many places in the code base this
>> macro is being used. Centralizing per-lcore data management may also provide
>> some opportunity in the future for extending the API to cope with some more
>> dynamic RTE_MAX_LCORE variant. Not without ABI breakage of course, but we
>> are not ever going to change anything related to RTE_MAX_LCORE without
>> breaking the ABI, since this constant is everywhere, including compiled into
>> the application itself.
>>
> 
> Yep, that is true if it's widely used.
> 
>>> I believe we already have an issue with this #define where it's impossible
>>> to come up with a single value that works for all, or nearly all cases. The
>>> current default is still 128, yet DPDK needs to support systems where the
>>> number of cores is well into the hundreds, requiring workarounds of core
>>> mappings or customized builds of DPDK. Upping the value fixes those issues
>>> at the cost to memory footprint explosion for smaller systems.
>>>
>>
>> I agree this is an issue.
>>
>> RTE_MAX_LCORE also need to be sized to accommodate not only all cores used,
>> but the sum of all EAL threads and registered non-EAL threads.
>>
>> So, there is no reliable way to discover what RTE_MAX_LCORE is on a
>> particular piece of hardware, since the actual number of lcore ids needed is
>> up to the application.
>>
>> Why is the default set so low? Linux has MAX_CPUS, which serves the same
>> purpose, which is set to 4096 by default, if I recall correctly. Shouldn't
>> we at least be able to increase it to 256?
> 
> The default is so low because of the mempool caches. These are an array of
> buffer pointers with 512 (IIRC) entries per core up to RTE_MAX_LCORE.
> 
>>
>>> I'm therefore nervous about putting more dependencies on this value, when I
>>> feel we should be moving away from its use, to allow more runtime
>>> configurability of cores.
>>>
>>
>> What more specifically do you have in mind?
>>
> 
> I don't think having a dynamically scaling RTE_MAX_LCORE is feasible, but
> what I would like to see is a runtime specified value. For example, you
> could run DPDK with EAL parameter "--max-lcores=1024" for large systems or
> "--max-lcores=32" for small ones. That would then be used at init-time to
> scale all internal datastructures appropriately.
> 

Sounds reasonably to me, especially if you would take gradual approach.

By gradual I mean something like adding a function 
rte_lcore_max_possible(), or something like that, returning the EAL 
init-specified value. DPDK libraries/PMDs could then gradually be made 
aware and taking advantage of knowing that lcore ids will always be 
below a certain threshold, usually significantly lower than RTE_MAX_LCORE.

The only change required for lcore variables would be that the FOREACH 
macro would use the run-time-max value, rather than RTE_MAX_LCORE, which 
in turn would leave all the higher-numbered lcore id buffers 
untouched/unmapped.

The set of possible lcore ids could also be expressed as a bitset, if 
you have machine with a huge amount of cores, running many small DPDK 
instances.

> /Bruce
> 
> <snip for brevity>


More information about the dev mailing list