[PATCH v13 1/7] eal: add static per-lcore memory allocation facility

Mattias Rönnblom hofors at lysator.liu.se
Wed Oct 16 14:47:26 CEST 2024
On 2024-10-16 10:17, Thomas Monjalon wrote:
> 16/10/2024 06:13, Mattias Rönnblom:
>>
>> On 2024-10-16 00:33, Stephen Hemminger wrote:
>>> On Tue, 15 Oct 2024 11:33:38 +0200
>>> Mattias Rönnblom <mattias.ronnblom at ericsson.com> wrote:
>>>
>>>> + * Lcore variables
>>>> + *
>>>> + * This API provides a mechanism to create and access per-lcore id
>>>> + * variables in a space- and cycle-efficient manner.
>>>> + *
>>>> + * A per-lcore id variable (or lcore variable for short) holds a
>>>> + * unique value for each EAL thread and registered non-EAL
>>>> + * thread. There is one instance for each current and future lcore
>>>> + * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
>>>> + * value of the lcore variable for one lcore id is independent from
>>>> + * the values assigned to other lcore ids within the same variable.
>>>> + *
>>>> + * In order to access the values of an lcore variable, a handle is
>>>> + * used. The type of the handle is a pointer to the value's type
>>>> + * (e.g., for an @c uint32_t lcore variable, the handle is a
>>>> + * <code>uint32_t *</code>). The handle type is used to inform the
>>>> + * access macros of the type of the values. A handle may be passed
>>>> + * between modules and threads just like any pointer, but its value
>>>> + * must be treated as an opaque identifier. An allocated handle never
>>>> + * has the value NULL.
>>>> + *
>>>> + * @b Creation
>>>> + *
>>>> + * An lcore variable is created in two steps:
>>>> + *  1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
>>>> + *  2. Allocate lcore variable storage and initialize the handle with
>>>> + *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
>>>> + *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
>>>> + *     of module initialization, but may be done at any time.
>>>> + *
>>>> + * The lifetime of an lcore variable is not tied to the thread that
>>>> + * created it. Its per lcore id values (up to @c RTE_MAX_LCORE) are
>>>> + * available from the moment the lcore variable is created and
>>>> + * continue to exist throughout the entire lifetime of the EAL,
>>>> + * whether or not the lcore id is currently in use.
>>>> + *
>>>> + * Lcore variables cannot and need not be freed.
>>>> + *
>>>> + * @b Access
>>>> + *
>>>> + * The value of any lcore variable for any lcore id may be accessed
>>>> + * from any thread (including unregistered threads), but it should
>>>> + * only be *frequently* read from or written to by the owner.
>>>> + *
>>>> + * Values of the same lcore variable associated with different lcore
>>>> + * ids may be frequently read or written by their respective owners
>>>> + * without risking false sharing.
>>>> + *
>>>> + * An appropriate synchronization mechanism (e.g., atomic loads and
>>>> + * stores) should be employed to prevent data races between the owning
>>>> + * thread and any other thread accessing the same value instance.
>>>> + *
>>>> + * The value of the lcore variable for a particular lcore id is
>>>> + * accessed using @ref RTE_LCORE_VAR_LCORE.
>>>> + *
>>>> + * A common pattern is for an EAL thread or a registered non-EAL
>>>> + * thread to access its own lcore variable value. For this purpose, a
>>>> + * shorthand exists as @ref RTE_LCORE_VAR.
>>>> + *
>>>> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
>>>> + * pointer with the same type as the value, it may not be directly
>>>> + * dereferenced and must be treated as an opaque identifier.
>>>> + *
>>>> + * Lcore variable handles and value pointers may be freely passed
>>>> + * between different threads.
>>>> + *
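
To make the access rules above concrete, here is a minimal sketch of the single-writer pattern: only the owner writes its value, other threads read it occasionally, and relaxed atomic loads and stores keep those accesses free of data races. It does not use the lcore variable API, so that it compiles standalone. The plain array merely stands in for the per-lcore id values, and SKETCH_MAX_LCORE, pkt_count, count_packet() and total_packets() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_MAX_LCORE. */
#define SKETCH_MAX_LCORE 4

/* Stand-in for the per-lcore id values of one lcore variable. */
static _Atomic uint64_t pkt_count[SKETCH_MAX_LCORE];

/* To be called only by the thread that owns 'lcore_id'. Since there
 * is a single writer, a relaxed load followed by a relaxed store is
 * enough; no atomic read-modify-write is needed. */
static void count_packet(unsigned int lcore_id)
{
        uint64_t count;

        assert(lcore_id < SKETCH_MAX_LCORE);

        count = atomic_load_explicit(&pkt_count[lcore_id],
                                     memory_order_relaxed);
        atomic_store_explicit(&pkt_count[lcore_id], count + 1,
                              memory_order_relaxed);
}

/* May be called (infrequently) by any thread, owner or not. */
static uint64_t total_packets(void)
{
        uint64_t sum = 0;
        unsigned int lcore_id;

        for (lcore_id = 0; lcore_id < SKETCH_MAX_LCORE; lcore_id++)
                sum += atomic_load_explicit(&pkt_count[lcore_id],
                                            memory_order_relaxed);

        return sum;
}
```

With real lcore variables, the owner would presumably obtain its value pointer via RTE_LCORE_VAR and a reader would use RTE_LCORE_VAR_LCORE, as described in the quoted text. The atomics part would stay the same.
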
>>>> + * @b Storage
>>>> + *
>>>> + * An lcore variable's values may be of a primitive type like @c int,
>>>> + * but would more typically be a @c struct.
>>>> + *
>>>> + * The lcore variable handle introduces a per-variable (not
>>>> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
>>>> + * there are some memory footprint gains to be made by organizing all
>>>> + * per-lcore id data for a particular module as one lcore variable
>>>> + * (e.g., as a struct).
>>>> + *
>>>> + * An application may define an lcore variable handle without ever
>>>> + * allocating it.
>>>> + *
>>>> + * The size of an lcore variable's value must be less than the DPDK
>>>> + * build-time constant @c RTE_MAX_LCORE_VAR.
>>>> + *
>>>> + * Lcore variables are stored in a series of lcore buffers, which are
>>>> + * allocated from the libc heap. Heap allocation failures are treated
>>>> + * as fatal.
>>>> + *
>>>> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
>>>> + * and need *not* include a @ref RTE_CACHE_GUARD field, since these
>>>> + * constructs are designed to avoid false sharing. In the
>>>> + * case of an lcore variable instance, the thread most recently
>>>> + * accessing nearby data structures should almost-always be the lcore
>>>> + * variable's owner. Adding padding will increase the effective memory
>>>> + * working set size, potentially reducing performance.
>>>> + *
>>>> + * Lcore variable values are initialized to zero by default.
>>>> + *
>>>> + * Lcore variables are not stored in huge page memory.
>>>> + *
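
The storage description above could be modeled roughly as below. This is a simplified sketch under my own assumptions, not the code in eal_common_lcore_var.c: it uses one fixed buffer instead of a series of lcore buffers, and every sketch_* name and SKETCH_* constant is hypothetical. It is only meant to show why a handle is an opaque identifier (here, it happens to be an address inside lcore id 0's slice), why values start out as zero, and why all values for one lcore id end up next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constants; not DPDK's actual values. */
#define SKETCH_MAX_LCORE 4
#define SKETCH_MAX_LCORE_VAR 64 /* bytes available per lcore id */

static char *sketch_buffer;  /* SKETCH_MAX_LCORE slices, back to back */
static size_t sketch_offset; /* next free offset within each slice */

/* Reserve 'size' bytes at the same offset in every lcore id's slice.
 * 'align' must be a power of two. The returned handle identifies the
 * variable. */
static void *sketch_alloc(size_t size, size_t align)
{
        size_t offset;

        if (sketch_buffer == NULL) {
                /* calloc() zeroes the memory: values start out as 0. */
                sketch_buffer = calloc(SKETCH_MAX_LCORE,
                                       SKETCH_MAX_LCORE_VAR);
                if (sketch_buffer == NULL)
                        abort(); /* heap failure is treated as fatal */
        }

        offset = (sketch_offset + align - 1) & ~(align - 1);
        assert(offset + size <= SKETCH_MAX_LCORE_VAR);
        sketch_offset = offset + size;

        return sketch_buffer + offset;
}

/* Translate a handle into the value belonging to 'lcore_id'. */
static void *sketch_value(unsigned int lcore_id, void *handle)
{
        assert(lcore_id < SKETCH_MAX_LCORE);

        return (char *)handle + (size_t)lcore_id * SKETCH_MAX_LCORE_VAR;
}

/* Every lcore id adds (id + 1) to its own counter. Returns the sum. */
static uint64_t sketch_demo(void)
{
        uint64_t *counter = sketch_alloc(sizeof(uint64_t),
                                         _Alignof(uint64_t));
        uint64_t sum = 0;
        unsigned int id;

        for (id = 0; id < SKETCH_MAX_LCORE; id++)
                *(uint64_t *)sketch_value(id, counter) += id + 1;

        for (id = 0; id < SKETCH_MAX_LCORE; id++)
                sum += *(uint64_t *)sketch_value(id, counter);

        return sum;
}
```

In this model, two variables allocated one after the other sit next to each other within the same lcore id's slice, so whatever shares a cache line with a value belongs to the same owner. That is the property the quoted text relies on when it advises against cache alignment and guard fields.
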
>>>> + * @b Example
>>>> + *
>>>> + * Below is an example of the use of an lcore variable:
>>>> + *
>>>> + * @code{.c}
>>>> + * struct foo_lcore_state {
>>>> + *         int a;
>>>> + *         long b;
>>>> + * };
>>>> + *
>>>> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
>>>> + *
>>>> + * long foo_get_a_plus_b(void)
>>>> + * {
>>>> + *         struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
>>>> + *
>>>> + *         return state->a + state->b;
>>>> + * }
>>>> + *
>>>> + * RTE_INIT(rte_foo_init)
>>>> + * {
>>>> + *         RTE_LCORE_VAR_ALLOC(lcore_states);
>>>> + *
>>>> + *         unsigned int lcore_id;
>>>> + *         struct foo_lcore_state *state;
>>>> + *         RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
>>>> + *                 (initialize 'state')
>>>> + *         }
>>>> + *
>>>> + *         (other initialization)
>>>> + * }
>>>> + * @endcode
>>>> + *
>>>> + *
>>>> + * @b Alternatives
>>>> + *
>>>> + * Lcore variables are designed to replace a pattern exemplified below:
>>>> + * @code{.c}
>>>> + * struct __rte_cache_aligned foo_lcore_state {
>>>> + *         int a;
>>>> + *         long b;
>>>> + *         RTE_CACHE_GUARD;
>>>> + * };
>>>> + *
>>>> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
>>>> + * @endcode
>>>> + *
>>>> + * This scheme is simple and effective, but has one drawback: the data
>>>> + * is organized so that objects related to all lcores for a particular
>>>> + * module are kept close in memory. At a bare minimum, this requires
>>>> + * sizing data structures (e.g., using `__rte_cache_aligned`) to a
>>>> + * whole number of cache lines to avoid false sharing. With CPU
>>>> + * hardware prefetching and memory loads resulting from speculative
>>>> + * execution (functions which seemingly are getting more eager faster
>>>> + * than they are getting more intelligent), one or more "guard" cache
>>>> + * lines may be required to separate one lcore's data from another's
>>>> + * and prevent false sharing.
>>>> + *
>>>> + * Lcore variables offer the advantage of working with, rather than
>>>> + * against, the CPU's assumptions. A next-line hardware prefetcher,
>>>> + * for example, may function as intended (i.e., to the benefit, not
>>>> + * detriment, of system performance).
>>>> + *
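
To put a rough number on the cost of the static array pattern, the sketch below compares the footprint of a padded and an unpadded version of the example struct. It uses C11 alignas instead of the DPDK macros so that it compiles standalone. The 64-byte cache line, the single guard line and the lcore count of 128 are assumptions made for the sake of the example, since the real values depend on platform and build configuration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed values; the real ones are platform- and build-specific. */
#define SKETCH_CACHE_LINE 64
#define SKETCH_MAX_LCORE 128

/* Unpadded per-lcore state, as it could be kept in an lcore variable. */
struct foo_state {
        int a;
        long b;
};

/* The same state, cache-line aligned and followed by one guard line,
 * mimicking the static array pattern. */
struct foo_state_padded {
        alignas(SKETCH_CACHE_LINE) int a;
        long b;
        alignas(SKETCH_CACHE_LINE) char guard[SKETCH_CACHE_LINE];
};

/* Bytes needed for one padded instance per lcore id. */
static size_t padded_bytes(void)
{
        return sizeof(struct foo_state_padded) * SKETCH_MAX_LCORE;
}

/* Bytes needed for one unpadded instance per lcore id. */
static size_t unpadded_bytes(void)
{
        return sizeof(struct foo_state) * SKETCH_MAX_LCORE;
}
```

Under these assumptions each padded instance occupies two cache lines, for 16 KiB in total, while the payload itself is only a handful of bytes per lcore id. The difference is the padding that touches the cache without carrying any data.
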
>>>> + * Another alternative to @ref rte_lcore_var.h is the @ref
>>>> + * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
>>>> + * e.g., GCC __thread or C11 _Thread_local). The main differences
>>>> + * between the use of the various forms of TLS (e.g., @ref
>>>> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
>>>> + * variables are:
>>>> + *
>>>> + *   * The lifecycle of a thread-local variable instance is tied to
>>>> + *     that of the thread. The data cannot be accessed before the
>>>> + *     thread has been created, nor after it has exited. As a result,
>>>> + *     thread-local variables must be initialized in a "lazy" manner
>>>> + *     (e.g., at the point of thread creation). Lcore variables may be
>>>> + *     accessed immediately after having been allocated (which may occur
>>>> + *     before any thread beyond the main thread is running).
>>>> + *   * A thread-local variable is duplicated across all threads in the
>>>> + *     process, including unregistered non-EAL threads (i.e.,
>>>> + *     "regular" threads). For DPDK applications heavily relying on
>>>> + *     multi-threading (in conjunction with DPDK's "one thread per core"
>>>> + *     pattern), either by having many concurrent threads or
>>>> + *     creating/destroying threads at a high rate, an excessive use of
>>>> + *     thread-local variables may cause inefficiencies (e.g.,
>>>> + *     increased thread creation overhead due to thread-local storage
>>>> + *     initialization or increased total RAM footprint). Lcore
>>>> + *     variables *only* exist for threads with an lcore id.
>>>> + *   * Whether data in thread-local storage may be shared between
>>>> + *     threads (i.e., whether a pointer to a thread-local variable can
>>>> + *     be passed to and successfully dereferenced by a non-owning
>>>> + *     thread) depends on
>>>> + *     the specifics of the TLS implementation. With GCC __thread and
>>>> + *     GCC _Thread_local, data sharing between threads is supported.
>>>> + *     In the C11 standard, accessing another thread's _Thread_local
>>>> + *     object is implementation-defined. Lcore variable instances may
>>>> + *     be accessed reliably by any thread.
>>>> + */
>>>
>>> For me this comment is too wordy for code and belongs in the documentation instead.
>>> It could also be reduced to more precise, succinct language.
> 
> I agree, this is what I was asking for.
> 
> 
>> Provided this makes it into RC1, I can move most of this and some of the
>> information in eal_common_lcore_var.c comments into "the documentation"
>> as a RC2 patch.
>>
>> If "the documentation" is the EAL programmer's guide, a description of
>> lcore variables (with pictures!) in sufficient detail (both API and
>> implementation) would make up a large fraction of it. That would look
>> silly and get in the way of more important things. Lcore variables are
>> just a tiny bit of infrastructure. Other, more central EAL features, like
>> the RTE spinlock, have no mention at all in the EAL docs.
> 
> Please don't take what exists and what doesn't as an absolute model.
> We must improve the doc, split it better and fill the gaps.
> In the meantime we want new features like this one to be properly documented.
> 

I don't have an issue with raising the bar for new features.

> 
>> Another option, I suppose, is to document it separately from the
>> "main" EAL programmer's guide, but - correct me if I'm wrong here -
>> there seems to be no precedent for doing this.
> 
> For instance, the services cores are a separate chapter of the prog guide.

Right, forgot about the service cores. I will follow that model.

> The lcore variables should be a separate chapter as well.
>