[RFC] random: use per lcore state
Mattias Rönnblom
hofors at lysator.liu.se
Mon Sep 11 11:00:46 CEST 2023
On 2023-09-09 13:23, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors at lysator.liu.se]
>> Sent: Saturday, 9 September 2023 08.45
>>
>> On 2023-09-09 02:13, Konstantin Ananyev wrote:
>>> 06/09/2023 21:02, Mattias Rönnblom wrote:
>>>> On 2023-09-06 19:20, Stephen Hemminger wrote:
>>>>> Move the random number state into thread local storage.
>>>>
>>>> Me and Morten discussed TLS versus other alternatives in some other
>>>> thread. The downside of TLS that Morten pointed out, from what I
>>>> recall, is that lazy initialization is *required* (since the number of
>>>> threads is open-ended), and the data ends up in non-huge page memory.
>>>
>>> Hmm.. correct me if I am wrong, but with current implementation,
>>> rand state is also in non-huge memory:
>>> static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
>>>
>>
>> Yes. The current pattern is certainly not perfect.
>>
>>>
>>>> It was also unclear to me what the memory footprint implications would
>>>> be, should large per-lcore data structures be put in TLS. More
>>>> specifically, if they would be duplicated across all threads, even
>>>> non-lcore threads.
>>>>
>>>> None of these issues affect rte_random.c's potential usage of TLS
>>>> (except lazy [re-]initialization makes things more complicated).
>>>>
>>>> Preferably, there should be one pattern that is usable across all or
>>>> at least most DPDK modules requiring per-lcore state.
>>>>
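
For the record, a TLS-based variant of that pattern could look something 
along these lines (a sketch only; the module name "foo" and its state 
fields are made up for illustration):

#include <stdbool.h>
#include <stdint.h>

#include <rte_branch_prediction.h>
#include <rte_per_lcore.h>

struct foo_state {
	uint64_t counter;
	bool initialized;
};

/* One instance per thread (registered or not), lazily initialized
 * on first use, since the set of threads is open-ended.
 */
static RTE_DEFINE_PER_LCORE(struct foo_state, foo_state);

static struct foo_state *
foo_get_state(void)
{
	struct foo_state *state = &RTE_PER_LCORE(foo_state);

	if (unlikely(!state->initialized)) {
		state->counter = 0;
		state->initialized = true;
	}

	return state;
}
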
>>>>> This has several benefits.
>>>>> - no false cache sharing from cpu prefetching
>>>>> - fixes initialization of random state for non-DPDK threads
>>>>
>>>> This seems like a non-reason to me. That bug is easily fixed, if it
>>>> isn't already.
>>>>
>>>>> - fixes unsafe usage of random state by non-DPDK threads
>>>>>
>>>>
>>>> "Makes random number generation MT safe from all threads (including
>>>> unregistered non-EAL threads)."
>>>>
>>>> With current API semantics you may still register a non-EAL thread,
>>>> to get MT safe access to this API, so I guess it's more about being
>>>> more convenient and less error prone than anything else.
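
For example, with today's API an application-created thread can do 
something like this (error handling omitted):

#include <rte_lcore.h>
#include <rte_random.h>

static void *
worker(void *arg)
{
	(void)arg;

	/* Acquire an lcore id; rte_rand() is MT safe from here on. */
	if (rte_thread_register() < 0)
		return NULL;

	uint64_t value = rte_rand();

	/* ... use value ... */

	rte_thread_unregister();

	return NULL;
}
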
>>>
>>> I understand that we never guaranteed MT safety for non-EAL threads here,
>>
>>
>> Registered non-EAL threads have an lcore id and thus may safely call
>> rte_rand(). Multiple unregistered non-EAL threads may not do so, in
>> parallel.
>>
>>
>>> but as a user of rte_rand() - it would be much more convenient if I
>>> could use it from any thread without worrying whether it is an EAL
>>> thread or not.
>>
>> Sure, especially if it comes for free. The for-free solution has yet to
>> reveal itself though.
>
> We could offer re-entrant function variants for non-EAL threads:
>
> uint64_t rte_rand_r(struct rte_rand_state * const state);
> void rte_srand_r(struct rte_rand_state * const state, uint64_t seed);
> uint64_t rte_rand_max_r(struct rte_rand_state * const state, uint64_t upper_bound);
> double rte_drand_r(struct rte_rand_state * const state);
>
> For this to work, we would have to make struct rte_rand_state public, and the application would need to allocate it. (At least one instance per thread that uses it, obviously.)
>
Yes, and that will come at a pretty severe API complexity cost.
Besides the obvious complexities, it may also lead the user to believe
that rte_rand() is not MT safe for any thread, since that's how it works
in glibc (rand() versus rand_r()).
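
To make that cost concrete, here is roughly what an unregistered thread 
would have to do with such hypothetical _r variants (the TLS-based state 
management and the TSC-based seed below are my assumptions, not part of 
the proposal):

/* Per-thread state for the proposed, not yet existing, _r API. */
static __thread struct rte_rand_state thread_rand_state;
static __thread bool thread_rand_seeded;

static uint64_t
my_rand(void)
{
	if (!thread_rand_seeded) {
		rte_srand_r(&thread_rand_state, rte_get_tsc_cycles());
		thread_rand_seeded = true;
	}

	return rte_rand_r(&thread_rand_state);
}
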
>>
>>>
>>> About TLS usage and re-seeding - can we use some sort of middle-ground:
>>> extend rte_rand_state with some gen-counter.
>>> Make a 'master' copy of rte_rand_state that will be updated by rte_srand(),
>>> and TLS copies of rte_rand_state, so rte_rand() can first compare
>>> its gen-counter value with the master copy to decide
>>> whether it needs to copy new state from the master or not.
>>>
>>
>> Calling threads shouldn't all produce the same sequence. That would be
>> silly and not very random. The generation number should be tied to the
>> seed.
>
> I previously thought about seeding...
>
> We are trying to be random; we are not explicitly pseudo-random.
>
> So I came to the conclusion that the ability to reproduce data (typically for verification purposes) is not a requirement here.
>
>>
>>>
>>>> The new MT safety guarantees should be in the API docs as well.
>>>
>>> Yes, it is an extension to the current API, not a fix.
>>>
>>>>
>>>>> The initialization of random number state is done by the
>>>>> lcore (lazy initialization).
>>>>>
>>>>> Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
>>>>> ---
>>>>>    lib/eal/common/rte_random.c | 38 ++++++++++++++++++++------------------
>>>>> 1 file changed, 20 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
>>>>> index 53636331a27b..9657adf6ad3b 100644
>>>>> --- a/lib/eal/common/rte_random.c
>>>>> +++ b/lib/eal/common/rte_random.c
>>>>> @@ -19,13 +19,14 @@ struct rte_rand_state {
>>>>> uint64_t z3;
>>>>> uint64_t z4;
>>>>> uint64_t z5;
>>>>> -} __rte_cache_aligned;
>>>>> + uint64_t seed;
>>>>> +};
>>>>> -/* One instance each for every lcore id-equipped thread, and one
>>>>> - * additional instance to be shared by all others threads (i.e., all
>>>>> - * unregistered non-EAL threads).
>>>>> - */
>>>>> -static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
>>>>> +/* Global random seed */
>>>>> +static uint64_t rte_rand_seed;
>>>>> +
>>>>> +/* Per lcore random state. */
>>>>> +static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);
>>>>> static uint32_t
>>>>> __rte_rand_lcg32(uint32_t *seed)
>>>>> @@ -81,11 +82,7 @@ __rte_srand_lfsr258(uint64_t seed, struct rte_rand_state *state)
>>>>> void
>>>>> rte_srand(uint64_t seed)
>>>>> {
>>>>> - unsigned int lcore_id;
>>>>> -
>>>>> - /* add lcore_id to seed to avoid having the same sequence */
>>>>> - for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
>>>>> -        __rte_srand_lfsr258(seed + lcore_id, &rand_states[lcore_id]);
>>>>> + __atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
>>>>> }
>>>>> static __rte_always_inline uint64_t
>>>>> @@ -119,15 +116,18 @@ __rte_rand_lfsr258(struct rte_rand_state *state)
>>>>> static __rte_always_inline
>>>>> struct rte_rand_state *__rte_rand_get_state(void)
>>>>> {
>>>>> - unsigned int idx;
>>>>> +    struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);
>>>>
>>>> There should really be a RTE_PER_THREAD, an alias to RTE_PER_LCORE, to
>>>> cover this usage. Or just use __thread (or _Thread_local?).
>>>>
>>>>> + uint64_t seed;
>>>>> - idx = rte_lcore_id();
>>>>> + seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
>>>>> + if (unlikely(seed != rand_state->seed)) {
>>>>> + rand_state->seed = seed;
>>>>
>>>> Re-seeding should restart the series, on all lcores. There's nothing
>>>> preventing the user from re-seeding the machinery repeatedly, with the
>>>> same seed. Seems like an unusual, but still valid, use case, if you
>>>> run repeated tests of some sort.
>>>>
>>>> Use a seqlock? :) I guess you need a seed generation number as well
>>>> (e.g., is this the first time you seed with X, or the second one, etc.)
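
A sketch of what tying the per-thread state to a generation number could 
look like, building on the patch above (untested; it assumes a gen field 
is added to rte_rand_state, and note that the seed and the generation are 
not read as an atomic pair, which is exactly the race a proper seqlock 
would close):

/* Written only by rte_srand(), read by all threads. */
static uint64_t rte_rand_seed;
static uint64_t rte_rand_gen;

void
rte_srand(uint64_t seed)
{
	__atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
	/* Bump the generation even if the seed value is unchanged,
	 * so re-seeding with the same seed restarts the series.
	 */
	__atomic_add_fetch(&rte_rand_gen, 1, __ATOMIC_RELEASE);
}

static __rte_always_inline
struct rte_rand_state *__rte_rand_get_state(void)
{
	struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);
	uint64_t gen = __atomic_load_n(&rte_rand_gen, __ATOMIC_ACQUIRE);

	if (unlikely(gen != rand_state->gen)) {
		uint64_t seed = __atomic_load_n(&rte_rand_seed,
						__ATOMIC_RELAXED);

		__rte_srand_lfsr258(seed + rte_thread_self().opaque_id,
				    rand_state);
		rand_state->gen = gen;
	}

	return rand_state;
}
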
>>>>
>>>>> - /* last instance reserved for unregistered non-EAL threads */
>>>>> - if (unlikely(idx == LCORE_ID_ANY))
>>>>> - idx = RTE_MAX_LCORE;
>>>>> + seed += rte_thread_self().opaque_id;
>>>>> + __rte_srand_lfsr258(seed, rand_state);
>>>>> + }
>>>>> - return &rand_states[idx];
>>>>> + return rand_state;
>>>>> }
>>>>> uint64_t
>>>>> @@ -227,7 +227,9 @@ RTE_INIT(rte_rand_init)
>>>>> {
>>>>> uint64_t seed;
>>>>> - seed = __rte_random_initial_seed();
>>>>> + do
>>>>> + seed = __rte_random_initial_seed();
>>>>> + while (seed == 0);
>>>>
>>>> Might be worth a comment why seed 0 is not allowed. Alternatively, use
>>>> some other way of signaling __rte_srand_lfsr258() must be called.
>>>>
>>>>> rte_srand(seed);
>>>>> }
>>>