[RFC] random: use per lcore state

Mattias Rönnblom hofors at lysator.liu.se
Fri Sep 8 09:04:29 CEST 2023


On 2023-09-07 01:00, Stephen Hemminger wrote:
> On Wed, 6 Sep 2023 22:02:54 +0200
> Mattias Rönnblom <hofors at lysator.liu.se> wrote:
> 
>> On 2023-09-06 19:20, Stephen Hemminger wrote:
>>> Move the random number state into thread local storage.
>>
>> Morten and I discussed TLS versus other alternatives in another
>> thread. The downside of TLS that Morten pointed out, from what I recall,
>> is that lazy initialization is *required* (since the number of threads
>> is open-ended), and that the data ends up in non-huge page memory. It was
>> also unclear to me what the memory footprint implications would be if
>> large per-lcore data structures were put in TLS; more specifically,
>> whether they would be duplicated across all threads, even non-lcore threads.
> 
> But the current method is unsafe on non-lcore threads.
> Two non-lcore threads calling rte_rand() will clash on the same state,
> without any locking protection.
> 

Sure, just like the API docs say, although the documentation uses more 
precise terminology.

If you want to extend the API's MT safety guarantees, the change should 
come with an argument for why it is needed.

Is this to save the application from calling rte_thread_register() in 
control plane threads? For convenience? Or to make the API generally 
less error-prone?

Another reason might be that the application has many threads (more 
than RTE_MAX_LCORE), so it would run out of lcore ids.

> Also, right now the array is sized at 129 entries to allow for the
> maximum number of lcores. When the maximum is increased to 512 or 1024,
> the problem will get worse.

Using TLS will penalize every thread in the process, not only EAL 
threads and registered non-EAL threads, and, worse, not only the threads 
that actually use the API in question.

Every thread will carry the TLS memory around, increasing the process 
memory footprint.

Thread creation will be slower, since TLS memory is allocated *and 
initialized* for every new thread, regardless of whether user code 
initializes its TLS data lazily.

On my particular Linux x86_64 system, pthread creation overhead looks 
something like:

8 us w/o any user code-level use of TLS
11 us w/ 16 kB of TLS
314 us w/ 2 MB of TLS

So, whatever you put into TLS, it needs to be small.

Putting a large amount of data into TLS will effectively prevent the 
DPDK libraries from being linked into a heavily multi-threaded app, 
regardless of whether those threads call into DPDK or not.

Again, this doesn't much affect rte_random.c, but it does disqualify TLS 
as a plug-in replacement for the current pattern of a statically 
allocated, lcore id-indexed array.

