[PATCH v4 3/7] eal: add lcore variable performance test
Morten Brørup
mb at smartsharesystems.com
Mon Sep 16 13:54:51 CEST 2024
> From: Mattias Rönnblom [mailto:hofors at lysator.liu.se]
> Sent: Monday, 16 September 2024 13.13
>
> On 2024-09-16 12:52, Mattias Rönnblom wrote:
> > Add basic micro benchmark for lcore variables, in an attempt to assure
> > that the overhead isn't significantly greater than alternative
> > approaches, in scenarios where the benefits aren't expected to show up
> > (i.e., when plenty of cache is available compared to the working set
> > size of the per-lcore data).
> >
>
> Here are some test results for a Raptor Cove @ 3.2 GHz (GCC 11):
>
> + ------------------------------------------------------- +
> + Test Suite : lcore variable perf autotest
> + ------------------------------------------------------- +
> Latencies [TSC cycles/update]
> Modules/Variables  Static array  Thread-local Storage  Lcore variables
>                 1           3.9                   5.5              3.7
>                 2           3.8                   5.5              3.8
>                 4           4.9                   5.5              3.7
>                 8           3.8                   5.5              3.8
>                16          11.3                   5.5              3.7
>                32          20.9                   5.5              3.7
>                64          23.5                   5.5              3.7
>               128          23.2                   5.5              3.7
>               256          23.5                   5.5              3.7
>               512          24.1                   5.5              3.7
>              1024          25.3                   5.5              3.9
> + TestCase [ 0] : test_lcore_var_access succeeded
> + ------------------------------------------------------- +
>
>
> The reason for TLS being slower than lcore variables (which in turn
> rely on TLS for lcore id lookup) is the lazy initialization
> conditional that is imposed on the TLS variant. Could that be avoided
> (which is module-dependent I suppose), it beats lcore variables at
> ~3.0 cycles/update.

I think you should not assume lazy initialization of TLS in your benchmark.

Our application uses TLS, and when spinning up a new thread, we call a
per-lcore init function of each module before calling the per-lcore run
function. This design pattern is also described in Figure 1.4 [1] in the
Programmer's Guide.

[1]: https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html
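
To make the difference concrete, here is a rough sketch of the two TLS
variants. It uses plain C11 _Thread_local and pthreads rather than the EAL,
and every name in it (lazy_update, mod_thread_init, mod_update, worker_main,
run_workers) is made up for illustration; it is not taken from the patch or
from our application. Variant A pays for the "initialized" check on every
update, while variant B runs an explicit per-thread init once, before the
run loop, so the fast path is a bare increment.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define N_WORKERS 2
#define N_UPDATES 1000

/* Variant A: lazy initialization; the check is paid on every update. */
struct lazy_state {
	bool initialized;
	uint64_t counter;
};

static _Thread_local struct lazy_state lazy_state;

static inline void
lazy_update(void)
{
	if (!lazy_state.initialized) {
		lazy_state.counter = 0;
		lazy_state.initialized = true;
	}
	lazy_state.counter++;
}

/* Variant B: explicit per-thread init; no conditional in the fast path. */
static _Thread_local uint64_t eager_counter;

static void
mod_thread_init(void)
{
	eager_counter = 0;
}

static inline void
mod_update(void)
{
	eager_counter++;
}

struct worker_result {
	uint64_t lazy;
	uint64_t eager;
};

/* Thread entry point: init every module first, then enter the run loop. */
static void *
worker_main(void *arg)
{
	struct worker_result *res = arg;
	int i;

	mod_thread_init();

	for (i = 0; i < N_UPDATES; i++) {
		lazy_update();
		mod_update();
	}

	res->lazy = lazy_state.counter;
	res->eager = eager_counter;
	return NULL;
}

/* Spawn the workers and collect their counters; returns 0 on success. */
static int
run_workers(struct worker_result results[N_WORKERS])
{
	pthread_t threads[N_WORKERS];
	int created = 0;
	int i;

	while (created < N_WORKERS) {
		if (pthread_create(&threads[created], NULL, worker_main,
				   &results[created]) != 0)
			break;
		created++;
	}

	/* Join whatever was started, even if a later create failed. */
	for (i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	return created == N_WORKERS ? 0 : -1;
}
```

Both variants end up with the same counter values; only the per-update cost
differs. A benchmark of the TLS alternative could reasonably use variant B.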
>
> I must say I'm surprised to see lcore variables doing this well, at
> these very modest working set sizes. Probably, you can stay at near-zero
> L1 misses with lcore variables (and TLS), but start missing the L1 with
> static arrays.