[PATCH v3 3/7] eal: add lcore variable performance test
Morten Brørup
mb at smartsharesystems.com
Thu Sep 12 11:39:40 CEST 2024
> +struct lcore_state {
> + uint64_t a;
> + uint64_t b;
> + uint64_t sum;
> +};
> +
> +static __rte_always_inline void
> +update(struct lcore_state *state)
> +{
> + state->sum += state->a * state->b;
> +}
> +
> +static RTE_DEFINE_PER_LCORE(struct lcore_state, tls_lcore_state);
> +
> +static __rte_noinline void
> +tls_update(void)
> +{
> + update(&RTE_PER_LCORE(tls_lcore_state));
I would normally access TLS variables directly, not through a pointer, i.e.:
RTE_PER_LCORE(tls_lcore_state.sum) += RTE_PER_LCORE(tls_lcore_state.a) * RTE_PER_LCORE(tls_lcore_state.b);
On the other hand, that would not be 1:1 comparable to the other two test cases.
Besides, I expect the compiler to optimize away the indirect access and generate the same code for both variants anyway.
No change requested. Just an observation.
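For reference, a minimal sketch of the two access styles side by side. It uses C11 _Thread_local as a stand-in for RTE_DEFINE_PER_LCORE so that it builds without DPDK headers; the names tls_update_ptr and tls_update_direct are made up for this illustration and are not part of the patch.

```c
#include <stdint.h>

struct lcore_state {
	uint64_t a;
	uint64_t b;
	uint64_t sum;
};

/* Assumption: plain C11 TLS stands in for
 * RTE_DEFINE_PER_LCORE(struct lcore_state, tls_lcore_state). */
static _Thread_local struct lcore_state tls_lcore_state;

static inline void
update(struct lcore_state *state)
{
	state->sum += state->a * state->b;
}

/* Style used in the patch: pass the address of the TLS object. */
static void
tls_update_ptr(void)
{
	update(&tls_lcore_state);
}

/* Style described in the comment: name the TLS members directly. */
static void
tls_update_direct(void)
{
	tls_lcore_state.sum += tls_lcore_state.a * tls_lcore_state.b;
}
```

Both functions perform the same arithmetic on the same object, which is why I would expect identical code once update() is inlined; that expectation is not verified here.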
> +}
> +
> +struct __rte_cache_aligned lcore_state_aligned {
> + uint64_t a;
> + uint64_t b;
> + uint64_t sum;
Please add RTE_CACHE_GUARD here, to fully match the common design pattern.
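A rough sketch of the layout I have in mind, written in plain C11 so it compiles without DPDK. The constants CACHE_LINE_SIZE, CACHE_GUARD_LINES and MAX_LCORE and the member name cache_guard are assumptions for illustration only; in the real test the struct would keep __rte_cache_aligned and simply gain an RTE_CACHE_GUARD line.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE   64  /* assumed cache line size */
#define CACHE_GUARD_LINES 1   /* assumed number of guard lines */
#define MAX_LCORE         128 /* assumed stand-in for RTE_MAX_LCORE */

struct lcore_state_aligned {
	/* Aligning the first member aligns the whole struct. */
	_Alignas(CACHE_LINE_SIZE) uint64_t a;
	uint64_t b;
	uint64_t sum;
	/* Stand-in for RTE_CACHE_GUARD: padding on its own cache line(s),
	 * so that neighbouring array elements are further apart. */
	_Alignas(CACHE_LINE_SIZE) char cache_guard[CACHE_LINE_SIZE * CACHE_GUARD_LINES];
};

static struct lcore_state_aligned sarray_lcore_state[MAX_LCORE];

/* The payload occupies the first line, the guard the line(s) after it. */
_Static_assert(offsetof(struct lcore_state_aligned, cache_guard) == CACHE_LINE_SIZE,
	"guard must start on the next cache line");
_Static_assert(sizeof(struct lcore_state_aligned) ==
	CACHE_LINE_SIZE * (1 + CACHE_GUARD_LINES),
	"element must span payload line plus guard lines");
```

With these assumed values, each array element spans two cache lines instead of one, so the data of adjacent lcores no longer sits on directly neighbouring lines.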
> +};
> +
> +static struct lcore_state_aligned sarray_lcore_state[RTE_MAX_LCORE];
> + printf("Latencies [ns/update]\n");
> + printf("Thread-local storage Static array Lcore variables\n");
> + printf("%20.1f %13.1f %16.1f\n", tls_latency * 1e9,
> + sarray_latency * 1e9, lvar_latency * 1e9);
I prefer cycles over ns. Perhaps you could show both?
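Something along these lines would do, as a sketch only. The helpers latency_to_ns, latency_to_cycles and print_latency are hypothetical names, and tsc_hz is passed in as a parameter here; in the test it would presumably come from rte_get_tsc_hz().

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a per-update latency in seconds to nanoseconds. */
static double
latency_to_ns(double latency_s)
{
	return latency_s * 1e9;
}

/* Convert a per-update latency in seconds to timer cycles,
 * given the timer frequency in Hz. */
static double
latency_to_cycles(double latency_s, uint64_t tsc_hz)
{
	return latency_s * (double)tsc_hz;
}

/* Print one result row with both units. */
static void
print_latency(const char *name, double latency_s, uint64_t tsc_hz)
{
	printf("%-22s %8.1f ns/update %8.1f cycles/update\n", name,
		latency_to_ns(latency_s),
		latency_to_cycles(latency_s, tsc_hz));
}
```

Called once per test case (TLS, static array, lcore variables), this gives one row per mechanism with both figures next to each other.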
With RTE_CACHE_GUARD added where mentioned,
Acked-by: Morten Brørup <mb at smartsharesystems.com>