[PATCH v12 1/3] net: optimize __rte_raw_cksum and add tests
Morten Brørup
mb at smartsharesystems.com
Sat Jan 10 15:47:04 CET 2026
> From: Scott <scott_mitchell at apple.com>
>
> __rte_raw_cksum uses a loop with memcpy on each iteration.
> GCC 15+ is able to vectorize the loop but Clang 18.1 is not.
> Replacing the memcpy with unaligned_uint16_t pointer access enables
> both GCC and Clang to vectorize with SSE/AVX/AVX-512.
>
> This patch adds comprehensive fuzz testing and updates the performance
> test to measure the optimization impact.
>
> Performance results from cksum_perf_autotest on Intel Xeon
> (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):
>
>   Block size   Before   After   Improvement
>          100     0.40    0.24          ~40%
>         1500     0.50    0.06           ~8x
>         9000     0.49    0.06           ~8x
>
> Signed-off-by: Scott Mitchell <scott.k.mitch1 at gmail.com>
> ---
It probably makes no practical difference, but consider marking the __rte_raw_cksum() function as __rte_pure:
https://elixir.bootlin.com/dpdk/v25.11/source/lib/eal/include/rte_common.h#L228
With or without __rte_pure marking,
Acked-by: Morten Brørup <mb at smartsharesystems.com>
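
For readers following along, here is a minimal sketch of the two loop shapes the commit message contrasts, with a pure attribute applied as suggested above. It is not the code from the patch. The names u16_unaligned_t, SKETCH_PURE, sum16_memcpy, sum16_ptr and sum16_forms_agree are hypothetical, the typedef is only a stand-in for DPDK's unaligned_uint16_t, and SKETCH_PURE assumes that __rte_pure maps to the GCC/Clang pure attribute. Whether either loop gets vectorized depends on the compiler and flags, so the sketch only shows that both forms compute the same sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's unaligned_uint16_t (GCC/Clang only):
 * a 16-bit type with alignment 1 that may alias other types. */
typedef uint16_t __attribute__((__aligned__(1), __may_alias__)) u16_unaligned_t;

/* Assumption: mirrors what __rte_pure is expected to expand to. */
#define SKETCH_PURE __attribute__((pure))

/* Form 1: one memcpy per 16-bit word. */
static SKETCH_PURE uint32_t
sum16_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	const uint8_t *end = p + (len & ~(size_t)1);

	for (; p != end; p += 2) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		/* Odd trailing byte, kept byte-order independent. */
		uint16_t left = 0;

		memcpy(&left, end, 1);
		sum += left;
	}
	return sum;
}

/* Form 2: indexed loads through an unaligned 16-bit pointer. */
static SKETCH_PURE uint32_t
sum16_ptr(const void *buf, size_t len, uint32_t sum)
{
	const u16_unaligned_t *p = buf;
	size_t n = len / 2;

	for (size_t i = 0; i < n; i++)
		sum += p[i];
	if (len & 1) {
		uint16_t left = 0;

		memcpy(&left, (const uint8_t *)buf + (len - 1), 1);
		sum += left;
	}
	return sum;
}

/* Returns 1 if both forms agree on every length and on a misaligned start. */
static int
sum16_forms_agree(void)
{
	uint8_t data[65];

	for (size_t i = 0; i < sizeof(data); i++)
		data[i] = (uint8_t)(i * 37u + 11u);
	for (size_t len = 0; len < sizeof(data); len++) {
		if (sum16_memcpy(data, len, 7) != sum16_ptr(data, len, 7))
			return 0;
		if (sum16_memcpy(data + 1, len, 7) != sum16_ptr(data + 1, len, 7))
			return 0;
	}
	return 1;
}
```

The pure attribute fits here because both functions only read the buffer and their arguments and have no side effects. The stricter const attribute would not fit, since the result depends on memory reached through a pointer.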