[PATCH v12 1/3] net: optimize __rte_raw_cksum and add tests

Morten Brørup mb at smartsharesystems.com
Sat Jan 10 15:47:04 CET 2026


> From: Scott <scott_mitchell at apple.com>
> 
> __rte_raw_cksum uses a loop with memcpy on each iteration.
> GCC 15+ is able to vectorize the loop but Clang 18.1 is not.
> Replacing the memcpy with unaligned_uint16_t pointer access enables
> both GCC and Clang to vectorize with SSE/AVX/AVX-512.
> 
> This patch adds comprehensive fuzz testing and updates the performance
> test to measure the optimization impact.
> 
> Performance results from cksum_perf_autotest on Intel Xeon
> (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):
> 
>   Block size    Before    After    Improvement
>          100      0.40     0.24        ~40%
>         1500      0.50     0.06        ~8x
>         9000      0.49     0.06        ~8x
> 
> Signed-off-by: Scott Mitchell <scott.k.mitch1 at gmail.com>
> ---
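The loop change described in the quoted commit message can be sketched in plain C. This is not the patch itself. The names raw_cksum_memcpy(), raw_cksum_u16ptr(), cksum_fold() and the typedef sketch_unaligned_u16 are hypothetical. Adding may_alias to the typedef is an assumption of this sketch, made so that byte buffers can be read through it without strict-aliasing concerns. The sketch puts the two loop shapes side by side and shows that they compute the same sum, including for odd lengths and unaligned start addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's unaligned_uint16_t: aligned(1) allows loads
 * from any address; may_alias is an assumption of this sketch so that reading
 * a byte buffer through this type stays valid under strict aliasing. */
typedef uint16_t sketch_unaligned_u16 __attribute__((aligned(1), may_alias));
static_assert(sizeof(sketch_unaligned_u16) == 2, "typedef must stay 16 bits wide");

/* Loop shape the commit message attributes to the old code:
 * one memcpy of a 16-bit word per iteration. */
static uint32_t raw_cksum_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	if (len & 1) { /* odd trailing byte, zero-padded into a 16-bit word */
		uint16_t left = 0;

		memcpy(&left, p + len - 1, 1);
		sum += left;
	}
	return sum;
}

/* Same sum, but the main loop indexes a 16-bit pointer directly; this is the
 * loop shape the commit message says both GCC and Clang can vectorize. */
static uint32_t raw_cksum_u16ptr(const void *buf, size_t len, uint32_t sum)
{
	const sketch_unaligned_u16 *p16 = buf;
	size_t n = len / 2;
	size_t i;

	for (i = 0; i < n; i++)
		sum += p16[i];
	if (len & 1) { /* identical tail handling to the memcpy version */
		uint16_t left = 0;

		memcpy(&left, (const uint8_t *)buf + len - 1, 1);
		sum += left;
	}
	return sum;
}

/* Fold the 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t cksum_fold(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}
```

Both functions return the raw host-byte-order sum, so cksum_fold() gives the folded value before the final one's complement. The cycles/byte figures above are the author's measurements; this sketch does not reproduce them. Whether a given compiler vectorizes either loop can be checked in the generated assembly (e.g. with -O3 -march=native).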

Probably makes no practical difference, but consider marking the __rte_raw_cksum() function __rte_pure:
https://elixir.bootlin.com/dpdk/v25.11/source/lib/eal/include/rte_common.h#L228
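What the suggested marking permits can be sketched with the underlying GCC/Clang attribute. This assumes that __rte_pure in the linked rte_common.h expands to __attribute__((pure)). The macro SKETCH_PURE and the functions sum16_pure() and twice() are hypothetical and only illustrate the semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: DPDK's __rte_pure wraps this attribute; a local, hypothetical
 * macro is used so the sketch builds without DPDK headers. */
#define SKETCH_PURE __attribute__((pure))

/* pure: the result depends only on the arguments and on memory the function
 * reads, and the function has no side effects. */
static SKETCH_PURE uint32_t sum16_pure(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
	if (len & 1)
		sum += buf[len - 1];
	return sum;
}

/* Two calls with the same arguments and no intervening store to buf: because
 * of the pure attribute, the compiler may evaluate sum16_pure() only once. */
static uint32_t twice(const uint8_t *buf, size_t len)
{
	return sum16_pure(buf, len) + sum16_pure(buf, len);
}
```

Because __rte_raw_cksum() is a static inline function whose body is visible at every call site, the compiler can usually infer this property on its own. That is likely why the marking is expected to make little practical difference.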

With or without __rte_pure marking,
Acked-by: Morten Brørup <mb at smartsharesystems.com>
