[PATCH v19 0/2] net: optimize __rte_raw_cksum
David Marchand
david.marchand at redhat.com
Fri Feb 6 15:54:10 CET 2026
Hi Scott,
On Mon, 2 Feb 2026 at 05:48, <scott.k.mitch1 at gmail.com> wrote:
>
> From: Scott <scott.k.mitch1 at gmail.com>
>
> This series optimizes __rte_raw_cksum by replacing memcpy with direct
> pointer access, enabling compiler vectorization on both GCC and Clang.
>
> Patch 1 adds __rte_may_alias and __rte_aligned(1) to unaligned typedefs
> to prevent a GCC strict-aliasing bug where struct initialization is
> incorrectly elided, and avoid UB by clarifying access can be from any
> address.
>
> Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum
> to enable compiler optimizations while maintaining correctness across
> all architectures (including strict-alignment platforms).
>
> Performance results show significant improvements (40% for small buffers,
> up to 8x for larger buffers) on Intel Xeon with Clang 18.1.
>
> Changes in v19:
> - Move qualifiers before typedef on all platforms
> - test_hash_functions explicit 32 bit variable use
>
> Changes in v18:
> - Fix MSVC compile error __rte_aligned(1) must come before type
> - Fix test_hash_functions incorrect usage of unaligned_uint32_t
>
> Changes in v17:
> - Use __rte_aligned(1) unconditionally on unaligned type aliases
> - test_cksum_fuzz uses unit_test_suite_runner
> - test_cksum_fuzz reference method rename to
> test_cksum_fuzz_cksum_reference
>
> Changes in v16:
> - Add Fixes tag and Cc stable/author for backporting (patch 1)
>
> Changes in v15:
> - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST
>
> Changes in v14:
> - Split into two patches: EAL typedef fix and checksum optimization
> - Use unaligned_uint16_t directly instead of wrapper struct
> - Added __rte_may_alias to unaligned typedefs to prevent GCC bug
>
> Scott Mitchell (2):
> eal: add __rte_may_alias and __rte_aligned to unaligned typedefs
> net: __rte_raw_cksum pointers enable compiler optimizations
>
> app/test/meson.build | 1 +
> app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++
> app/test/test_cksum_perf.c | 2 +-
> app/test/test_hash_functions.c | 6 +-
> lib/eal/include/rte_common.h | 49 ++++---
> lib/net/rte_cksum.h | 14 +-
> 6 files changed, 279 insertions(+), 27 deletions(-)
> create mode 100644 app/test/test_cksum_fuzz.c
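For readers following along, here is a rough standalone sketch of the technique the cover letter describes: reading 16-bit words through a may_alias, 1-byte-aligned typedef instead of calling memcpy per word. It is not the code from the series. The names sketch_unaligned_u16, sketch_sum_memcpy, sketch_sum_direct and sketch_fold are hypothetical, and the attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for an unaligned 16-bit type (GCC/Clang syntax
 * assumed). may_alias allows it to alias objects of any type; aligned(1)
 * tells the compiler the address may be odd.
 */
typedef uint16_t sketch_unaligned_u16 __attribute__((__may_alias__, __aligned__(1)));

/* Baseline: load every 16-bit word with memcpy. */
static uint32_t sketch_sum_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		uint16_t last = 0;

		/* Trailing odd byte, padded with a zero byte. */
		memcpy(&last, p + len - 1, 1);
		sum += last;
	}
	return sum;
}

/* Variant: load words directly through the unaligned, may_alias typedef. */
static uint32_t sketch_sum_direct(const void *buf, size_t len, uint32_t sum)
{
	const sketch_unaligned_u16 *w = buf;
	const sketch_unaligned_u16 *end = w + len / 2;

	for (; w != end; w++)
		sum += *w;
	if (len & 1) {
		uint16_t last = 0;

		/* Trailing odd byte, padded with a zero byte. */
		memcpy(&last, end, 1);
		sum += last;
	}
	return sum;
}

/*
 * Fold the 32-bit accumulator into 16 bits (one's complement sum).
 * The 32-bit accumulator is only safe for short buffers in this sketch.
 */
static uint16_t sketch_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Both variants should return the same folded sum for any buffer, at any alignment. Whether the direct form is actually faster depends on the compiler and target, which is the question at hand.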
I have been trying to reproduce the numbers on a (venerable) Skylake
processor, but I see no difference before and after the series.
The numbers are in the same range with both gcc 11 and clang 20 on
this RHEL 9 system.
RTE>>cksum_perf_autotest
### rte_raw_cksum() performance ###
Alignment  Block size  TSC cycles/block  TSC cycles/byte
Aligned            20              13.0             0.65
Unaligned          20              13.0             0.65
Aligned            21              14.0             0.67
Unaligned          21              14.0             0.67
Aligned           100              19.1             0.19
Unaligned         100              19.4             0.19
Aligned           101              20.1             0.20
Unaligned         101              22.1             0.22
Aligned          1500             132.5             0.09
Unaligned        1500             134.9             0.09
Aligned          1501             133.1             0.09
Unaligned        1501             146.3             0.10
Aligned          9000             766.7             0.09
Unaligned        9000             802.2             0.09
Aligned          9001             767.6             0.09
Unaligned        9001             800.3             0.09
Aligned         65536            5404.8             0.08
Unaligned       65536            5596.3             0.09
Aligned         65537            5406.8             0.08
Unaligned       65537            5604.5             0.09
Does the improvement only show up with clang 18?
Are there other things I should check?
--
David Marchand