[PATCH v19 0/2] net: optimize __rte_raw_cksum

David Marchand david.marchand at redhat.com
Fri Feb 6 15:54:10 CET 2026


Hi Scott,

On Mon, 2 Feb 2026 at 05:48, <scott.k.mitch1 at gmail.com> wrote:
>
> From: Scott <scott.k.mitch1 at gmail.com>
>
> This series optimizes __rte_raw_cksum by replacing memcpy with direct
> pointer access, enabling compiler vectorization on both GCC and Clang.
>
> Patch 1 adds __rte_may_alias and __rte_aligned(1) to unaligned typedefs
> to prevent a GCC strict-aliasing bug where struct initialization is
> incorrectly elided, and avoid UB by clarifying access can be from any
> address.
>
> Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum
> to enable compiler optimizations while maintaining correctness across
> all architectures (including strict-alignment platforms).
>
> Performance results show significant improvements (40% for small buffers,
> up to 8x for larger buffers) on Intel Xeon with Clang 18.1.
>
> Changes in v19:
> - Move qualifiers before typedef on all platforms
> - test_hash_functions explicit 32 bit variable use
>
> Changes in v18:
> - Fix MSVC compile error __rte_aligned(1) must come before type
> - Fix test_hash_functions incorrect usage of unaligned_uint32_t
>
> Changes in v17:
> - Use __rte_aligned(1) unconditionally on unaligned type aliases
> - test_cksum_fuzz uses unit_test_suite_runner
> - test_cksum_fuzz reference method rename to
> test_cksum_fuzz_cksum_reference
>
> Changes in v16:
> - Add Fixes tag and Cc stable/author for backporting (patch 1)
>
> Changes in v15:
> - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST
>
> Changes in v14:
> - Split into two patches: EAL typedef fix and checksum optimization
> - Use unaligned_uint16_t directly instead of wrapper struct
> - Added __rte_may_alias to unaligned typedefs to prevent GCC bug
>
> Scott Mitchell (2):
>   eal: add __rte_may_alias and __rte_aligned to unaligned typedefs
>   net: __rte_raw_cksum pointers enable compiler optimizations
>
>  app/test/meson.build           |   1 +
>  app/test/test_cksum_fuzz.c     | 234 +++++++++++++++++++++++++++++++++
>  app/test/test_cksum_perf.c     |   2 +-
>  app/test/test_hash_functions.c |   6 +-
>  lib/eal/include/rte_common.h   |  49 ++++---
>  lib/net/rte_cksum.h            |  14 +-
>  6 files changed, 279 insertions(+), 27 deletions(-)
>  create mode 100644 app/test/test_cksum_fuzz.c
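
To make sure we are talking about the same thing: as I read the cover
letter, the change boils down to something like the sketch below. This is
my own simplification, not the code from the patches. The names
u16_unaligned, sum16_memcpy, sum16_direct and sum16_selfcheck are made up
for illustration, the typedef only stands in for unaligned_uint16_t, and
the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for unaligned_uint16_t: the type may alias any
 * other type and may sit at any address (GCC/Clang attribute syntax). */
typedef uint16_t u16_unaligned __attribute__((__may_alias__, __aligned__(1)));

/* "Before" shape: fetch each 16-bit word from the buffer with memcpy. */
static uint32_t
sum16_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i + 2 <= len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		/* Odd trailing byte goes into the first byte of a zeroed word. */
		uint16_t last = 0;

		memcpy(&last, p + len - 1, 1);
		sum += last;
	}
	return sum;
}

/* "After" shape: read the words directly through the unaligned type. */
static uint32_t
sum16_direct(const void *buf, size_t len, uint32_t sum)
{
	const u16_unaligned *w = buf;
	const u16_unaligned *end = w + len / 2;

	for (; w != end; w++)
		sum += *w;
	if (len & 1) {
		/* Same handling of the odd trailing byte as above. */
		uint16_t last = 0;

		*(uint8_t *)&last = *(const uint8_t *)end;
		sum += last;
	}
	return sum;
}

/* Both variants must agree for every length, aligned or not. */
static void
sum16_selfcheck(void)
{
	uint8_t data[64];
	size_t i, off, len;

	for (i = 0; i < sizeof(data); i++)
		data[i] = (uint8_t)(i * 7 + 1);
	for (off = 0; off < 2; off++)
		for (len = 0; len + off <= sizeof(data); len++)
			assert(sum16_memcpy(data + off, len, 0) ==
			       sum16_direct(data + off, len, 0));
}
```

If that reading is right, both loops compute the same sum, and any gain
depends on how well a given compiler handles the memcpy form compared to
the direct loads.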

I have been trying to reproduce the numbers on a (venerable) Skylake
processor, but I see no difference before and after applying the series.
The numbers are in the same range with both gcc (11) and clang (20) on
this RHEL 9 system.

RTE>>cksum_perf_autotest
### rte_raw_cksum() performance ###
Alignment  Block size    TSC cycles/block  TSC cycles/byte
Aligned           20                13.0             0.65
Unaligned         20                13.0             0.65
Aligned           21                14.0             0.67
Unaligned         21                14.0             0.67
Aligned          100                19.1             0.19
Unaligned        100                19.4             0.19
Aligned          101                20.1             0.20
Unaligned        101                22.1             0.22
Aligned         1500               132.5             0.09
Unaligned       1500               134.9             0.09
Aligned         1501               133.1             0.09
Unaligned       1501               146.3             0.10
Aligned         9000               766.7             0.09
Unaligned       9000               802.2             0.09
Aligned         9001               767.6             0.09
Unaligned       9001               800.3             0.09
Aligned        65536              5404.8             0.08
Unaligned      65536              5596.3             0.09
Aligned        65537              5406.8             0.08
Unaligned      65537              5604.5             0.09


Does the improvement only show up with Clang 18?
Is there anything else I should check?


-- 
David Marchand