[PATCH v19 0/2] net: optimize __rte_raw_cksum
David Marchand
david.marchand at redhat.com
Mon Feb 16 15:04:20 CET 2026
On Sat, 7 Feb 2026 at 02:29, Scott Mitchell <scott.k.mitch1 at gmail.com> wrote:
>
> Thanks for testing! I included my build/host config, results on the
> main branch, and then with this patch applied below. What are your build
> flags/configuration (e.g., cpu_instruction_set, march, optimization
> level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
> vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and I'm curious if
> your config enables vectorization.
>
> #### build / host config
> User defined options
> b_lto : false
> buildtype : release
> c_args : -fno-omit-frame-pointer
> -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
> cpu_instruction_set: cascadelake
> default_library : static
> max_lcores : 128
> optimization : 3
> $ clang --version
> clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.4 (Plow)
>
> #### main branch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size  TSC cycles/block  TSC cycles/byte
> Aligned            20              10.0             0.50
> Unaligned          20              10.1             0.50
> Aligned            21              11.1             0.53
> Unaligned          21              11.6             0.55
> Aligned           100              39.4             0.39
> Unaligned         100              67.3             0.67
> Aligned           101              43.3             0.43
> Unaligned         101              41.5             0.41
> Aligned          1500             728.2             0.49
> Unaligned        1500             805.8             0.54
> Aligned          1501             768.8             0.51
> Unaligned        1501             787.3             0.52
> Test OK
>
> #### with this patch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size  TSC cycles/block  TSC cycles/byte
> Aligned            20              12.6             0.63
> Unaligned          20              12.3             0.62
> Aligned            21              13.6             0.65
> Unaligned          21              13.6             0.65
> Aligned           100              22.7             0.23
> Unaligned         100              22.6             0.23
> Aligned           101              47.4             0.47
> Unaligned         101              23.9             0.24
> Aligned          1500              73.9             0.05
> Unaligned        1500              73.9             0.05
> Aligned          1501              95.7             0.06
> Unaligned        1501              73.9             0.05
> Aligned          9000             459.8             0.05
> Unaligned        9000             523.5             0.06
> Aligned          9001             536.7             0.06
> Unaligned        9001             507.5             0.06
> Aligned         65536            3158.4             0.05
> Unaligned       65536            3506.1             0.05
> Aligned         65537            3277.6             0.05
> Unaligned       65537            3697.6             0.06
> Test OK
I redid my benchmark from scratch and I do see an improvement for clang
(main is "-", patched is "+"):
-Aligned          1500             905.3             0.60
-Unaligned        1500             924.9             0.62
-Aligned          1501             907.6             0.60
-Unaligned        1501             932.1             0.62
-Aligned          9000            5252.1             0.58
-Unaligned        9000            5433.0             0.60
-Aligned          9001            5260.9             0.58
-Unaligned        9001            5440.4             0.60
-Aligned         65536           38395.2             0.59
-Unaligned       65536           39639.5             0.60
-Aligned         65537           38030.3             0.58
-Unaligned       65537           39292.7             0.60
+Aligned          1500             104.0             0.07
+Unaligned        1500             106.5             0.07
+Aligned          1501             104.1             0.07
+Unaligned        1501             107.0             0.07
+Aligned          9000             596.7             0.07
+Unaligned        9000             655.1             0.07
+Aligned          9001             597.6             0.07
+Unaligned        9001             657.2             0.07
+Aligned         65536            4139.3             0.06
+Unaligned       65536            4583.2             0.07
+Aligned         65537            4139.9             0.06
+Unaligned       65537            4585.9             0.07
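To put a number on the difference, the two derived quantities can be sketched as below. This is only an illustration for reading the tables: the helper names cycles_per_byte and speedup are hypothetical and are not part of DPDK or of the test application.

```c
#include <stddef.h>

/* Hypothetical helpers for reading the benchmark tables; not DPDK code. */

/* The "TSC cycles/byte" column is the per-block cost divided by the
 * block size in bytes. */
static double cycles_per_byte(double cycles_per_block, size_t block_size)
{
    return cycles_per_block / (double)block_size;
}

/* Ratio of the cost before the patch to the cost after it;
 * a value above 1.0 means the patched build is faster. */
static double speedup(double cycles_before, double cycles_after)
{
    return cycles_before / cycles_after;
}
```

For the aligned 1500-byte case, speedup(905.3, 104.0) comes out at roughly 8.7, and cycles_per_byte(104.0, 1500) rounds to the 0.07 shown in the table.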
Something was most likely wrong in my earlier test: the gcc and clang
numbers looked so close that I may have been running the gcc binary.
The improvement is noticeable with clang even with no special
cpu_instruction_set and no specific compiler optimisation level set.
I'll finish my checks and merge this nice improvement for rc1.
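For readers who do not have the function in mind, here is a minimal sketch of the kind of loop being benchmarked: a 16-bit one's-complement sum over a buffer, followed by a fold to 16 bits. It is an illustration under assumptions, not the code from this series and not the DPDK implementation; the names raw_cksum_sketch and fold_cksum_sketch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the DPDK function: add the buffer to `sum`
 * as 16-bit words in host byte order. */
static uint32_t raw_cksum_sketch(const void *buf, size_t len, uint32_t sum)
{
    const uint8_t *p = buf;
    uint64_t acc = sum;   /* wide accumulator so carries are not lost */
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint16_t w;
        memcpy(&w, p + i, sizeof(w));   /* safe for unaligned input */
        acc += w;
    }
    if (len & 1) {
        uint16_t w = 0;
        memcpy(&w, p + len - 1, 1);     /* trailing byte, zero-padded */
        acc += w;
    }
    /* Fold the 64-bit accumulator down to 32 bits with end-around carry. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Fold a 32-bit partial sum to 16 bits (one's-complement addition). */
static uint16_t fold_cksum_sketch(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

Whether a compiler vectorizes a loop like this depends on how the loads and the accumulator are written, which is the question discussed earlier in the thread. Summing an IPv4 header that already carries a valid checksum and folding the result gives 0xffff, which makes a convenient sanity check.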
--
David Marchand