[PATCH v11] net: optimize raw checksum computation
Scott Mitchell
scott.k.mitch1 at gmail.com
Sat Jan 10 04:41:38 CET 2026
> Here are some more thoughts about loop unroll...
> In another mail [1], you are discussing manual loop unroll for rte_ipv4/ipv6_phdr_cksum().
> Perhaps the compiler already loop unrolls those.
> Check the assembler output for the existing code calling __rte_raw_cksum().
> If the compiler doesn't loop unroll __rte_raw_cksum() for those two functions, maybe you can help it by modifying __rte_raw_cksum(); try replacing the end pointer with an int counter, which will be compile time constant when called by rte_ipv4/ipv6_phdr_cksum().
>
> [1]: https://inbox.dpdk.org/dev/CAFn2buA5NzmzA0+t1_5auigvQTyT7Ne6RMVaPVU=sdC03nd2Lg@mail.gmail.com/
>
> PS: I do the following when optimizing inline functions: Add non-inline functions calling the inline functions, and then use "objdump -S" to look at the generated code. E.g.:
>
> uint32_t review__rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> { return __rte_raw_cksum(buf, len, sum); }
>
> uint32_t review__rte_raw_cksum_len20(const void *buf, uint32_t sum)
> { return __rte_raw_cksum(buf, 20, sum); }
>
> uint32_t review__rte_raw_cksum_len8(const void *buf, uint32_t sum)
> { return __rte_raw_cksum(buf, 8, sum); }
>
https://godbolt.org/z/qr39hf76s
rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum() are both fully unrolled
at -O2 or higher. Vectorization also happens, except that clang chooses
not to vectorize the IPv4 case. Yay compilers :)
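
For anyone who wants to repeat the check locally without a DPDK build, here is a minimal sketch of the wrapper technique quoted above. sketch_raw_cksum() is a hypothetical, simplified stand-in written for this example, not the real __rte_raw_cksum(); it uses an integer counter instead of an end pointer so that the trip count is a compile-time constant in the fixed-length wrappers. All review_sketch_* names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for __rte_raw_cksum() (assumption: NOT the DPDK
 * implementation). Sums 16-bit words in host byte order using a counter
 * loop, then adds a trailing odd byte if there is one.
 */
static inline uint32_t
sketch_raw_cksum(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	size_t words = len / 2;
	size_t i;

	for (i = 0; i < words; i++) {
		uint16_t v;

		/* memcpy avoids unaligned and aliasing problems */
		memcpy(&v, p + 2 * i, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		uint16_t v = 0;

		/* copy the last byte into the first byte of a zeroed word */
		memcpy(&v, p + 2 * words, 1);
		sum += v;
	}
	return sum;
}

/* Non-inline wrappers: these give objdump named symbols to look at. */
uint32_t review_sketch_cksum(const void *buf, size_t len, uint32_t sum)
{ return sketch_raw_cksum(buf, len, sum); }

uint32_t review_sketch_cksum_len20(const void *buf, uint32_t sum)
{ return sketch_raw_cksum(buf, 20, sum); }

uint32_t review_sketch_cksum_len8(const void *buf, uint32_t sum)
{ return sketch_raw_cksum(buf, 8, sum); }
```

Compiling this with something like "gcc -O2 -g -c" and running "objdump -S" on the object file should show whether the len20 and len8 wrappers were unrolled; the result depends on compiler and version, so treat it as something to verify rather than assume.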