[PATCH v15 0/2] net: optimize __rte_raw_cksum
Stephen Hemminger
stephen at networkplumber.org
Sat Jan 17 23:08:48 CET 2026
On Sat, 17 Jan 2026 13:21:12 -0800
scott.k.mitch1 at gmail.com wrote:
> From: Scott Mitchell <scott.k.mitch1 at gmail.com>
>
> This series optimizes __rte_raw_cksum by replacing memcpy with direct
> pointer access, enabling compiler vectorization on both GCC and Clang.
>
> Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC
> strict-aliasing bug where struct initialization is incorrectly elided.
>
> Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum
> to enable compiler optimizations while maintaining correctness across
> all architectures (including strict-alignment platforms).
>
> Performance results show significant improvements (40% for small buffers,
> up to 8x for larger buffers) on Intel Xeon with Clang 18.1.
>
> Changes in v15:
> - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST
>
> Changes in v14:
> - Split into two patches: EAL typedef fix and checksum optimization
> - Use unaligned_uint16_t directly instead of wrapper struct
> - Added __rte_may_alias to unaligned typedefs to prevent GCC bug
>
> Scott Mitchell (2):
> eal: add __rte_may_alias to unaligned typedefs
> net: __rte_raw_cksum pointers enable compiler optimizations
>
> app/test/meson.build | 1 +
> app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++
> app/test/test_cksum_perf.c | 2 +-
> lib/eal/include/rte_common.h | 34 ++---
> lib/net/rte_cksum.h | 14 +-
> 5 files changed, 266 insertions(+), 25 deletions(-)
> create mode 100644 app/test/test_cksum_fuzz.c
>
> --
> 2.39.5 (Apple Git-154)
>
Looks good now.
Acked-by: Stephen Hemminger <stephen at networkplumber.org>
AI review agrees with me...
## Patch Review: [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs
### Commit Message Analysis
| Criterion | Status | Notes |
|-----------|--------|-------|
| Subject ≤60 chars | ✅ Pass | 47 characters |
| Lowercase after colon | ✅ Pass | "add __rte_may_alias..." |
| Imperative mood | ✅ Pass | "add" |
| No trailing period | ✅ Pass | |
| Correct prefix | ✅ Pass | "eal:" for lib/eal/ files |
| Body ≤75 chars/line | ✅ Pass | Lines appear within limit |
| Body doesn't start with "It" | ✅ Pass | Starts with "Add" |
| Signed-off-by present | ✅ Pass | `Signed-off-by: Scott Mitchell <scott.k.mitch1 at gmail.com>` |
### Missing Tags (Warning)
**No `Fixes:` tag**: The commit message describes fixing "GCC strict-aliasing optimization bugs" and "incorrect optimization." This sounds like a bug fix that should reference the original commit introducing the unaligned typedefs. Consider adding:
```
Fixes: <12-char-sha> ("original commit introducing unaligned typedefs")
```
**No `Cc: stable at dpdk.org`**: If this fixes a real bug causing reads from uninitialized memory, it's likely a stable backport candidate.
### Code Review
**Positive aspects:**
- Proper Doxygen comment added for `__rte_may_alias` macro
- Good explanation of the GCC bug workaround
- MSVC fallback handled correctly
- Macro moved before its use (necessary for the typedefs)
**Minor observations:**
- The second comment block (lines 121-124 in the diff) is somewhat redundant with the first Doxygen comment. Consider consolidating.
---
## Patch Review: [PATCH v15 2/2] net: use unaligned type for raw checksum
### Commit Message Analysis
The mbox was truncated, but based on what's visible:
| Criterion | Status | Notes |
|-----------|--------|-------|
| Correct prefix | ✅ Pass | "net:" for lib/net/ files |
### Code Review - lib/net/rte_cksum.h
**The core change:**
```c
// OLD (memcpy-based):
for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
uint16_t v;
memcpy(&v, buf, sizeof(uint16_t));
sum += v;
}
// NEW (direct access via unaligned type):
const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf;
const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16));
for (; buf16 != end; buf16++)
sum += *buf16;
```
**Positive aspects:**
- Cleaner, more readable code
- Relies on the `__rte_may_alias` attribute from patch 1 to prevent aliasing bugs
- Comment explains vectorization benefit: "GCC/Clang vectorize the loop"
- Good dependency ordering (patch 1 must come before patch 2)
**Style observations:**
- ✅ Line length within 100 chars
- ✅ Proper use of `const`
### Code Review - app/test/test_cksum_fuzz.c (New File)
**Positive aspects:**
- ✅ Uses `TEST_SUCCESS`/`TEST_FAILED` correctly
- ✅ Uses `REGISTER_FAST_TEST` macro properly
- ✅ `printf()` usage is acceptable in test code per AGENTS.md
- ✅ `rte_malloc()` usage acceptable in test code
- ✅ Comprehensive edge case testing (power-of-2 boundaries, MTU sizes, GRO boundaries)
- ✅ Tests both aligned and unaligned cases
- ✅ Tests with zero and random initial sums
**Issues to verify** (file header not visible in truncated mbox):
- Ensure SPDX license identifier present on first line
- Ensure copyright line follows SPDX
- Ensure blank line before includes
**Style warning (lines 394-396):**
```c
printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n",
len, aligned ? "aligned" : "unaligned",
initial_sum, sum_ref, sum_opt);
```
Line length appears to be ~95 chars which is acceptable (<100).
### Code Review - app/test/test_cksum_perf.c
Minor change extending test coverage - looks fine.
---
## Summary
### Errors (Must Fix)
None identified.
### Warnings (Should Fix)
| Issue | Patch | Recommendation |
|-------|-------|----------------|
| Missing `Fixes:` tag | 1/2 | Add if this fixes a regression from a specific commit |
| Missing `Cc: stable at dpdk.org` | 1/2 | Consider if this should be backported |
| Verify SPDX header | 2/2 | Ensure test_cksum_fuzz.c has proper license header |
### Info (Consider)
1. **Patch 1**: The two comment blocks for `__rte_may_alias` could be consolidated into a single, more comprehensive Doxygen comment.
2. **Patch 2**: The new fuzz test is well-structured and follows DPDK test conventions. Good use of the `unit_test_suite_runner`-style approach with `REGISTER_FAST_TEST`.
3. **Series overall**: Good logical ordering - patch 1 provides the infrastructure, patch 2 uses it. Each commit should compile independently.
---
**Verdict**: This is a well-structured patch series at v15. The code changes are clean and the test coverage is thorough. The main actionable items are adding appropriate `Fixes:` and `Cc: stable` tags if this is indeed a bug fix worth backporting.
More information about the dev
mailing list