[PATCH v11] net: optimize raw checksum computation

Scott Mitchell scott.k.mitch1 at gmail.com
Fri Jan 9 18:23:19 CET 2026


On Fri, Jan 9, 2026 at 10:58 AM Morten Brørup <mb at smartsharesystems.com> wrote:
>
> > From: Scott Mitchell [mailto:scott.k.mitch1 at gmail.com]
> > Sent: Friday, 9 January 2026 16.27
> >
> > On Fri, Jan 9, 2026 at 4:26 AM Morten Brørup <mb at smartsharesystems.com>
> > wrote:
> > >
> > > > Changes in v8:
> > > > - __rte_raw_cksum: use native pointer arithmetic instead of
> > RTE_PTR_ADD
> > > >   to avoid incorrect results with -O3 for UDP checksums. Also
> > improves
> > > >   performance due to less assembly generated with Clang.
> > >
> > > Personally, I also have observed GCC's optimizer behave as if it
> > loses some contextual information when using RTE_PTR_ADD, and thus
> > emitting less optimal code.
> > > I didn't look further into it, and thus have no data or examples to
> > back up the claim. Which is why I haven't started a discussion about
> > discouraging the use of RTE_PTR_ADD.
> > > In other words: I support this change.
> >
> > Sounds good! I observed ~600 (dpdk ptr macros) vs ~500 (native c ptr
> > operations) TSC cycles/block in cksum_perf_autotest.
>
> That is a significant performance degradation caused by the RTE_PTR_ADD() macros. We really should look into that - some day. ;-)
> Our application code base has RTE_CONST_PTR_ADD/SUB() for type consistency reasons (not for performance reasons). But I haven't gotten around to submitting them to the DPDK project yet.
> I wonder if the implicit stripping of "const" when using the RTE_PTR_ADD() macros makes the difference, or if the difference stems from other optimizer context getting lost.
>
> >
> > >
> > > >       /* if length is odd, keeping it byte order independent */
> > > > -     if (unlikely(len % 2)) {
> > > > +     if (len & 1) {
> > > >               uint16_t left = 0;
> > > > -
> > > >               memcpy(&left, end, 1);
> > > >               sum += left;
> > > >       }
> > >
> > > Changing "len % 2" to "len & 1" made sense for consistency in
> > previous versions handling 32/16/8/4/2-byte chunks before this 1-byte
> > chunk; now it makes no difference, so consider not changing this part
> > at all.
> > > Under all circumstances, don't remove the unlikely() for handling odd
> > length in __rte_raw_cksum(). The vast majority of packets (and partial
> > packets, e.g. headers) being checksummed are even length.
> > >
> >
> > Sounds good. I will restore the original.
> >
> > The use case that motivated these changes was software interfaces
> > (veth)
> > with encapsulation requiring software checksum on inner IPv4 payloads,
> > where lengths may be odd/even.
>
> You might want to mention the use case in the cover letter or patch description.
> Having a real use case often helps getting a patch accepted, especially for optimization patches.
>

Will do. Big win for this patch is enabling vectorization on clang
which doesn't require the changes to the last byte.

> > However, I agree that header checksums
> > with even lengths are the more common case and unlikely() is
> > appropriate.
>
> In my experience (and based on statistics from our appliances deployed), internet traffic is dominated by "max size" (1500 byte or QUIC "safe max" somewhat below 1500 byte) packets and empty TCP ACK packets, which are even size.
> So I also consider the unlikely()applicable to "real life" traffic on the internet.
> Although it's not in the order of magnitude many people advocate should be a requirement for unlikely().
>

1.5k internet common-case makes sense. We have use cases using jumbo frames
where non-full-frames aren't uncommon.


More information about the dev mailing list