[PATCH v9] net: optimize raw checksum computation
Scott Mitchell
scott.k.mitch1 at gmail.com
Fri Jan 9 18:50:46 CET 2026
Manual unrolling makes sense! Are you OK if we land minimal changes for
the __rte_raw_cksum optimization and consider manual unrolling of the
ipv4/ipv6 headers as a follow-up? Morten requested that I break the
patch up and minimize changes (I'm working on this now). If these
were the only cases causing pain for my patch, it would make more sense
to do the unroll first, but there are other cases to consider:

- mlx5_flow_dv.c: its usage of __rte_raw_cksum could arguably
be unrolled too, but the trade-off is that manual unroll code gets
spread around. One option is for rte_cksum.h to provide unrolled
functions specialized by length to keep the code consolidated
(at the cost of additional API surface).
- hinic_pmd_tx.c: should call rte_ipv6_phdr_cksum and
rte_ipv4_phdr_cksum instead of duplicating the logic.
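
To make the rte_cksum.h option concrete, here is a rough sketch of what
consolidated fixed-length helpers might look like. All names below
(cksum_sum_words, cksum_sum_20, cksum_sum_40) are hypothetical and not
existing DPDK API. The sketch assumes the compiler unrolls the loop once
the word count is a compile-time constant after inlining, which is not
guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not DPDK API): add nwords 16-bit words from buf
 * to a running 32-bit sum. The caller folds the result afterwards.
 */
static inline uint32_t
cksum_sum_words(const void *buf, unsigned int nwords, uint32_t sum)
{
	const uint8_t *p = buf;
	unsigned int i;

	assert(buf != NULL);
	for (i = 0; i < nwords; i++) {
		uint16_t w;

		/* memcpy keeps the load alias-safe and alignment-safe */
		memcpy(&w, p + 2 * i, sizeof(w));
		sum += w;
	}
	return sum;
}

/* Hypothetical fixed-length wrapper: 20 bytes, e.g. an IPv4 header without options. */
static inline uint32_t
cksum_sum_20(const void *buf, uint32_t sum)
{
	return cksum_sum_words(buf, 10, sum);
}

/* Hypothetical fixed-length wrapper: 40 bytes, e.g. the IPv6 pseudo-header. */
static inline uint32_t
cksum_sum_40(const void *buf, uint32_t sum)
{
	return cksum_sum_words(buf, 20, sum);
}
```

Callers such as the flow or PMD code would then pick a wrapper by length
instead of open-coding the additions, which keeps the unrolled code in one
header but adds one symbol per supported length.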
On Fri, Jan 9, 2026 at 12:39 AM Stephen Hemminger
<stephen at networkplumber.org> wrote:
>
> On Thu, 8 Jan 2026 16:47:13 -0500
> scott.k.mitch1 at gmail.com wrote:
>
> > diff --git a/lib/net/rte_ip6.h b/lib/net/rte_ip6.h
> > index d1abf1f5d5..8a7e5e4b8a 100644
> > --- a/lib/net/rte_ip6.h
> > +++ b/lib/net/rte_ip6.h
> > @@ -560,19 +560,18 @@ rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
> > static inline uint16_t
> > rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
> > {
> > - uint32_t sum;
> > struct {
> > rte_be32_t len; /* L4 length. */
> > rte_be32_t proto; /* L4 protocol - top 3 bytes must be zero */
> > - } psd_hdr;
> > -
> > - psd_hdr.proto = (uint32_t)(ipv6_hdr->proto << 24);
> > - if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> > - psd_hdr.len = 0;
> > - else
> > - psd_hdr.len = ipv6_hdr->payload_len;
> > + } psd_hdr = {
> > + .len = (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> > + ? (rte_be32_t)0
> > + : ipv6_hdr->payload_len,
> > + .proto = (uint32_t)(ipv6_hdr->proto << 24)
> > + };
> > + RTE_SUPPRESS_UNINITIALIZED_WARNING(psd_hdr);
> >
> > - sum = __rte_raw_cksum(&ipv6_hdr->src_addr,
> > + uint32_t sum = __rte_raw_cksum(&ipv6_hdr->src_addr,
> > sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr),
> > 0);
> > sum = __rte_raw_cksum(&psd_hdr, sizeof(psd_hdr), sum);
> > --
>
> Seems like this could be unrolled as well.
>
> static inline uint16_t
> rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
> {
> 	union {
> 		struct {
> 			struct rte_ipv6_addr src_addr; /* 16 bytes */
> 			struct rte_ipv6_addr dst_addr; /* 16 bytes */
> 			rte_be32_t len; /* 4 bytes */
> 			rte_be32_t proto; /* 4 bytes */
> 		} psd;
> 		uint16_t u16[20];
> 	} hdr = {
> 		.psd = {
> 			.src_addr = ipv6_hdr->src_addr,
> 			.dst_addr = ipv6_hdr->dst_addr,
> 			.proto = (uint32_t)(ipv6_hdr->proto << 24),
> 		}
> 	};
> 	uint32_t sum;
>
> 	if (!(ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)))
> 		hdr.psd.len = ipv6_hdr->payload_len;
>
> 	/* Unrolled sum of 20 uint16_t words:
> 	 * [0-7]: src_addr
> 	 * [8-15]: dst_addr
> 	 * [16-17]: len
> 	 * [18-19]: proto (3 zero bytes + next header)
> 	 */
> 	sum = hdr.u16[0] + hdr.u16[1] + hdr.u16[2] + hdr.u16[3] +
> 	      hdr.u16[4] + hdr.u16[5] + hdr.u16[6] + hdr.u16[7] +
> 	      hdr.u16[8] + hdr.u16[9] + hdr.u16[10] + hdr.u16[11] +
> 	      hdr.u16[12] + hdr.u16[13] + hdr.u16[14] + hdr.u16[15] +
> 	      hdr.u16[16] + hdr.u16[17] + hdr.u16[18] + hdr.u16[19];
>
> 	sum = (sum & 0xffff) + (sum >> 16);
> 	sum = (sum & 0xffff) + (sum >> 16);
> 	return (uint16_t)sum;
> }
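
On the two folding steps at the end of the quoted function: 20 words of at
most 0xffff each sum to at most 0x13ffec, so the 32-bit accumulator cannot
overflow, and two folds are enough to bring any 32-bit value down to 16
bits. A small sketch of that reasoning follows; fold32 is a hypothetical
name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold a 32-bit one's-complement accumulator into
 * 16 bits by adding the carries back in, as the quoted code does inline.
 */
static inline uint16_t
fold32(uint32_t sum)
{
	/* First fold: low half plus high half, at most 0x1fffe. */
	sum = (sum & 0xffff) + (sum >> 16);
	/* Second fold: absorbs the single possible carry bit. */
	sum = (sum & 0xffff) + (sum >> 16);
	assert(sum <= 0xffff);
	return (uint16_t)sum;
}
```

The result is the non-complemented sum, matching what the quoted function
returns.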