[PATCH 2/2] net: have checksum routines accept unaligned data
Morten Brørup
mb at smartsharesystems.com
Thu Jul 7 23:44:45 CEST 2022
> From: Mattias Rönnblom [mailto:hofors at lysator.liu.se]
> Sent: Thursday, 7 July 2022 20.35
>
> From: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
>
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
>
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
>
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
>
> Performance impact seems to range from none to a very slight
> regression.
>
> Bugzilla ID: 1035
> Cc: stable at dpdk.org
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> ---
> lib/net/rte_ip.h | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index b502481670..a9e6251f14 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> *ipv4_hdr)
> static inline uint32_t
> __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> {
> - /* extend strict-aliasing rules */
> - typedef uint16_t __attribute__((__may_alias__)) u16_p;
> - const u16_p *u16_buf = (const u16_p *)buf;
> - const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> + const void *end;
I would set "end" here instead, possibly making the pointer const too. And add spaces around '/'.
const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * sizeof(uint16_t));
>
> - for (; u16_buf != end; ++u16_buf)
> - sum += *u16_buf;
> + for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) *
> sizeof(uint16_t));
> + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> + uint16_t v;
> +
> + memcpy(&v, buf, sizeof(uint16_t));
> + sum += v;
> + }
>
> /* if length is odd, keeping it byte order independent */
> if (unlikely(len % 2)) {
> + uint8_t last;
> uint16_t left = 0;
> - *(unsigned char *)&left = *(const unsigned char *)end;
> +
> + memcpy(&last, end, 1);
> + *(unsigned char *)&left = last;
Couldn't you just memcpy(&left, end, 1), and omit the temporary variable "last"?
> sum += left;
> }
>
> --
> 2.25.1
>
With our without my suggested changes, it looks good.
Reviewed-by: Morten Brørup <mb at smartsharesystems.com>
More information about the dev
mailing list