[PATCH v2 2/2] net: have checksum routines accept unaligned data

Olivier Matz olivier.matz at 6wind.com
Mon Jul 11 11:53:06 CEST 2022


Hi,

On Fri, Jul 08, 2022 at 02:56:08PM +0200, Mattias Rönnblom wrote:
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
> 
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
> 
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
> 
> Performance impact seems to range from none to a very slight
> regression.
> 
> Bugzilla ID: 1035
> Cc: stable at dpdk.org
> 
> ---

Using memcpy() looks to be a good solution fix the issue, while avoiding a
branch and the __may_alias__.

I just have one minor comment below.

> 
> v2:
>   * Simplified the odd-length conditional (Morten Brørup).
> 
> Reviewed-by: Morten Brørup <mb at smartsharesystems.com>
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> ---
>  lib/net/rte_ip.h | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index b502481670..a0334d931e 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -160,18 +160,21 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
>  static inline uint32_t
>  __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>  {
> -	/* extend strict-aliasing rules */
> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> -	const u16_p *u16_buf = (const u16_p *)buf;
> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> +	const void *end;
>  
> -	for (; u16_buf != end; ++u16_buf)
> -		sum += *u16_buf;
> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * sizeof(uint16_t));

What do you think about this form:

	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));

This also has the good property to solve the debate about the
spaces around the '/' :)


> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> +		uint16_t v;
> +
> +		memcpy(&v, buf, sizeof(uint16_t));
> +		sum += v;
> +	}
>  
>  	/* if length is odd, keeping it byte order independent */
>  	if (unlikely(len % 2)) {
>  		uint16_t left = 0;
> -		*(unsigned char *)&left = *(const unsigned char *)end;
> +
> +		memcpy(&left, end, 1);
>  		sum += left;
>  	}
>  
> -- 
> 2.25.1
> 


More information about the dev mailing list