[dpdk-dev] rte_ether_addr_copy() strange comment
Van Haaren, Harry
harry.van.haaren at intel.com
Fri Jun 26 14:41:29 CEST 2020
> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yigit at intel.com>
> Sent: Friday, June 26, 2020 1:08 PM
> To: Morten Brørup <mb at smartsharesystems.com>; dev at dpdk.org
> Cc: Olivier Matz <olivier.matz at 6wind.com>; Van Haaren, Harry
> <harry.van.haaren at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>
> Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
>
> On 6/25/2020 4:45 PM, Morten Brørup wrote:
> > The function rte_ether_addr_copy() checks for __INTEL_COMPILER and has a
> comment about "a strange gcc warning". It says:
> >
> > static inline void rte_ether_addr_copy(const struct rte_ether_addr *ea_from,
> > struct rte_ether_addr *ea_to)
> > {
> > #ifdef __INTEL_COMPILER
> > uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes);
> > uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes);
> >
> > to_words[0] = from_words[0];
> > to_words[1] = from_words[1];
> > to_words[2] = from_words[2];
> > #else
> > /*
> > * Use the common way, because of a strange gcc warning.
> > */
> > *ea_to = *ea_from;
> > #endif
> > }
> >
> > I can see that from_words discards the const qualifier. Is that the "strange" gcc
> warning the comment is referring to?
> >
> > This goes back to before the first public release of DPDK in 2013, ref.
> https://git.dpdk.org/dpdk/log/lib/librte_ether/rte_ether.h
> >
> >
> > It should be fixed as follows:
> >
> > - uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes);
> > - uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes);
> > + const uint16_t *from_words = (const uint16_t *)ea_from;
> > + uint16_t *to_words = (uint16_t *)ea_to;
> >
> > And the consequential question: Is copying the three shorts faster than
> copying the struct? In other words: Should we get rid of the #ifdef and use the
> first method only?
>
>
> I tried to investigate this in godbolt: https://godbolt.org/z/YSmvDn
There was a small hiccup in the struct mac definition in that link: the DPDK struct is aligned to 2 bytes, not 16 as in the Godbolt code above.
With the aligned attribute changed to 2 (as per the DPDK header https://git.dpdk.org/dpdk/tree/lib/librte_net/rte_ether.h#n59 )
we get the required (but less performant) smaller stores:
WORD_COPY = 0, Aligned = 16
NOTE: Incorrect alignment provided, and invalid ASM as it stores over the 10 bytes after eth addr.
This code is from a GodBolt test only, and this bug is NOT present in upstream DPDK.
movdqa (%rsi), %xmm0
movaps %xmm0, (%rdi)
ret
WORD_COPY = 0, Aligned = 2 (correct, as per dpdk header)
movl (%rsi), %eax
movl %eax, (%rdi)
movzwl 4(%rsi), %eax
movw %ax, 4(%rdi)
ret
Word Copy = 1 (aligned = 2)
movzwl (%rsi), %eax
movw %ax, (%rdi)
movzwl 2(%rsi), %eax
movw %ax, 2(%rdi)
movzwl 4(%rsi), %eax
movw %ax, 4(%rdi)
ret
<snip previous output>
Ferruh said:
> Related to the struct copy vs word copy, struct copy seems with less instruction [1],[2],
> my vote to remove ifdef and keep struct copy.
+1 for struct copy here too, it gives the compiler space to optimize within its alignment bounds.
It does the best that it can (given a 6 byte area to store into), with 1x 4byte store, 1x 2byte store.
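For anyone who wants to reproduce the comparison locally, here is a minimal sketch of the two variants side by side. The names struct ether_addr, copy_struct, copy_words and u16_alias_t are stand-ins made up for this example, not the DPDK definitions. The may_alias typedef is my own addition (a GCC/Clang attribute) to keep the 16-bit accesses clear of strict-aliasing trouble; it is not part of the fix proposed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_ether_addr: 6 bytes, 2-byte aligned. */
struct ether_addr {
    uint8_t addr_bytes[6];
} __attribute__((aligned(2)));

static_assert(sizeof(struct ether_addr) == 6, "expected a 6-byte address");
static_assert(_Alignof(struct ether_addr) == 2, "expected 2-byte alignment");

/* Assumption: 16-bit type that is allowed to alias other types (GCC/Clang). */
typedef uint16_t __attribute__((__may_alias__)) u16_alias_t;

/* Struct copy: the compiler chooses the store widths itself. */
static inline void copy_struct(const struct ether_addr *from,
                               struct ether_addr *to)
{
    *to = *from;
}

/* Word copy: three explicit 16-bit loads and stores, keeping const. */
static inline void copy_words(const struct ether_addr *from,
                              struct ether_addr *to)
{
    const u16_alias_t *from_words = (const u16_alias_t *)from;
    u16_alias_t *to_words = (u16_alias_t *)to;

    to_words[0] = from_words[0];
    to_words[1] = from_words[1];
    to_words[2] = from_words[2];
}
```

Both functions should leave the same 6 bytes in the destination; the only difference is how much freedom the compiler has in picking the stores.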
Regards, -Harry
PS: For extra bonus points, here's a SIMD version that only uses one store
https://godbolt.org/z/VAR2La. Unless you intend on copying billions of
L1 resident eth addrs, this may or may not be a useful optimization.
Note that it requires the 10 bytes after the ether addr to be valid to read.
It loads 16B across both SRC and DST, blends 48 bits of SRC into DST and
writes the result back to DST.
movdqu (%rsi), %xmm0
movdqu (%rdi), %xmm1
pblendw $7, %xmm1, %xmm0
movups %xmm0, (%rdi)
ret
Actually, it's possible to do this using a uint64_t (8 byte scalar) load/store too,
with some masking and bitwise OR... left as an exercise to the reader? :)
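One possible shape of that exercise, as a rough sketch only. ether_addr_copy_u64 is a hypothetical helper, not a DPDK function. It assumes the 2 bytes following each address are valid to read, and that the 2 bytes following the destination may be rewritten with their own old value, which is not safe if another thread is modifying them at the same time. The mask is built from bytes so that the same code selects the first 6 bytes in memory on either endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 6 address bytes using one 8-byte store.
 * Assumption: from[0..7] is readable and to[0..7] is readable and writable.
 */
static inline void ether_addr_copy_u64(const void *from, void *to)
{
    /* 0xFF over the 6 address bytes, 0x00 over the 2 trailing bytes. */
    static const uint8_t mask_bytes[8] = {
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00
    };
    uint64_t mask, src, dst;

    assert(from != NULL && to != NULL);

    /* memcpy keeps the unaligned 8-byte accesses well defined. */
    memcpy(&mask, mask_bytes, sizeof(mask));
    memcpy(&src, from, sizeof(src));
    memcpy(&dst, to, sizeof(dst));

    /* Keep the destination's trailing 2 bytes, take 6 bytes from source. */
    dst = (dst & ~mask) | (src & mask);

    memcpy(to, &dst, sizeof(dst));
}
```

Like the SIMD version, this trades two narrow stores for an extra load of the destination, so whether it helps at all would need measuring.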