[dpdk-dev] rte_ether_addr_copy() strange comment
Morten Brørup
mb at smartsharesystems.com
Fri Jun 26 14:34:56 CEST 2020
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Friday, June 26, 2020 2:08 PM
>
> On 6/25/2020 4:45 PM, Morten Brørup wrote:
> > The function rte_ether_addr_copy() checks for __INTEL_COMPILER and
> has a comment about "a strange gcc warning". It says:
> >
> > static inline void rte_ether_addr_copy(const struct rte_ether_addr
> *ea_from,
> > struct rte_ether_addr *ea_to)
> > {
> > #ifdef __INTEL_COMPILER
> > uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes);
> > uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes);
> >
> > to_words[0] = from_words[0];
> > to_words[1] = from_words[1];
> > to_words[2] = from_words[2];
> > #else
> > /*
> > * Use the common way, because of a strange gcc warning.
> > */
> > *ea_to = *ea_from;
> > #endif
> > }
> >
> > I can see that from_words discards the const qualifier. Is that the
> "strange" gcc warning the comment is referring to?
> >
> > This goes back to before the first public release of DPDK in 2013,
> ref. https://git.dpdk.org/dpdk/log/lib/librte_ether/rte_ether.h
> >
> >
> > It should be fixed as follows:
> >
> > - uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes);
> > - uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes);
> > + const uint16_t *from_words = (const uint16_t *)ea_from;
> > + uint16_t *to_words = (uint16_t *)ea_to;
> >
> > And the consequential question: Is copying the three shorts faster
> than copying the struct? In other words: Should we get rid of the
> #ifdef and use the first method only?
>
>
> I tried to investigate this in godbolt: https://godbolt.org/z/YSmvDn
>
> First I don't see the "strange" gcc warning with various gcc versions
> there.
>
> Related to the struct copy vs word copy, struct copy seems with less
> instruction
> [1],[2],
> my vote to remove ifdef and keep struct copy.
>
>
> [1] copy as individual function
> [1a] gcc 10.1, struct copy:
> copy:
> movdqa (%rsi), %xmm0
> movaps %xmm0, (%rdi)
> ret
>
> [1b] gcc 10.1, word copy:
> copy:
> movzwl (%rsi), %eax
> movw %ax, (%rdi)
> movzwl 2(%rsi), %eax
> movw %ax, 2(%rdi)
> movzwl 4(%rsi), %eax
> movw %ax, 4(%rdi)
> ret
>
> [1c] icc 19.0.1, struct copy
> copy:
> movups (%rsi), %xmm0 #19.13
> movups %xmm0, (%rdi) #19.13
> ret
>
>
> [2] gcc 10.1, copy as inline function that knows the data, both seems
> similar
> // .addr = {1, 1, 1, 1, 1, 1},
> [2a] struct copy:
> ...
> movl $257, %eax
> movw %ax, 4(%rsp)
> leaq 16(%rsp), %rdi
> movl $16843009, (%rsp)
> movdqa (%rsp), %xmm0
> movaps %xmm0, 16(%rsp)
> ...
>
> [2b] word copy:
> movl $257, %eax
> movq %rsp, %rdi
> movw %ax, 4(%rsp)
> movl $16843009, (%rsp)
>
Thank you for the detailed response, Ferruh.
I didn't know about godbolt, so thank you for that reference too.
The address struct is 2 byte aligned, not 16 byte aligned. Modifying your test in godbolt to use 2 byte alignment gives a similar result, i.e. fewer instructions on both icc and gcc.
[1c-modified] icc 19.0.1, struct copy
copy:
movl (%rsi), %eax #19.13
movl %eax, (%rdi) #19.13
movzwl 4(%rsi), %edx #19.13
movw %dx, 4(%rdi) #19.13
ret #28.1
[1d-modified] icc 19.0.1, word copy
copy:
movzwl (%rsi), %eax #24.12
movw %ax, (%rdi) #24.5
movzwl 2(%rsi), %edx #25.12
movw %dx, 2(%rdi) #25.5
movzwl 4(%rsi), %ecx #26.12
movw %cx, 4(%rdi) #26.5
ret #28.1
Testing for ARM64 on godbolt gives a similar result: more instructions using word copy than struct copy.
In conclusion, I will proceed with the struct copy.
More information about the dev
mailing list