[PATCH v6 7/7] vhost: optimize memcpy routines when cc memcpy is used
Mattias Rönnblom
hofors at lysator.liu.se
Thu Oct 10 12:35:20 CEST 2024
On 2024-10-09 23:57, Stephen Hemminger wrote:
> On Fri, 20 Sep 2024 12:27:16 +0200
> Mattias Rönnblom <mattias.ronnblom at ericsson.com> wrote:
>
>> +#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
>> +static __rte_always_inline void
>> +pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
>> +{
>> + void *dst = __builtin_assume_aligned(in_dst, 16);
>> + const void *src = __builtin_assume_aligned(in_src, 16);
>
> Not sure if buffer is really aligned that way but x86 doesn't care.
>
I think it might care, actually. That's why this makes a difference.
With 16-byte alignment assumed, the compiler may use MOVDQA; otherwise,
it can't and must use MOVDQU. Generally these things don't matter from
a performance point of view in my experience, but in this case it did
(in my benchmark, on my CPU, with my compiler, etc.).
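
To make the MOVDQA/MOVDQU point concrete, here is a minimal sketch, not
the pktcpy() from the patch. The names copy_assume_aligned16(),
copy_unaligned() and copy_demo() are hypothetical and exist only for
illustration. It assumes GCC or clang, since __builtin_assume_aligned
and __attribute__((aligned)) are compiler extensions. The two copy
functions behave identically; they differ only in what the compiler is
allowed to assume about the pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes while promising the compiler that
 * both pointers are 16-byte aligned. Calling this with a pointer that is
 * not 16-byte aligned is undefined behaviour. */
static inline void
copy_assume_aligned16(void *restrict in_dst, const void *restrict in_src,
		      size_t len)
{
	void *dst = __builtin_assume_aligned(in_dst, 16);
	const void *src = __builtin_assume_aligned(in_src, 16);

	memcpy(dst, src, len);
}

/* Hypothetical helper: the same copy with no alignment promise. */
static inline void
copy_unaligned(void *restrict dst, const void *restrict src, size_t len)
{
	memcpy(dst, src, len);
}

/* Exercise both helpers; returns 1 if both copies match the source. */
static int
copy_demo(void)
{
	/* Buffers passed to the aligned variant are explicitly aligned. */
	uint8_t src[64] __attribute__((aligned(16)));
	uint8_t dst_a[64] __attribute__((aligned(16)));
	uint8_t dst_u[64];
	size_t i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (uint8_t)i;

	copy_assume_aligned16(dst_a, src, sizeof(src));
	copy_unaligned(dst_u, src, sizeof(src));

	return memcmp(dst_a, src, sizeof(src)) == 0 &&
	       memcmp(dst_u, src, sizeof(src)) == 0;
}
```

Whether the aligned variant actually ends up using MOVDQA depends on the
compiler, its version and the flags, so the generated assembly (for
example from "gcc -O2 -S") is the thing to check. Note that MOVDQA
faults on a memory operand that is not 16-byte aligned, so the promise
has to hold for every caller.
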
> Since src and dst can be pointers into mbuf at an offset.
> The offset will be a multiple of the buffer len.