[PATCH v6 7/7] vhost: optimize memcpy routines when cc memcpy is used

Mattias Rönnblom hofors at lysator.liu.se
Thu Oct 10 12:35:20 CEST 2024


On 2024-10-09 23:57, Stephen Hemminger wrote:
> On Fri, 20 Sep 2024 12:27:16 +0200
> Mattias Rönnblom <mattias.ronnblom at ericsson.com> wrote:
> 
>> +#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
>> +static __rte_always_inline void
>> +pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
>> +{
>> +	void *dst = __builtin_assume_aligned(in_dst, 16);
>> +	const void *src = __builtin_assume_aligned(in_src, 16);
> 
> Not sure if buffer is really aligned that way but x86 doesn't care.
> 

I think it might care, actually. That's why this makes a difference.
With 16-byte alignment assumed, the compiler may use MOVDQA; otherwise,
it can't and must use MOVDQU. Generally these things don't matter from
a performance point of view in my experience, but in this case it did
(in my benchmark, on my CPU, with my compiler etc).
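
To illustrate the point, here is a minimal standalone sketch of the
alignment hint. It is not the patch's pktcpy(); the helper name
copy_assume_aligned16 is made up for this example, and it assumes a
compiler with GCC-style builtins (GCC or clang). Whether aligned moves
are actually emitted depends on the compiler, the flags and the copy
length, so the generated assembly should be checked rather than assumed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): copy len bytes while
 * promising the compiler that both pointers are 16-byte aligned.
 * The promise allows, but does not force, aligned SSE loads/stores.
 */
static inline void
copy_assume_aligned16(void *restrict in_dst, const void *restrict in_src,
		      size_t len)
{
	/* Behavior is undefined if either pointer is not 16-byte aligned. */
	void *dst = __builtin_assume_aligned(in_dst, 16);
	const void *src = __builtin_assume_aligned(in_src, 16);

	memcpy(dst, src, len);
}
```

The hint is a promise made by the caller, not a check. If a pointer
into an mbuf at some offset is not 16-byte aligned, an aligned move
such as MOVDQA may fault, so the hint is only safe where the alignment
is actually guaranteed.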

> Since src and dst can be pointers into mbuf at an offset.
> The offset will be a multiple of the buffer len.


