[PATCH v4] eal/x86: optimize memcpy of small sizes
Morten Brørup
mb at smartsharesystems.com
Tue Nov 25 09:19:09 CET 2025
> Also, all uses of SSE2 _mm_loadu_si128() intrinsics were upgraded to
> SSE3 _mm_lddqu_si128().
> The Intel Intrinsics Guide notes that it may perform better when the
> data crosses a cache line boundary.
It turns out _mm_lddqu_si128() is much slower than _mm_loadu_si128().
Would have been nice if the Intel Intrinsics Guide mentioned that.
Marked v4 patch as Not Applicable, and changed v3 patch back to New.
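
For reference, a minimal sketch of the two load variants discussed above. It is not taken from the patch: the helper names copy16_loadu() and copy16_lddqu() are hypothetical, and it assumes an x86-64 build with GCC or Clang. It only shows that the two intrinsics are interchangeable for an unaligned 16-byte load; it measures nothing, so any speed difference has to be checked on the target CPU.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128(), _mm_storeu_si128() */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128() */

/* Hypothetical helper: copy 16 bytes using the SSE2 unaligned load. */
static inline void
copy16_loadu(void *dst, const void *src)
{
	__m128i v = _mm_loadu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)dst, v);
}

/*
 * Hypothetical helper: same copy using the SSE3 unaligned load.
 * The target attribute (an assumption about the toolchain) enables SSE3
 * for this function only, so the file builds without -msse3.
 */
__attribute__((target("sse3")))
static void
copy16_lddqu(void *dst, const void *src)
{
	__m128i v = _mm_lddqu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)dst, v);
}
```

Both helpers accept any source alignment, so either can be dropped into a timing loop to compare them on a given machine.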