[PATCH v4] eal/x86: optimize memcpy of small sizes

Morten Brørup mb at smartsharesystems.com
Tue Nov 25 09:19:09 CET 2025


> Also, all uses of SSE2 _mm_loadu_si128() intrinsics were upgraded to
> SSE3 _mm_lddqu_si128().
> The Intel Intrinsics Guide notes that it may perform better when the
> data crosses a cache line boundary.

It turns out _mm_lddqu_si128() is much slower than _mm_loadu_si128().
Would have been nice if the Intel Intrinsics Guide mentioned that.
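
For anyone comparing the two loads, here is a minimal sketch (not the code from the patch) showing that they are functionally interchangeable for an unaligned 16-byte copy, so the choice between them is purely a performance question. The helper names copy16_loadu(), copy16_lddqu() and copy16_variants_agree() are hypothetical, and the target("sse3") attribute is an assumption that GCC or Clang is used without -msse3 on the command line. It checks equivalence only and says nothing about speed.

```c
#include <assert.h>     /* only for the usage line shown after this block */
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128(), _mm_storeu_si128() */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128() */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 16 bytes using the SSE2 unaligned load. */
static inline void
copy16_loadu(uint8_t *dst, const uint8_t *src)
{
	__m128i v = _mm_loadu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)dst, v);
}

/*
 * Hypothetical helper: the same copy using the SSE3 unaligned load.
 * Assumption: GCC/Clang; the target attribute enables SSE3 for this
 * function only, so the file builds without -msse3.
 */
__attribute__((target("sse3")))
static inline void
copy16_lddqu(uint8_t *dst, const uint8_t *src)
{
	__m128i v = _mm_lddqu_si128((const __m128i *)src);

	_mm_storeu_si128((__m128i *)dst, v);
}

/*
 * Return 1 if both variants copy the same 16 bytes from a deliberately
 * misaligned source address, 0 otherwise.
 */
static int
copy16_variants_agree(void)
{
	uint8_t buf[64];
	uint8_t a[16], b[16];
	const uint8_t *src = buf + 3; /* odd offset, so not 16-byte aligned */
	size_t i;

	for (i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)i;

	copy16_loadu(a, src);
	copy16_lddqu(b, src);

	return memcmp(a, b, sizeof(a)) == 0 && memcmp(a, src, sizeof(a)) == 0;
}
```

Usage would be a single assert(copy16_variants_agree()); in a test. Any performance comparison still has to be measured on the target CPUs.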

Marked v4 patch as Not Applicable, and changed v3 patch back to New.
