[PATCH v11] eal/x86: optimize memcpy of small sizes

Thomas Monjalon thomas at monjalon.net
Mon Jun 1 21:48:59 CEST 2026


21/05/2026 20:56, Morten Brørup:
> The implementation for copying up to 64 bytes does not depend on address
> alignment with the size of the CPU's vector registers. Nonetheless, the
> exact same code for copying up to 64 bytes was present in both the aligned
> copy function and all the CPU vector register size specific variants of
> the unaligned copy functions.
> With this patch, the implementation for copying up to 64 bytes was
> consolidated into one instance, located in the common copy function,
> before checking alignment requirements.
> This provides three benefits:
> 1. No copy-paste in the source code.
> 2. A performance gain for copying up to 64 bytes, because the
> address alignment check is avoided in this case.
> 3. Reduced instruction memory footprint, because the compiler only
> generates one instance of the function for copying up to 64 bytes, instead
> of two instances (one in the unaligned copy function, and one in the
> aligned copy function).
> 
> Furthermore, __rte_restrict was added to source and destination addresses.
> 
> Also, the missing implementation of rte_mov48() was added.
> 
> Until recently, some drivers required disabling stringop-overflow warnings
> when using rte_memcpy().
> For some strange reason, these warnings were disabled in the rte_memcpy
> header file, instead of in the problematic drivers.
> With series-38174 ("remove use of rte_memcpy from net/intel"), the
> problematic drivers were updated to use memcpy() instead of rte_memcpy(),
> so disabling these warnings is no longer required, and was removed.
> 
> Regarding performance...
> The memcpy performance test (cache-to-cache copy) shows:
> Copying up to 15 bytes takes ca. 4.5 cycles, versus ca. 6.5 cycles before.
> Copying 8 bytes takes 4 cycles, versus 7 cycles before.
> Copying 16 bytes takes 2 cycles, versus 4 cycles before.
> Copying 64 bytes takes 4 cycles, versus 7 cycles before.
> 
> Depends-on: series-38174 ("remove use of rte_memcpy from net/intel")
> 
> Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> Acked-by: Bruce Richardson <bruce.richardson at intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev at huawei.com>

Applied, thanks.





More information about the dev mailing list