[PATCH v11] eal/x86: optimize memcpy of small sizes

Morten Brørup mb at smartsharesystems.com
Mon Jun 1 16:19:03 CEST 2026


> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> Sent: Monday, 1 June 2026 15.38
> 
> 22/05/2026 00:42, Stephen Hemminger:
> > On Thu, 21 May 2026 18:56:31 +0000
> > Morten Brørup <mb at smartsharesystems.com> wrote:
> >
> > > The implementation for copying up to 64 bytes does not depend on address
> > > alignment with the size of the CPU's vector registers. Nonetheless, the
> > > exact same code for copying up to 64 bytes was present in both the aligned
> > > copy function and all the CPU vector register size specific variants of
> > > the unaligned copy functions.
> > > With this patch, the implementation for copying up to 64 bytes was
> > > consolidated into one instance, located in the common copy function,
> > > before checking alignment requirements.
> > > This provides three benefits:
> > > 1. No copy-paste in the source code.
> > > 2. A performance gain for copying up to 64 bytes, because the
> > > address alignment check is avoided in this case.
> > > 3. Reduced instruction memory footprint, because the compiler only
> > > generates one instance of the function for copying up to 64 bytes, instead
> > > of two instances (one in the unaligned copy function, and one in the
> > > aligned copy function).
> > >
> > > Furthermore, __rte_restrict was added to source and destination addresses.
> > >
> > > Also, the missing implementation of rte_mov48() was added.
> > >
> > > Until recently, some drivers required disabling stringop-overflow warnings
> > > when using rte_memcpy().
> > > For some strange reason, these warnings were disabled in the rte_memcpy
> > > header file, instead of in the problematic drivers.
> > > With series-38174 ("remove use of rte_memcpy from net/intel"), the
> > > problematic drivers were updated to use memcpy() instead of rte_memcpy(),
> > > so disabling these warnings is no longer required, and was removed.
> > >
> > > Regarding performance...
> > > The memcpy performance test (cache-to-cache copy) shows:
> > > Copying up to 15 bytes takes ca. 4.5 cycles, versus ca. 6.5 cycles before.
> > > Copying 8 bytes takes 4 cycles, versus 7 cycles before.
> > > Copying 16 bytes takes 2 cycles, versus 4 cycles before.
> > > Copying 64 bytes takes 4 cycles, versus 7 cycles before.
> > >
> > > Depends-on: series-38174 ("remove use of rte_memcpy from net/intel")
> > >
> > > Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> > > Acked-by: Bruce Richardson <bruce.richardson at intel.com>
> > > Acked-by: Konstantin Ananyev <konstantin.ananyev at huawei.com>
> >
> > Here are the full, wordy reviews from all providers.
> [...]
> > Summary across 4 provider(s): clean=0 warnings=1 errors=3 failed=0
> 
> What is the followup?

The AI review wants me to fix existing code.
I chose to stick to the file's existing coding style etc., including some unnecessary type casts.
So the AI also complains about my code doing things the same way the existing code in the file does.
Fixing existing code is out of scope for this patch, and using a different style for my changes would be confusing.

The patch description mentions that stringop-overflow warnings are no longer disabled for rte_memcpy().
The AI wants this to go into the release notes (although the change is x86 architecture only).
But IMO, this is far below the threshold for what should go into the release notes.

> Do we target DPDK 26.07?

IMO, yes, this v11 patch is good.
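For readers following the thread, the dispatch structure described in the quoted patch description can be illustrated with a minimal, portable C sketch. This is not the actual rte_memcpy.h code, which uses SSE/AVX intrinsics. Copies of up to 64 bytes are handled first with a few possibly overlapping fixed-size moves, and only larger copies reach any alignment logic. All names here (mov16(), mov32(), mov48(), mov64(), copy_le64(), copy_large(), sketch_memcpy()) are hypothetical stand-ins; mov48() plays the role of the newly added rte_mov48(). The parameters use C99 restrict, which as far as I know is what __rte_restrict expands to when compiling C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size moves. A memcpy() with a constant size is
 * normally lowered by the compiler to plain loads and stores, e.g. one
 * 16-byte SSE2 move for mov16() on x86-64. */
static inline void mov16(uint8_t *restrict d, const uint8_t *restrict s)
{
	memcpy(d, s, 16);
}

static inline void mov32(uint8_t *restrict d, const uint8_t *restrict s)
{
	mov16(d, s);
	mov16(d + 16, s + 16);
}

/* Stand-in for the newly added rte_mov48(). */
static inline void mov48(uint8_t *restrict d, const uint8_t *restrict s)
{
	mov32(d, s);
	mov16(d + 32, s + 32);
}

static inline void mov64(uint8_t *restrict d, const uint8_t *restrict s)
{
	mov32(d, s);
	mov32(d + 32, s + 32);
}

/* Copy 0..64 bytes. No alignment is assumed: a head move plus a tail
 * move that may overlap it covers every size without a byte loop. */
static inline void copy_le64(uint8_t *restrict d, const uint8_t *restrict s,
			     size_t n)
{
	assert(n <= 64);
	if (n < 16) {
		if (n >= 8) {
			memcpy(d, s, 8);
			memcpy(d + n - 8, s + n - 8, 8);
		} else if (n >= 4) {
			memcpy(d, s, 4);
			memcpy(d + n - 4, s + n - 4, 4);
		} else if (n >= 2) {
			memcpy(d, s, 2);
			memcpy(d + n - 2, s + n - 2, 2);
		} else if (n == 1) {
			d[0] = s[0];
		}
		return;
	}
	if (n <= 32)
		mov16(d, s);
	else if (n <= 48)
		mov32(d, s);
	else
		mov48(d, s);
	/* Last 16 bytes; may overlap bytes already copied above. */
	mov16(d + n - 16, s + n - 16);
}

/* Copy more than 64 bytes. Only this path looks at address alignment:
 * it copies a 16-byte head, then advances to the next 16-byte boundary
 * of the destination, so the loop stores to 16-byte aligned addresses. */
static void copy_large(uint8_t *restrict d, const uint8_t *restrict s,
		       size_t n)
{
	size_t adj = (16 - ((uintptr_t)d & 15)) & 15;

	if (adj != 0) {
		mov16(d, s);
		d += adj;
		s += adj;
		n -= adj;
	}
	for (; n >= 64; n -= 64, d += 64, s += 64)
		mov64(d, s);
	copy_le64(d, s, n);
}

/* Hypothetical entry point: small sizes are dispatched first, so they
 * never pay for the alignment check. */
void *sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	if (n <= 64)
		copy_le64(dst, src, n);
	else
		copy_large(dst, src, n);
	return dst;
}
```

Because the n <= 64 branch comes before copy_large(), the branch on address bits is only paid by larger copies, which corresponds to benefit 2 in the quoted list, and the small-copy code exists once instead of once per aligned/unaligned variant (benefit 3).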