[PATCH] eal/x86: improve rte_memcpy const size 16 performance
Stephen Hemminger
stephen at networkplumber.org
Sun Mar 3 06:58:07 CET 2024
On Sat, 2 Mar 2024 21:40:03 -0800
Stephen Hemminger <stephen at networkplumber.org> wrote:
> On Sun, 3 Mar 2024 00:48:12 +0100
> Morten Brørup <mb at smartsharesystems.com> wrote:
>
> > When the rte_memcpy() size is 16, the same 16 bytes are copied twice.
> > In the case where the size is known to be 16 at build time, omit the
> > duplicate copy.
> >
> > Reduced the amount of effectively copy-pasted code by using #ifdef
> > inside the functions instead of around whole functions.
> >
> > Suggested-by: Stephen Hemminger <stephen at networkplumber.org>
> > Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> > ---
>
> Looks good. Let me see how it compares against GCC's builtin memcpy on Godbolt.
>
> One other issue is that for the non-constant case, rte_memcpy has an excessively
> large inline code footprint. That is one of the reasons GCC doesn't always
> inline it. For sizes > 128 bytes, it really should be a function call.
For sizes of 4, 6, 8, 16, 32, 64, and on up to 128 bytes, the GCC inline copy and rte_memcpy match.
At size 128, it looks like GCC's code is simpler.
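The two functions I compared were roughly as follows (a sketch; the 128-byte
struct and an -O3 -mavx2 build are my assumptions, not necessarily the exact
test case):

#include <string.h>
#include <rte_memcpy.h>

/* Hypothetical 128-byte struct used for the comparison. */
struct addr { char bytes[128]; };

/* Copy via DPDK's inline rte_memcpy(). */
void rte_copy_addr(struct addr *dst, const struct addr *src)
{
        rte_memcpy(dst, src, sizeof(*dst));
}

/* Copy via the compiler's builtin memcpy(). */
void copy_addr(struct addr *dst, const struct addr *src)
{
        memcpy(dst, src, sizeof(*dst));
}

Generated code: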
rte_copy_addr:
        vmovdqu  ymm0, YMMWORD PTR [rsi]
        vextracti128  XMMWORD PTR [rdi+16], ymm0, 0x1
        vmovdqu  XMMWORD PTR [rdi], xmm0
        vmovdqu  ymm0, YMMWORD PTR [rsi+32]
        vextracti128  XMMWORD PTR [rdi+48], ymm0, 0x1
        vmovdqu  XMMWORD PTR [rdi+32], xmm0
        vmovdqu  ymm0, YMMWORD PTR [rsi+64]
        vextracti128  XMMWORD PTR [rdi+80], ymm0, 0x1
        vmovdqu  XMMWORD PTR [rdi+64], xmm0
        vmovdqu  ymm0, YMMWORD PTR [rsi+96]
        vextracti128  XMMWORD PTR [rdi+112], ymm0, 0x1
        vmovdqu  XMMWORD PTR [rdi+96], xmm0
        vzeroupper
        ret

copy_addr:
        vmovdqu  ymm0, YMMWORD PTR [rsi]
        vmovdqu  YMMWORD PTR [rdi], ymm0
        vmovdqu  ymm1, YMMWORD PTR [rsi+32]
        vmovdqu  YMMWORD PTR [rdi+32], ymm1
        vmovdqu  ymm2, YMMWORD PTR [rsi+64]
        vmovdqu  YMMWORD PTR [rdi+64], ymm2
        vmovdqu  ymm3, YMMWORD PTR [rsi+96]
        vmovdqu  YMMWORD PTR [rdi+96], ymm3
        vzeroupper
        ret
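
As a rough illustration of the earlier point about large copies, here is a
sketch (mine, not a proposal for the actual header) of bounding the inline
path at 128 bytes and letting anything larger become an ordinary call:

#include <string.h>
#include <rte_memcpy.h>

/*
 * Sketch only: keep the small cases inline, and let anything above 128
 * bytes go through memcpy(), which the compiler typically emits as a real
 * function call for non-constant sizes, keeping the inline footprint small.
 */
static inline void *
copy_bounded(void *dst, const void *src, size_t n)
{
        if (n <= 128)
                return rte_memcpy(dst, src, n);
        return memcpy(dst, src, n);
}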