[PATCH] app/testpmd: improve sse based macswap
Ferruh Yigit
ferruh.yigit at amd.com
Tue Jul 23 18:45:57 CEST 2024
On 7/16/2024 7:37 AM, Vipin Varghese wrote:
> The goal of this patch is to improve SSE macswap on x86_64 by reducing
> stalls in the backend engine. The original SSE macswap implementation
> issues multiple loads, shuffles and stores in each loop iteration. By
> interleaving the SIMD instructions we can reduce the stalls caused by
> - load SSE token exhaustion
> - shuffle and load dependency
>
> Other changes which improve packets per second are
> - Filling access to MBUF for offload flags, which is on a separate cacheline,
> - using the register keyword
>
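For anyone reading along without the diff at hand, below is a minimal sketch of the kind of grouping described above: four independent loads are issued first, then the four shuffles, then the four stores, so that no shuffle has to wait directly on the load just before it. This is not the code from the patch; the function name macswap_x4 and the assumption that each frame has at least 16 accessible bytes are only for illustration. The shuffle mask swaps bytes 0-5 (destination MAC) with bytes 6-11 (source MAC) and leaves bytes 12-15 unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 */

/*
 * Hypothetical helper, not taken from the patch: swap destination and
 * source MAC addresses in four Ethernet frames at once.
 * Assumption: each pointer refers to at least 16 readable and writable bytes.
 */
__attribute__((target("ssse3")))
static void macswap_x4(uint8_t *f0, uint8_t *f1, uint8_t *f2, uint8_t *f3)
{
    /* Output byte i takes input byte mask[i]; arguments run from byte 15 down to byte 0. */
    const __m128i mask = _mm_set_epi8(15, 14, 13, 12,
                                      5, 4, 3, 2,
                                      1, 0, 11, 10,
                                      9, 8, 7, 6);

    assert(f0 && f1 && f2 && f3);

    /* Stage 1: issue all four independent loads back to back. */
    __m128i a0 = _mm_loadu_si128((const __m128i *)f0);
    __m128i a1 = _mm_loadu_si128((const __m128i *)f1);
    __m128i a2 = _mm_loadu_si128((const __m128i *)f2);
    __m128i a3 = _mm_loadu_si128((const __m128i *)f3);

    /* Stage 2: shuffle; each shuffle depends only on its own load. */
    a0 = _mm_shuffle_epi8(a0, mask);
    a1 = _mm_shuffle_epi8(a1, mask);
    a2 = _mm_shuffle_epi8(a2, mask);
    a3 = _mm_shuffle_epi8(a3, mask);

    /* Stage 3: write the swapped headers back. */
    _mm_storeu_si128((__m128i *)f0, a0);
    _mm_storeu_si128((__m128i *)f1, a1);
    _mm_storeu_si128((__m128i *)f2, a2);
    _mm_storeu_si128((__m128i *)f3, a3);
}
```

Whether this ordering actually helps depends on the core and on what the compiler schedules anyway, which is why the numbers below need to be checked per platform.
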
> Build test using meson script:
> ``````````````````````````````
>
> build-gcc-static
> buildtools
> build-gcc-shared
> build-mini
> build-clang-static
> build-clang-shared
> build-x86-generic
>
> Test Results:
> `````````````
>
> Platform-1: AMD EPYC SIENA 8594P @2.3GHz, no boost
>
> ------------------------------------------------
> TEST IO 64B: baseline <NIC : Mpps>
> - mellanox CX-7 2*200Gbps        : 42.0
> - intel E810 1*100Gbps           : 82.0
> - intel E810 2*200Gbps (2CQ-DA2) : 82.45
> ------------------------------------------------
> TEST MACSWAP 64B: <NIC : Before : After>
> - mellanox CX-7 2*200Gbps        : 31.533 : 31.90
> - intel E810 1*100Gbps           : 50.380 : 47.0
> - intel E810 2*200Gbps (2CQ-DA2) : 48.840 : 49.827
> ------------------------------------------------
> TEST MACSWAP 128B: <NIC : Before : After>
> - mellanox CX-7 2*200Gbps        : 30.946 : 31.770
> - intel E810 1*100Gbps           : 49.386 : 46.366
> - intel E810 2*200Gbps (2CQ-DA2) : 47.979 : 49.503
> ------------------------------------------------
> TEST MACSWAP 256B: <NIC : Before : After>
> - mellanox CX-7 2*200Gbps        : 32.480 : 33.150
> - intel E810 1*100Gbps           : 45.29  : 44.571
> - intel E810 2*200Gbps (2CQ-DA2) : 45.033 : 45.117
> ------------------------------------------------
>
> Platform-2: AMD EPYC 9554 @3.1GHz, no boost
>
> ------------------------------------------------
> TEST IO 64B: baseline <NIC : Mpps>
> - intel E810 2*200Gbps (2CQ-DA2) : 82.49
> ------------------------------------------------
> <NIC intel E810 2*200Gbps (2CQ-DA2): Before : After>
> TEST MACSWAP: 1Q 1C1T
> 64B  : 45.0  : 45.54
> 128B : 44.48 : 44.43
> 256B : 42.0  : 41.99
> +++++++++++++++++++++++++
> TEST MACSWAP: 2Q 2C2T
> 64B  : 59.5  : 60.55
> 128B : 56.78 : 58.1
> 256B : 41.85 : 41.99
> ------------------------------------------------
>
> Signed-off-by: Vipin Varghese <vipin.varghese at amd.com>
>
Hi Bruce, John,
Can you please help test macswap performance with this patch on Intel
platforms, to be sure it is not causing a regression?
Another option is to get this patch in for -rc3 and have it tested there, on
the condition that it is removed if any regression is found, if that helps
with testing the patch?
Thanks,
ferruh