[PATCH v4 00/13] Optionally have rte_memcpy delegate to compiler memcpy
Mattias Rönnblom
hofors at lysator.liu.se
Tue Jun 25 21:27:06 CEST 2024
On Tue, Jun 25, 2024 at 05:29:35PM +0200, Maxime Coquelin wrote:
> Hi Mattias,
>
> On 6/20/24 19:57, Mattias Rönnblom wrote:
> > This patch set makes DPDK library, driver, and application code use the
> > compiler/libc memcpy() by default when functions in <rte_memcpy.h> are
> > invoked.
> >
> > The various custom DPDK rte_memcpy() implementations may be retained
> > by means of a build-time option.
> >
> > This patch set only makes a difference on x86, PPC and ARM. Loongarch
> > and RISCV already used compiler/libc memcpy().
>
> It indeed makes a difference on x86!
>
> Just tested latest main with and without your series on
> Intel(R) Xeon(R) Gold 6438N.
>
> The test is a simple IO loop between a Vhost PMD and a Virtio-user PMD:
> # dpdk-testpmd -l 4-6 --file-prefix=virtio1 --no-pci --vdev 'net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net,server=1,mrg_rxbuf=1,in_order=1'
> --single-file-segments -- -i
> testpmd> start
>
> # dpdk-testpmd -l 8-10 --file-prefix=vhost1 --no-pci --vdev
> 'net_vhost0,iface=vhost-net,client=1' --single-file-segments -- -i
> testpmd> start tx_first 32
>
> Latest main: 14.5Mpps
> Latest main + this series: 10Mpps
>
I ran the above benchmark on my Raptor Lake desktop (locked to 3.2
GHz), built with GCC 12.3.0.

Core  use_cc_memcpy  Mpps
E     false           9.5
E     true            9.7
P     false          16.4
P     true           13.5

On the P-cores, there's a significant performance regression (about
18%), although not as bad as the roughly 31% drop you see on your
Sapphire Rapids Xeon. On the E-cores, there's actually a slight
performance gain (about 2%).

The virtio PMD does not directly invoke rte_memcpy() or anything else
from <rte_memcpy.h>, but rather uses memcpy(), so I'm not sure I
understand what's going on here. Does the virtio driver delegate some
performance-critical task to some module that in turn uses
rte_memcpy()?
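
For readers following along, here is a minimal sketch of the kind of
build-time delegation that the use_cc_memcpy setting in the table
above toggles. It is not the code from the series: the names
copy_bytes() and custom_copy() and the macro USE_CC_MEMCPY are
hypothetical, and the byte loop merely stands in for a hand-tuned
vectorized copy routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build-time switch; the real macro in the series may differ. */
#ifndef USE_CC_MEMCPY
#define USE_CC_MEMCPY 1
#endif

/* Stand-in for a custom, hand-tuned copy routine (a plain byte loop here). */
static inline void *
custom_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/*
 * Wrapper used by callers. The choice is made by the preprocessor,
 * so there is no run-time branch on the copy path.
 */
static inline void *
copy_bytes(void *dst, const void *src, size_t n)
{
#if USE_CC_MEMCPY
	/* Let the compiler/libc pick the implementation (may be inlined). */
	return memcpy(dst, src, n);
#else
	return custom_copy(dst, src, n);
#endif
}
```

Under this assumption, any module that calls the wrapper changes
behavior when the switch is flipped, while code that calls memcpy()
directly is unaffected, which is why the virtio numbers above are
surprising.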
> So for me, it should be disabled by default.
>
> Regards,
> Maxime
>
> > This patch set includes a number of fixes in drivers and libraries
> > which erroneously relied on <rte_memcpy.h> including header files
> > (i.e., <rte_vect.h>) required by its implementation.
> >
> > Mattias Rönnblom (13):
> > net/i40e: add missing vector API header include
> > net/iavf: add missing vector API header include
> > net/ice: add missing vector API header include
> > net/ixgbe: add missing vector API header include
> > net/ngbe: add missing vector API header include
> > net/txgbe: add missing vector API header include
> > net/virtio: add missing vector API header include
> > net/fm10k: add missing vector API header include
> > event/dlb2: include headers for vector and memory copy APIs
> > net/octeon_ep: add missing vector API header include
> > distributor: add missing vector API header include
> > fib: add missing vector API header include
> > eal: provide option to use compiler memcpy instead of RTE
> >
> > config/meson.build | 1 +
> > doc/guides/rel_notes/release_24_07.rst | 21 +++++++
> > drivers/event/dlb2/dlb2.c | 2 +
> > drivers/net/fm10k/fm10k_rxtx_vec.c | 3 +-
> > drivers/net/i40e/i40e_rxtx_vec_sse.c | 3 +-
> > drivers/net/iavf/iavf_rxtx_vec_sse.c | 3 +-
> > drivers/net/ice/ice_rxtx_vec_sse.c | 2 +-
> > drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 3 +-
> > drivers/net/ngbe/ngbe_rxtx_vec_sse.c | 3 +-
> > drivers/net/octeon_ep/otx_ep_ethdev.c | 2 +
> > drivers/net/txgbe/txgbe_rxtx_vec_sse.c | 3 +-
> > drivers/net/virtio/virtio_rxtx_simple_sse.c | 3 +-
> > lib/distributor/rte_distributor.c | 1 +
> > lib/eal/arm/include/rte_memcpy.h | 10 ++++
> > lib/eal/include/generic/rte_memcpy.h | 61 ++++++++++++++++++---
> > lib/eal/loongarch/include/rte_memcpy.h | 53 ++----------------
> > lib/eal/ppc/include/rte_memcpy.h | 10 ++++
> > lib/eal/riscv/include/rte_memcpy.h | 53 ++----------------
> > lib/eal/x86/include/meson.build | 1 +
> > lib/eal/x86/include/rte_memcpy.h | 11 +++-
> > lib/fib/trie.c | 1 +
> > meson_options.txt | 2 +
> > 22 files changed, 131 insertions(+), 121 deletions(-)
> >
>