[PATCH v4 00/13] Optionally have rte_memcpy delegate to compiler memcpy
Maxime Coquelin
maxime.coquelin at redhat.com
Wed Jun 26 10:37:31 CEST 2024
On 6/25/24 21:27, Mattias Rönnblom wrote:
> On Tue, Jun 25, 2024 at 05:29:35PM +0200, Maxime Coquelin wrote:
>> Hi Mattias,
>>
>> On 6/20/24 19:57, Mattias Rönnblom wrote:
>>> This patch set makes DPDK library, driver, and application code use the
>>> compiler/libc memcpy() by default when functions in <rte_memcpy.h> are
>>> invoked.
>>>
>>> The various custom DPDK rte_memcpy() implementations may be retained
>>> by means of a build-time option.
>>>
>>> This patch set only makes a difference on x86, PPC and ARM. LoongArch
>>> and RISC-V already used compiler/libc memcpy().
>>
>> It indeed makes a difference on x86!
>>
>> Just tested latest main with and without your series on
>> Intel(R) Xeon(R) Gold 6438N.
>>
>> The test is a simple IO loop between a Vhost PMD and a Virtio-user PMD:
>> # dpdk-testpmd -l 4-6 --file-prefix=virtio1 --no-pci --vdev 'net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net,server=1,mrg_rxbuf=1,in_order=1'
>> --single-file-segments -- -i
>> testpmd> start
>>
>> # dpdk-testpmd -l 8-10 --file-prefix=vhost1 --no-pci --vdev
>> 'net_vhost0,iface=vhost-net,client=1' --single-file-segments -- -i
>> testpmd> start tx_first 32
>>
>> Latest main: 14.5 Mpps
>> Latest main + this series: 10 Mpps
>>
>
> I ran the above benchmark on my Raptor Lake desktop (locked to 3.2
> GHz), built with GCC 12.3.0.
>
> Core  use_cc_memcpy  Mpps
> E     false           9.5
> E     true            9.7
> P     false          16.4
> P     true           13.5
>
> On the P-cores, there's a significant performance regression, although
> not as bad as the one you see on your Sapphire Rapids Xeon. On the
> E-cores, there's actually a slight performance gain.
>
> The virtio PMD does not directly invoke rte_memcpy() or anything else
> from <rte_memcpy.h>, but rather uses memcpy(), so I'm not sure I
> understand what's going on here. Does the virtio driver delegate some
> performance-critical task to some module that in turn uses
> rte_memcpy()?
This is because Vhost is the bottleneck here, not the Virtio driver.
Indeed, the virtqueue memory belongs to the Virtio driver and the
descriptor buffers are Virtio's mbufs, so not many memcpy() calls are
made there.

Vhost, however, is a heavy memcpy user, as all the descriptor buffers
are copied to/from its mbufs.
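
As a rough illustration of that copy pattern, here is a minimal sketch. It
is not the actual Vhost code: struct demo_desc, demo_copy(),
demo_dequeue_copy() and the DEMO_USE_CC_MEMCPY macro are all hypothetical
names, and the byte loop is only a placeholder for a custom copy routine,
not rte_memcpy(). It only shows that every descriptor buffer goes through
the copy routine, so whichever one is selected at build time sits directly
on the per-packet path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a virtqueue descriptor: a buffer and its length. */
struct demo_desc {
	const uint8_t *addr;
	uint32_t len;
};

/* Hypothetical build-time switch, loosely modelled on use_cc_memcpy. */
#ifdef DEMO_USE_CC_MEMCPY
#define demo_copy(dst, src, n) memcpy((dst), (src), (n))
#else
/* Placeholder for a custom copy routine: a plain byte loop. */
static inline void *
demo_copy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}
#endif

/*
 * Copy every descriptor buffer of one packet into a contiguous,
 * mbuf-like destination buffer. Returns the number of bytes copied.
 */
static size_t
demo_dequeue_copy(uint8_t *mbuf_data, size_t cap,
		  const struct demo_desc *descs, size_t nb_descs)
{
	size_t off = 0;

	for (size_t i = 0; i < nb_descs; i++) {
		/* The destination must have room for this descriptor. */
		assert(descs[i].len <= cap - off);
		demo_copy(mbuf_data + off, descs[i].addr, descs[i].len);
		off += descs[i].len;
	}
	return off;
}
```

Building the sketch with and without -DDEMO_USE_CC_MEMCPY swaps the copy
routine without touching the loop, which is roughly the situation the
Vhost data path is in with this series.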
>> So for me, it should be disabled by default.
>>
>> Regards,
>> Maxime
>>
>>> This patch set includes a number of fixes in drivers and libraries
>>> which erroneously relied on <rte_memcpy.h> including header files
>>> (i.e., <rte_vect.h>) required by its implementation.
>>>
>>> Mattias Rönnblom (13):
>>> net/i40e: add missing vector API header include
>>> net/iavf: add missing vector API header include
>>> net/ice: add missing vector API header include
>>> net/ixgbe: add missing vector API header include
>>> net/ngbe: add missing vector API header include
>>> net/txgbe: add missing vector API header include
>>> net/virtio: add missing vector API header include
>>> net/fm10k: add missing vector API header include
>>> event/dlb2: include headers for vector and memory copy APIs
>>> net/octeon_ep: add missing vector API header include
>>> distributor: add missing vector API header include
>>> fib: add missing vector API header include
>>> eal: provide option to use compiler memcpy instead of RTE
>>>
>>> config/meson.build | 1 +
>>> doc/guides/rel_notes/release_24_07.rst | 21 +++++++
>>> drivers/event/dlb2/dlb2.c | 2 +
>>> drivers/net/fm10k/fm10k_rxtx_vec.c | 3 +-
>>> drivers/net/i40e/i40e_rxtx_vec_sse.c | 3 +-
>>> drivers/net/iavf/iavf_rxtx_vec_sse.c | 3 +-
>>> drivers/net/ice/ice_rxtx_vec_sse.c | 2 +-
>>> drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 3 +-
>>> drivers/net/ngbe/ngbe_rxtx_vec_sse.c | 3 +-
>>> drivers/net/octeon_ep/otx_ep_ethdev.c | 2 +
>>> drivers/net/txgbe/txgbe_rxtx_vec_sse.c | 3 +-
>>> drivers/net/virtio/virtio_rxtx_simple_sse.c | 3 +-
>>> lib/distributor/rte_distributor.c | 1 +
>>> lib/eal/arm/include/rte_memcpy.h | 10 ++++
>>> lib/eal/include/generic/rte_memcpy.h | 61 ++++++++++++++++++---
>>> lib/eal/loongarch/include/rte_memcpy.h | 53 ++----------------
>>> lib/eal/ppc/include/rte_memcpy.h | 10 ++++
>>> lib/eal/riscv/include/rte_memcpy.h | 53 ++----------------
>>> lib/eal/x86/include/meson.build | 1 +
>>> lib/eal/x86/include/rte_memcpy.h | 11 +++-
>>> lib/fib/trie.c | 1 +
>>> meson_options.txt | 2 +
>>> 22 files changed, 131 insertions(+), 121 deletions(-)
>>>
>>
>