[PATCH v2 00/71] replace use of fixed size rte_mempcy
Mattias Rönnblom
hofors at lysator.liu.se
Sat Mar 2 13:01:51 CET 2024
On 2024-03-02 12:14, Mattias Rönnblom wrote:
> On 2024-03-01 18:14, Stephen Hemminger wrote:
>> The DPDK has a lot of "cargo cult" usage of rte_memcpy.
>> This patch set replaces cases where rte_memcpy is used with a fixed
>> size constant size.
>>
>> Typical example is:
>> rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
>> which can be replaced with:
>> memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
>>
>> This has two benefits. Gcc (and clang) are smart enough that for
>> all small fixed size values, they just generate the necessary
>> instructions
>> to do it inline. It also means that fortify, Coverity, and ASAN
>> analyzers can check these memcpy's.
>>
>
> Instead of smearing out the knowledge of when to use rte_memcpy(), and
> when to use memcpy() across the code base, wouldn't it be better to
> *always* call rte_memcpy() in the fast path, and leave the policy
> decision to the rte_memcpy() implementation?
>
> In rte_memcpy(), add:
> if (__builtin_constant_p(n) && n < RTE_LIBC_MEMCPY_SIZE_THRESHOLD)
> memcpy(/../);
> ..or something to that effect.
>
> Could you have a #ifdef for dumb static analysis tools? To make it look
> like you are always using memcpy()?
>
>> So faster, better, safer.
>>
>
> What is "faster" based on?
>
I ran some DSW benchmarks, and if you add
diff --git a/lib/eal/x86/include/rte_memcpy.h
b/lib/eal/x86/include/rte_memcpy.h
index 72a92290e0..64cd82d78d 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src,
size_t n)
static __rte_always_inline void *
rte_memcpy(void *dst, const void *src, size_t n)
{
+ if (__builtin_constant_p(n) && n <= 32) {
+ memcpy(dst, src, n);
+ return dst;
+ }
+
if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
return rte_memcpy_aligned(dst, src, n);
else
...the overhead increases from roughly 48 core clock cycles/event to 59
cc/event. The same for "n < 128". (I'm not sure what counts as "small"
here.)
So something rte_memcpy() does for small and constant memory copies does
make things go *significantly* faster, at least in certain cases.
(Linux, GCC 11.4, Intel Gracemont.)
> My experience with replacing rte_memcpy() with memcpy() (or vice versa)
> is mixed.
>
> I've also tried just dropping the DPDK-custom memcpy() implementation
> altogether, and that caused a performance drop (in a particular app, on
> a particular compiler and CPU).
>
>> The first patch is a simple coccinelle script to do the replacement
>> and the rest are the results broken out by module.
>>
>> The coccinelle script can be used again to make sure more bad
>> usage doesn't creep in with new drivers.
>>
>> v2 - fix CI failure on some OS by adding string.h
>> remove rte_memcpy.h if no longer used
>>
>> Stephen Hemminger (71):
>> cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy
>> eal: replace use of fixed size rte_memcpy
>> ethdev: replace use of fixed size rte_memcpy
>> eventdev: replace use of fixed size rte_memcpy
>> cryptodev: replace use of fixed size rte_memcpy
>> ip_frag: replace use of fixed size rte_memcpy
>> net: replace use of fixed size rte_memcpy
>> lpm: replace use of fixed size rte_memcpy
>> node: replace use of fixed size rte_memcpy
>> pdcp: replace use of fixed size rte_memcpy
>> pipeline: replace use of fixed size rte_memcpy
>> rib: replace use of fixed size rte_memcpy
>> security: replace use of fixed size rte_memcpy
>> net/mlx5: replace use of fixed size rte_memcpy
>> net/nfp: replace use of fixed size rte_memcpy
>> net/ngbe: replace use of fixed size rte_memcpy
>> net/null: replace use of fixed size rte_memcpy
>> net/pcap: replace use of fixed size rte_memcpy
>> net/sfc: replace use of fixed size rte_memcpy
>> net/tap: replace use of fixed size rte_memcpy
>> net/txgbe: replace use of fixed size rte_memcpy
>> raw/ifpga: replace use of fixed size rte_memcpy
>> raw/skeleton: replace use of fixed size rte_memcpy
>> net/hns3: replace use of fixed size rte_memcpy
>> net/i40e: replace use of fixed size rte_memcpy
>> net/iavf: replace use of fixed size rte_memcpy
>> net/ice: replace use of fixed size rte_memcpy
>> net/idpf: replace use of fixed size rte_memcpy
>> net/ipn3ke: replace use of fixed size rte_memcpy
>> net/ixgbe: replace use of fixed size rte_memcpy
>> net/memif: replace use of fixed size rte_memcpy
>> net/qede: replace use of fixed size rte_memcpy
>> baseband/acc: replace use of fixed size rte_memcpy
>> baseband/la12xx: replace use of fixed size rte_memcpy
>> common/idpf: replace use of fixed size rte_memcpy
>> common/qat: replace use of fixed size rte_memcpy
>> compress/qat: replace use of fixed size rte_memcpy
>> crypto/ccp: replace use of fixed size rte_memcpy
>> crypto/cnxk: replace use of fixed size rte_memcpy
>> crypto/dpaa_sec: replace use of fixed size rte_memcpy
>> crypto/ipsec_mb: replace use of fixed size rte_memcpy
>> crypto/qat: replace use of fixed size rte_memcpy
>> crypto/scheduler: replace use of fixed size rte_memcpy
>> event/cnxk: replace use of fixed size rte_memcpy
>> event/dlb2: replace use of fixed size rte_memcpy
>> event/dpaa2: replace use of fixed size rte_memcpy
>> event/octeontx: replace use of fixed size rte_memcpy
>> mempool/dpaa: replace use of fixed size rte_memcpy
>> mempool/dpaa2: replace use of fixed size rte_memcpy
>> ml/cnxk: replace use of fixed size rte_memcpy
>> net/af_xdp: replace use of fixed size rte_memcpy
>> net/avp: replace use of fixed size rte_memcpy
>> net/axgbe: replace use of fixed size rte_memcpy
>> net/bnx2x: replace use of fixed size rte_memcpy
>> net/bnxt: replace use of fixed size rte_memcpy
>> net/bonding: replace use of fixed size rte_memcpy
>> net/cnxk: replace use of fixed size rte_memcpy
>> net/cpfl: replace use of fixed size rte_memcpy
>> net/cxgbe: replace use of fixed size rte_memcpy
>> net/dpaa2: replace use of fixed size rte_memcpy
>> net/e1000: replace use of fixed size rte_memcpy
>> net/enic: replace use of fixed size rte_memcpy
>> net/failsafe: replace use of fixed size rte_memcpy
>> net/gve/base: replace use of fixed size rte_memcpy
>> net/hinic: replace use of fixed size rte_memcpy
>> net/mvpp2: replace use of fixed size rte_memcpy
>> app/test-pmd: replace use of fixed size rte_memcpy
>> app/graph: replace use of fixed size rte_memcpy
>> app/test-eventdev: replace use of fixed size rte_memcpy
>> app/test: replace use of fixed size rte_memcpy
>> examples: replace use of fixed size rte_memcpy
>>
>> app/graph/neigh.c | 8 +-
>> app/test-eventdev/test_pipeline_common.c | 19 ++-
>> app/test-pmd/cmdline.c | 48 ++++----
>> app/test-pmd/cmdline_flow.c | 24 ++--
>> app/test-pmd/config.c | 8 +-
>> app/test-pmd/csumonly.c | 1 -
>> app/test-pmd/flowgen.c | 1 -
>> app/test-pmd/iofwd.c | 1 -
>> app/test-pmd/macfwd.c | 1 -
>> app/test-pmd/macswap.c | 1 -
>> app/test-pmd/noisy_vnf.c | 1 -
>> app/test-pmd/rxonly.c | 1 -
>> app/test-pmd/testpmd.c | 1 -
>> app/test/commands.c | 1 -
>> app/test/packet_burst_generator.c | 4 +-
>> app/test/test_crc.c | 5 +-
>> app/test/test_cryptodev.c | 18 ++-
>> app/test/test_cryptodev_asym.c | 1 -
>> app/test/test_cryptodev_security_pdcp.c | 1 -
>> app/test/test_efd.c | 1 -
>> app/test/test_efd_perf.c | 1 -
>> app/test/test_event_crypto_adapter.c | 12 +-
>> app/test/test_event_dma_adapter.c | 4 +-
>> app/test/test_eventdev.c | 1 -
>> app/test/test_ipsec.c | 6 +-
>> app/test/test_link_bonding_mode4.c | 8 +-
>> app/test/test_mbuf.c | 1 -
>> app/test/test_member.c | 1 -
>> app/test/test_member_perf.c | 1 -
>> app/test/test_rawdev.c | 1 -
>> app/test/test_security_inline_proto.c | 36 +++---
>> app/test/test_service_cores.c | 1 -
>> app/test/virtual_pmd.c | 3 +-
>> devtools/cocci/rte_memcpy.cocci | 11 ++
>> drivers/baseband/acc/rte_acc100_pmd.c | 17 ++-
>> drivers/baseband/acc/rte_vrb_pmd.c | 21 ++--
>> drivers/baseband/la12xx/bbdev_la12xx.c | 4 +-
>> drivers/common/idpf/idpf_common_device.c | 4 +-
>> drivers/common/idpf/idpf_common_virtchnl.c | 8 +-
>> drivers/common/qat/qat_qp.c | 10 +-
>> drivers/compress/qat/qat_comp.c | 8 +-
>> drivers/crypto/ccp/ccp_crypto.c | 14 +--
>> drivers/crypto/cnxk/cnxk_cryptodev_ops.c | 2 +-
>> drivers/crypto/cnxk/cnxk_se.h | 2 +-
>> drivers/crypto/dpaa_sec/dpaa_sec.c | 2 +-
>> drivers/crypto/ipsec_mb/pmd_snow3g.c | 4 +-
>> drivers/crypto/qat/qat_sym_session.c | 52 ++++-----
>> .../scheduler/rte_cryptodev_scheduler.c | 6 +-
>> drivers/crypto/scheduler/scheduler_failover.c | 12 +-
>> drivers/event/cnxk/cnxk_tim_evdev.c | 4 +-
>> drivers/event/dlb2/dlb2.c | 6 +-
>> drivers/event/dpaa2/dpaa2_eventdev.c | 6 +-
>> drivers/event/octeontx/timvf_evdev.c | 4 +-
>> drivers/mempool/dpaa/dpaa_mempool.c | 4 +-
>> drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 4 +-
>> drivers/ml/cnxk/cn10k_ml_model.c | 8 +-
>> drivers/ml/cnxk/cn10k_ml_ops.c | 11 +-
>> drivers/ml/cnxk/cnxk_ml_ops.c | 2 +-
>> drivers/ml/cnxk/mvtvm_ml_model.c | 8 +-
>> drivers/ml/cnxk/mvtvm_ml_ops.c | 8 +-
>> drivers/net/af_xdp/rte_eth_af_xdp.c | 2 +-
>> drivers/net/avp/avp_ethdev.c | 4 +-
>> drivers/net/axgbe/axgbe_ethdev.c | 4 +-
>> drivers/net/bnx2x/bnx2x.c | 32 +++--
>> drivers/net/bnx2x/bnx2x_stats.c | 10 +-
>> drivers/net/bnx2x/bnx2x_vfpf.c | 19 +--
>> drivers/net/bnxt/bnxt_flow.c | 34 +++---
>> drivers/net/bonding/rte_eth_bond_8023ad.c | 4 +-
>> drivers/net/bonding/rte_eth_bond_flow.c | 2 +-
>> drivers/net/cnxk/cnxk_ethdev_ops.c | 2 +-
>> drivers/net/cnxk/cnxk_tm.c | 5 +-
>> drivers/net/cpfl/cpfl_ethdev.c | 3 +-
>> drivers/net/cpfl/cpfl_vchnl.c | 4 +-
>> drivers/net/cxgbe/clip_tbl.c | 2 +-
>> drivers/net/cxgbe/cxgbe_filter.c | 8 +-
>> drivers/net/cxgbe/l2t.c | 4 +-
>> drivers/net/cxgbe/smt.c | 20 ++--
>> drivers/net/dpaa2/base/dpaa2_hw_dpni.c | 1 -
>> drivers/net/dpaa2/dpaa2_ethdev.c | 1 -
>> drivers/net/dpaa2/dpaa2_recycle.c | 1 -
>> drivers/net/dpaa2/dpaa2_rxtx.c | 1 -
>> drivers/net/dpaa2/dpaa2_sparser.c | 1 -
>> drivers/net/dpaa2/dpaa2_tm.c | 2 +-
>> drivers/net/e1000/em_rxtx.c | 1 -
>> drivers/net/e1000/igb_flow.c | 22 ++--
>> drivers/net/e1000/igb_pf.c | 7 +-
>> drivers/net/e1000/igb_rxtx.c | 1 -
>> drivers/net/enic/enic_main.c | 8 +-
>> drivers/net/failsafe/failsafe_ops.c | 6 +-
>> drivers/net/gve/base/gve_adminq.c | 2 +-
>> drivers/net/hinic/hinic_pmd_flow.c | 40 +++----
>> drivers/net/hns3/hns3_fdir.c | 2 +-
>> drivers/net/hns3/hns3_flow.c | 4 +-
>> drivers/net/i40e/i40e_ethdev.c | 109 ++++++++----------
>> drivers/net/i40e/i40e_fdir.c | 28 +++--
>> drivers/net/i40e/i40e_flow.c | 56 +++++----
>> drivers/net/i40e/i40e_pf.c | 3 +-
>> drivers/net/i40e/i40e_tm.c | 11 +-
>> drivers/net/i40e/rte_pmd_i40e.c | 34 +++---
>> drivers/net/iavf/iavf_fdir.c | 93 +++++++--------
>> drivers/net/iavf/iavf_fsub.c | 50 ++++----
>> drivers/net/iavf/iavf_generic_flow.c | 2 +-
>> drivers/net/iavf/iavf_tm.c | 11 +-
>> drivers/net/iavf/iavf_vchnl.c | 9 +-
>> drivers/net/ice/ice_dcf.c | 5 +-
>> drivers/net/ice/ice_dcf_parent.c | 2 +-
>> drivers/net/ice/ice_dcf_sched.c | 11 +-
>> drivers/net/ice/ice_diagnose.c | 4 +-
>> drivers/net/ice/ice_ethdev.c | 14 +--
>> drivers/net/ice/ice_fdir_filter.c | 37 +++---
>> drivers/net/ice/ice_generic_flow.c | 2 +-
>> drivers/net/ice/ice_hash.c | 2 +-
>> drivers/net/ice/ice_tm.c | 11 +-
>> drivers/net/idpf/idpf_ethdev.c | 7 +-
>> drivers/net/idpf/idpf_rxtx.c | 10 +-
>> drivers/net/ipn3ke/ipn3ke_flow.c | 32 +++--
>> drivers/net/ipn3ke/ipn3ke_representor.c | 16 +--
>> drivers/net/ipn3ke/ipn3ke_tm.c | 6 +-
>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +-
>> drivers/net/ixgbe/ixgbe_fdir.c | 7 +-
>> drivers/net/ixgbe/ixgbe_flow.c | 65 +++++------
>> drivers/net/ixgbe/ixgbe_ipsec.c | 8 +-
>> drivers/net/ixgbe/ixgbe_pf.c | 5 +-
>> drivers/net/ixgbe/ixgbe_tm.c | 11 +-
>> drivers/net/ixgbe/rte_pmd_ixgbe.c | 4 +-
>> drivers/net/memif/memif_socket.c | 4 +-
>> drivers/net/mlx5/mlx5_devx.c | 4 +-
>> drivers/net/mlx5/mlx5_flow.c | 38 +++---
>> drivers/net/mlx5/mlx5_flow_aso.c | 6 +-
>> drivers/net/mlx5/mlx5_flow_hw.c | 16 +--
>> drivers/net/mlx5/mlx5_rx.c | 6 +-
>> drivers/net/mlx5/mlx5_rxtx_vec.c | 8 +-
>> drivers/net/mvpp2/mrvl_tm.c | 2 +-
>> drivers/net/nfp/flower/nfp_conntrack.c | 2 +-
>> drivers/net/nfp/flower/nfp_flower_flow.c | 16 +--
>> .../net/nfp/flower/nfp_flower_representor.c | 2 +-
>> drivers/net/nfp/nfp_mtr.c | 10 +-
>> drivers/net/ngbe/ngbe_pf.c | 4 +-
>> drivers/net/null/rte_eth_null.c | 6 +-
>> drivers/net/pcap/pcap_ethdev.c | 2 +-
>> drivers/net/pcap/pcap_osdep_freebsd.c | 2 +-
>> drivers/net/pcap/pcap_osdep_linux.c | 2 +-
>> drivers/net/qede/qede_main.c | 2 +-
>> drivers/net/sfc/sfc.c | 2 +-
>> drivers/net/sfc/sfc_ef10_tx.c | 2 +-
>> drivers/net/sfc/sfc_ethdev.c | 11 +-
>> drivers/net/sfc/sfc_flow.c | 20 ++--
>> drivers/net/sfc/sfc_flow_rss.c | 2 +-
>> drivers/net/sfc/sfc_mae.c | 2 +-
>> drivers/net/sfc/sfc_rx.c | 2 +-
>> drivers/net/sfc/sfc_tso.c | 2 +-
>> drivers/net/sfc/sfc_tso.h | 9 +-
>> drivers/net/tap/rte_eth_tap.c | 14 +--
>> drivers/net/txgbe/txgbe_ethdev.c | 9 +-
>> drivers/net/txgbe/txgbe_fdir.c | 6 +-
>> drivers/net/txgbe/txgbe_flow.c | 65 +++++------
>> drivers/net/txgbe/txgbe_ipsec.c | 8 +-
>> drivers/net/txgbe/txgbe_pf.c | 4 +-
>> drivers/net/txgbe/txgbe_tm.c | 11 +-
>> drivers/raw/ifpga/afu_pmd_he_hssi.c | 3 +-
>> drivers/raw/ifpga/afu_pmd_he_lpbk.c | 3 +-
>> drivers/raw/ifpga/afu_pmd_he_mem.c | 3 +-
>> drivers/raw/ifpga/afu_pmd_n3000.c | 8 +-
>> drivers/raw/ifpga/ifpga_rawdev.c | 11 +-
>> drivers/raw/skeleton/skeleton_rawdev.c | 7 +-
>> examples/bbdev_app/main.c | 2 +-
>> examples/l2fwd-cat/cat.c | 4 +-
>> examples/ptpclient/ptpclient.c | 11 +-
>> examples/vhost/main.c | 5 +-
>> examples/vmdq/main.c | 6 +-
>> examples/vmdq_dcb/main.c | 15 +--
>> lib/cryptodev/rte_cryptodev.c | 2 +-
>> lib/eal/common/eal_common_options.c | 7 +-
>> lib/ethdev/rte_ethdev.c | 3 +-
>> lib/ethdev/rte_flow.c | 5 +-
>> lib/eventdev/rte_event_crypto_adapter.c | 2 +-
>> lib/eventdev/rte_event_dma_adapter.c | 4 +-
>> lib/eventdev/rte_event_timer_adapter.c | 2 +-
>> lib/fib/trie.c | 2 +-
>> lib/ip_frag/rte_ipv6_fragmentation.c | 4 +-
>> lib/ip_frag/rte_ipv6_reassembly.c | 6 +-
>> lib/lpm/rte_lpm6.c | 3 +-
>> lib/net/rte_ether.c | 2 +-
>> lib/node/ip6_lookup.c | 8 +-
>> lib/pdcp/pdcp_process.c | 36 +++---
>> lib/pipeline/rte_table_action.c | 8 +-
>> lib/rib/rte_rib6.h | 5 +-
>> lib/security/rte_security.c | 4 +-
>> 188 files changed, 881 insertions(+), 998 deletions(-)
>> create mode 100644 devtools/cocci/rte_memcpy.cocci
>>
More information about the dev
mailing list