[PATCH v12 0/7] Improve EAL bit operations API
David Marchand
david.marchand at redhat.com
Wed Oct 9 22:18:54 CEST 2024
On Fri, Sep 20, 2024 at 12:57 PM Mattias Rönnblom
<mattias.ronnblom at ericsson.com> wrote:
>
> This patch set represents an attempt to improve and extend the RTE
> bitops API, in particular for functions that operate on individual
> bits.
>
> All new functionality is exposed to the user as generic selection
> macros, delegating the actual work to private (__-marked) static
> inline functions. Public functions (e.g., rte_bit_set32()) would just
> be bloating the API. Such generic selection macros will here be
> referred to as "functions", although technically they are not.
>
> The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
> with two new families:
>
> rte_bit_[test|set|clear|assign|flip]() which provide no memory
> ordering or atomicity guarantees, but do provide the best
> performance. The performance degradation resulting from the use of
> volatile (e.g., forcing loads and stores to actually occur and in the
> number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
> may be significant. rte_bit_[test|set|clear|assign|flip]() may be
> used with volatile word pointers, in which case they guarantee
> that the program-level accesses actually occur.
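
As an illustration of the design described above (generic selection
macros delegating to private static inline functions), here is a
minimal sketch. It is not the code from the patch set: every sketch_*
name is hypothetical, and the volatile variants are left out for
brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical private helpers; the names are illustrative only. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
    return (*addr >> nr) & 1;
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
    return (*addr >> nr) & 1;
}

static inline void
sketch_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
    const uint32_t mask = UINT32_C(1) << nr;

    *addr = value ? (*addr | mask) : (*addr & ~mask);
}

static inline void
sketch_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
    const uint64_t mask = UINT64_C(1) << nr;

    *addr = value ? (*addr | mask) : (*addr & ~mask);
}

static inline void
sketch_bit_flip32(uint32_t *addr, unsigned int nr)
{
    *addr ^= UINT32_C(1) << nr;
}

static inline void
sketch_bit_flip64(uint64_t *addr, unsigned int nr)
{
    *addr ^= UINT64_C(1) << nr;
}

/* Type-generic front ends: the pointer type selects the helper. */
#define sketch_bit_test(addr, nr) \
    _Generic((addr), \
        uint32_t *: sketch_bit_test32, \
        const uint32_t *: sketch_bit_test32, \
        uint64_t *: sketch_bit_test64, \
        const uint64_t *: sketch_bit_test64)(addr, nr)

#define sketch_bit_assign(addr, nr, value) \
    _Generic((addr), \
        uint32_t *: sketch_bit_assign32, \
        uint64_t *: sketch_bit_assign64)(addr, nr, value)

#define sketch_bit_flip(addr, nr) \
    _Generic((addr), \
        uint32_t *: sketch_bit_flip32, \
        uint64_t *: sketch_bit_flip64)(addr, nr)

/* Set and clear expressed through assign, with bits as bool. */
#define sketch_bit_set(addr, nr) sketch_bit_assign(addr, nr, true)
#define sketch_bit_clear(addr, nr) sketch_bit_assign(addr, nr, false)

static inline uint64_t
sketch_bit_demo(void)
{
    uint32_t w32 = 0;
    uint64_t w64 = 0;

    sketch_bit_set(&w32, 3);     /* 32-bit helper selected */
    sketch_bit_set(&w64, 40);    /* 64-bit helper selected */
    sketch_bit_flip(&w32, 0);    /* toggles bit 0 on */
    sketch_bit_set(&w32, 7);
    sketch_bit_clear(&w32, 7);   /* bit 7 is off again */
    assert(sketch_bit_test(&w32, 3));
    return w64 + w32;
}
```

Passing a pointer to any other type (for example uint16_t *) matches
no association in the _Generic list and fails at compile time, which
is one way to get the type checking mentioned in the cover letter.
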
>
> rte_bit_atomic_*() which provide atomic bit-level operations,
> including the possibility of specifying memory ordering constraints
> (or the lack thereof).
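
To make the atomic family concrete, here is a minimal sketch of atomic
bit operations on a plain (non-_Atomic) word, built on the GCC/Clang
__atomic builtins that the cover letter mentions as a model. The
sketch_* names and the 32-bit-only scope are assumptions made for
illustration; this is not the implementation from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically set bit nr, return its old value. */
static inline bool
sketch_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
                                 int memory_order)
{
    const uint32_t mask = UINT32_C(1) << nr;
    const uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

    return prev & mask;
}

/* Hypothetical helper: atomically clear bit nr. */
static inline void
sketch_bit_atomic_clear32(uint32_t *addr, unsigned int nr,
                          int memory_order)
{
    const uint32_t mask = UINT32_C(1) << nr;

    __atomic_fetch_and(addr, ~mask, memory_order);
}

/* Hypothetical helper: atomically read bit nr. */
static inline bool
sketch_bit_atomic_test32(const uint32_t *addr, unsigned int nr,
                         int memory_order)
{
    const uint32_t mask = UINT32_C(1) << nr;

    return __atomic_load_n(addr, memory_order) & mask;
}

static inline uint32_t
sketch_bit_atomic_demo(void)
{
    uint32_t flags = 0; /* plain word, no _Atomic qualifier */
    bool first, second;

    first = sketch_bit_atomic_test_and_set32(&flags, 5, __ATOMIC_RELAXED);
    second = sketch_bit_atomic_test_and_set32(&flags, 5, __ATOMIC_ACQ_REL);
    assert(sketch_bit_atomic_test32(&flags, 5, __ATOMIC_ACQUIRE));
    sketch_bit_atomic_clear32(&flags, 5, __ATOMIC_RELEASE);

    /* Non-atomic access to the same word, e.g. when single-threaded. */
    flags |= UINT32_C(1) << 2;

    return flags | (first ? 0x100 : 0) | (second ? 0x200 : 0);
}
```

Because the word is an ordinary uint32_t, the same object can be
touched with plain loads and stores when no other thread is involved,
and with the atomic helpers otherwise.
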
>
> The atomic functions take non-_Atomic pointers, to be flexible, just
> like the GCC builtins and default <rte_stdatomic.h>. The issue with
> _Atomic APIs is that it may well be the case that the user wants to
> perform both non-atomic and atomic operations on the same word.
>
> Having _Atomic-marked addresses would complicate supporting atomic
> bit-level operations in the bitset API (proposed in a different RFC
> patchset), and potentially other APIs depending on RTE bitops for
> atomic bit-level ops. Either one needs two bitset variants, one
> _Atomic bitset and one non-atomic one, or the bitset code needs to
> cast the non-_Atomic pointer to an _Atomic one. Having a separate
> _Atomic bitset would be bloat and also prevent the user from both, in
> some situations, doing atomic operations against a bit set, while in
> other situations (e.g., at times when MT safety is not a concern)
> operating on the same objects in a non-atomic manner.
>
> Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
> not uint32_t or uint64_t. The author found the use of such large types
> confusing, and also failed to see any performance benefits.
>
> A set of functions rte_bit_*_assign() is added, to assign a
> particular boolean value to a particular bit.
>
> All new functions have properly documented semantics.
>
> All new functions operate on both 32 and 64-bit words, with type
> checking.
>
> _Generic allows the user code to be a little more compact. Having a
> type-generic atomic test/set/clear/assign bit API also seems
> consistent with the "core" (word-size) atomics API, which is generic
> (both GCC builtins and <rte_stdatomic.h> are).
>
> The _Generic versions avoid having explicit unsigned long versions of
> all functions. If you have an unsigned long, it's safe to use the
> generic version (e.g., rte_bit_set()) and _Generic will pick the right
> function, provided long is either 32 or 64 bit on your platform (which
> it is on all DPDK-supported ABIs).
>
> The generic rte_bit_set() is a macro, and not a function, but
> nevertheless has been given a lower-case name. That's how C11 does it
> (for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
> can't be taken, but it does not evaluate its parameters more than
> once.
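
The single-evaluation property follows from C11: the controlling
expression of a generic selection is not evaluated, so the argument is
evaluated only in the resulting function call. A minimal sketch, with
hypothetical sketch_* names that are not part of the series:

```c
#include <assert.h>
#include <stdint.h>

static inline void
sketch_set32(uint32_t *addr, unsigned int nr)
{
    *addr |= UINT32_C(1) << nr;
}

static inline void
sketch_set64(uint64_t *addr, unsigned int nr)
{
    *addr |= UINT64_C(1) << nr;
}

/* addr appears twice in the macro, but only the call evaluates it. */
#define sketch_set(addr, nr) \
    _Generic((addr), \
        uint32_t *: sketch_set32, \
        uint64_t *: sketch_set64)(addr, nr)

static unsigned int sketch_calls; /* counts sketch_next_word() calls */
static uint32_t sketch_word;

/* An argument expression with a visible side effect. */
static uint32_t *
sketch_next_word(void)
{
    sketch_calls++;
    return &sketch_word;
}

static inline unsigned int
sketch_single_eval_demo(void)
{
    sketch_calls = 0;
    sketch_word = 0;
    sketch_set(sketch_next_word(), 4);
    assert(sketch_word == UINT32_C(0x10));
    return sketch_calls;
}
```
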
>
> C++ doesn't support generic selection. In C++ translation units the
> _Generic macros are replaced with overloaded functions, implemented by
> means of a huge, complicated C macro mess.
>
> Mattias Rönnblom (7):
> buildtools/chkincs: relax C linkage requirement
> dpdk: use C linkage only where appropriate
> eal: extend bit manipulation functionality
> eal: add unit tests for bit operations
> eal: add atomic bit operations
> eal: add unit tests for atomic bit access functions
> eal: extend bitops to handle volatile pointers
>
> app/test/packet_burst_generator.h | 8 +-
> app/test/test_bitops.c | 416 +++++++++-
> app/test/virtual_pmd.h | 4 +-
> buildtools/chkincs/chkextern.py | 88 ++
> buildtools/chkincs/meson.build | 21 +-
> doc/guides/rel_notes/release_24_11.rst | 17 +
> drivers/bus/auxiliary/bus_auxiliary_driver.h | 8 +-
> drivers/bus/cdx/bus_cdx_driver.h | 8 +-
> drivers/bus/dpaa/include/fsl_qman.h | 8 +-
> drivers/bus/fslmc/bus_fslmc_driver.h | 8 +-
> drivers/bus/pci/bus_pci_driver.h | 8 +-
> drivers/bus/pci/rte_bus_pci.h | 8 +-
> drivers/bus/platform/bus_platform_driver.h | 8 +-
> drivers/bus/vdev/bus_vdev_driver.h | 8 +-
> drivers/bus/vmbus/bus_vmbus_driver.h | 8 +-
> drivers/bus/vmbus/rte_bus_vmbus.h | 8 +-
> drivers/dma/cnxk/cnxk_dma_event_dp.h | 8 +-
> drivers/dma/ioat/ioat_hw_defs.h | 4 +-
> drivers/event/dlb2/rte_pmd_dlb2.h | 8 +-
> drivers/mempool/dpaa2/rte_dpaa2_mempool.h | 6 +-
> drivers/net/avp/rte_avp_fifo.h | 8 +-
> drivers/net/bonding/rte_eth_bond.h | 4 +-
> drivers/net/i40e/rte_pmd_i40e.h | 8 +-
> drivers/net/mlx5/mlx5_trace.h | 8 +-
> drivers/net/ring/rte_eth_ring.h | 4 +-
> drivers/net/vhost/rte_eth_vhost.h | 8 +-
> drivers/raw/ifpga/afu_pmd_core.h | 8 +-
> drivers/raw/ifpga/afu_pmd_he_hssi.h | 6 +-
> drivers/raw/ifpga/afu_pmd_he_lpbk.h | 6 +-
> drivers/raw/ifpga/afu_pmd_he_mem.h | 6 +-
> drivers/raw/ifpga/afu_pmd_n3000.h | 6 +-
> drivers/raw/ifpga/rte_pmd_afu.h | 4 +-
> drivers/raw/ifpga/rte_pmd_ifpga.h | 4 +-
> examples/ethtool/lib/rte_ethtool.h | 8 +-
> examples/qos_sched/main.h | 4 +-
> examples/vm_power_manager/channel_manager.h | 8 +-
> lib/acl/rte_acl_osdep.h | 8 -
> lib/bbdev/rte_bbdev.h | 8 +-
> lib/bbdev/rte_bbdev_op.h | 8 +-
> lib/bbdev/rte_bbdev_pmd.h | 8 +-
> lib/bpf/bpf_def.h | 9 -
> lib/compressdev/rte_comp.h | 4 +-
> lib/compressdev/rte_compressdev.h | 6 +-
> lib/compressdev/rte_compressdev_internal.h | 8 +-
> lib/compressdev/rte_compressdev_pmd.h | 8 +-
> lib/cryptodev/cryptodev_pmd.h | 8 +-
> lib/cryptodev/cryptodev_trace.h | 8 +-
> lib/cryptodev/rte_crypto.h | 8 +-
> lib/cryptodev/rte_crypto_asym.h | 8 -
> lib/cryptodev/rte_crypto_sym.h | 8 +-
> lib/cryptodev/rte_cryptodev.h | 8 +-
> lib/cryptodev/rte_cryptodev_trace_fp.h | 4 +-
> lib/dispatcher/rte_dispatcher.h | 8 +-
> lib/dmadev/rte_dmadev.h | 8 +
> lib/eal/arm/include/rte_atomic_32.h | 4 +-
> lib/eal/arm/include/rte_atomic_64.h | 8 +-
> lib/eal/arm/include/rte_byteorder.h | 8 +-
> lib/eal/arm/include/rte_cpuflags_32.h | 8 -
> lib/eal/arm/include/rte_cpuflags_64.h | 8 -
> lib/eal/arm/include/rte_cycles_32.h | 4 +-
> lib/eal/arm/include/rte_cycles_64.h | 4 +-
> lib/eal/arm/include/rte_io.h | 8 -
> lib/eal/arm/include/rte_io_64.h | 8 +-
> lib/eal/arm/include/rte_memcpy_32.h | 8 +-
> lib/eal/arm/include/rte_memcpy_64.h | 23 +-
> lib/eal/arm/include/rte_pause.h | 8 -
> lib/eal/arm/include/rte_pause_32.h | 6 +-
> lib/eal/arm/include/rte_pause_64.h | 8 +-
> lib/eal/arm/include/rte_power_intrinsics.h | 8 -
> lib/eal/arm/include/rte_prefetch_32.h | 8 +-
> lib/eal/arm/include/rte_prefetch_64.h | 8 +-
> lib/eal/arm/include/rte_rwlock.h | 4 +-
> lib/eal/arm/include/rte_spinlock.h | 6 +-
> lib/eal/freebsd/include/rte_os.h | 8 -
> lib/eal/include/bus_driver.h | 8 +-
> lib/eal/include/dev_driver.h | 8 -
> lib/eal/include/eal_trace_internal.h | 8 +-
> lib/eal/include/generic/rte_atomic.h | 8 +
> lib/eal/include/generic/rte_byteorder.h | 8 +
> lib/eal/include/generic/rte_cpuflags.h | 8 +
> lib/eal/include/generic/rte_cycles.h | 8 +
> lib/eal/include/generic/rte_io.h | 8 +
> lib/eal/include/generic/rte_memcpy.h | 8 +
> lib/eal/include/generic/rte_pause.h | 8 +
> .../include/generic/rte_power_intrinsics.h | 8 +
> lib/eal/include/generic/rte_prefetch.h | 8 +
> lib/eal/include/generic/rte_rwlock.h | 8 +-
> lib/eal/include/generic/rte_spinlock.h | 8 +
> lib/eal/include/generic/rte_vect.h | 8 +
> lib/eal/include/rte_alarm.h | 4 +-
> lib/eal/include/rte_bitmap.h | 8 +-
> lib/eal/include/rte_bitops.h | 768 +++++++++++++++++-
> lib/eal/include/rte_branch_prediction.h | 8 -
> lib/eal/include/rte_bus.h | 8 +-
> lib/eal/include/rte_class.h | 4 +-
> lib/eal/include/rte_common.h | 8 +-
> lib/eal/include/rte_compat.h | 8 -
> lib/eal/include/rte_dev.h | 8 +-
> lib/eal/include/rte_devargs.h | 8 +-
> lib/eal/include/rte_eal_trace.h | 4 +-
> lib/eal/include/rte_errno.h | 4 +-
> lib/eal/include/rte_fbarray.h | 8 +-
> lib/eal/include/rte_keepalive.h | 6 +-
> lib/eal/include/rte_mcslock.h | 8 +-
> lib/eal/include/rte_memory.h | 8 +-
> lib/eal/include/rte_pci_dev_feature_defs.h | 8 -
> lib/eal/include/rte_pci_dev_features.h | 8 -
> lib/eal/include/rte_per_lcore.h | 8 -
> lib/eal/include/rte_pflock.h | 8 +-
> lib/eal/include/rte_random.h | 4 +-
> lib/eal/include/rte_seqcount.h | 8 +-
> lib/eal/include/rte_seqlock.h | 8 +-
> lib/eal/include/rte_service.h | 8 +-
> lib/eal/include/rte_service_component.h | 4 +-
> lib/eal/include/rte_stdatomic.h | 5 +-
> lib/eal/include/rte_string_fns.h | 17 +-
> lib/eal/include/rte_tailq.h | 6 +-
> lib/eal/include/rte_ticketlock.h | 8 +-
> lib/eal/include/rte_time.h | 6 +-
> lib/eal/include/rte_trace.h | 8 +-
> lib/eal/include/rte_trace_point.h | 8 +-
> lib/eal/include/rte_trace_point_register.h | 8 +-
> lib/eal/include/rte_uuid.h | 8 +-
> lib/eal/include/rte_version.h | 6 +-
> lib/eal/include/rte_vfio.h | 8 +-
> lib/eal/linux/include/rte_os.h | 8 -
> lib/eal/loongarch/include/rte_atomic.h | 6 +-
> lib/eal/loongarch/include/rte_byteorder.h | 4 +-
> lib/eal/loongarch/include/rte_cpuflags.h | 8 -
> lib/eal/loongarch/include/rte_cycles.h | 4 +-
> lib/eal/loongarch/include/rte_io.h | 8 -
> lib/eal/loongarch/include/rte_memcpy.h | 4 +-
> lib/eal/loongarch/include/rte_pause.h | 8 +-
> .../loongarch/include/rte_power_intrinsics.h | 8 -
> lib/eal/loongarch/include/rte_prefetch.h | 8 +-
> lib/eal/loongarch/include/rte_rwlock.h | 4 +-
> lib/eal/loongarch/include/rte_spinlock.h | 6 +-
> lib/eal/ppc/include/rte_atomic.h | 6 +-
> lib/eal/ppc/include/rte_byteorder.h | 6 +-
> lib/eal/ppc/include/rte_cpuflags.h | 8 -
> lib/eal/ppc/include/rte_cycles.h | 8 +-
> lib/eal/ppc/include/rte_io.h | 8 -
> lib/eal/ppc/include/rte_memcpy.h | 4 +-
> lib/eal/ppc/include/rte_pause.h | 8 +-
> lib/eal/ppc/include/rte_power_intrinsics.h | 8 -
> lib/eal/ppc/include/rte_prefetch.h | 8 +-
> lib/eal/ppc/include/rte_rwlock.h | 4 +-
> lib/eal/ppc/include/rte_spinlock.h | 8 +-
> lib/eal/riscv/include/rte_atomic.h | 8 +-
> lib/eal/riscv/include/rte_byteorder.h | 8 +-
> lib/eal/riscv/include/rte_cpuflags.h | 8 -
> lib/eal/riscv/include/rte_cycles.h | 4 +-
> lib/eal/riscv/include/rte_io.h | 8 -
> lib/eal/riscv/include/rte_memcpy.h | 4 +-
> lib/eal/riscv/include/rte_pause.h | 8 +-
> lib/eal/riscv/include/rte_power_intrinsics.h | 8 -
> lib/eal/riscv/include/rte_prefetch.h | 8 +-
> lib/eal/riscv/include/rte_rwlock.h | 4 +-
> lib/eal/riscv/include/rte_spinlock.h | 6 +-
> lib/eal/windows/include/pthread.h | 6 +-
> lib/eal/windows/include/regex.h | 8 +-
> lib/eal/windows/include/rte_os.h | 8 -
> lib/eal/windows/include/rte_windows.h | 8 -
> lib/eal/x86/include/rte_atomic.h | 25 +-
> lib/eal/x86/include/rte_byteorder.h | 16 +-
> lib/eal/x86/include/rte_cpuflags.h | 8 -
> lib/eal/x86/include/rte_cycles.h | 8 +-
> lib/eal/x86/include/rte_io.h | 8 +-
> lib/eal/x86/include/rte_pause.h | 7 +-
> lib/eal/x86/include/rte_power_intrinsics.h | 8 -
> lib/eal/x86/include/rte_prefetch.h | 8 +-
> lib/eal/x86/include/rte_rwlock.h | 6 +-
> lib/eal/x86/include/rte_spinlock.h | 9 +-
> lib/ethdev/ethdev_driver.h | 8 +-
> lib/ethdev/ethdev_pci.h | 8 +-
> lib/ethdev/ethdev_trace.h | 8 +-
> lib/ethdev/ethdev_vdev.h | 8 +-
> lib/ethdev/rte_cman.h | 8 -
> lib/ethdev/rte_dev_info.h | 8 -
> lib/ethdev/rte_eth_ctrl.h | 8 -
> lib/ethdev/rte_ethdev.h | 8 +-
> lib/ethdev/rte_ethdev_trace_fp.h | 4 +-
> lib/eventdev/event_timer_adapter_pmd.h | 8 -
> lib/eventdev/eventdev_pmd.h | 8 +-
> lib/eventdev/eventdev_pmd_pci.h | 8 +-
> lib/eventdev/eventdev_pmd_vdev.h | 8 +-
> lib/eventdev/eventdev_trace.h | 8 +-
> lib/eventdev/rte_event_crypto_adapter.h | 8 +-
> lib/eventdev/rte_event_eth_rx_adapter.h | 8 +-
> lib/eventdev/rte_event_eth_tx_adapter.h | 8 +-
> lib/eventdev/rte_event_ring.h | 8 +-
> lib/eventdev/rte_event_timer_adapter.h | 8 +-
> lib/eventdev/rte_eventdev.h | 8 +-
> lib/eventdev/rte_eventdev_trace_fp.h | 4 +-
> lib/graph/rte_graph_model_mcore_dispatch.h | 8 +-
> lib/graph/rte_graph_worker.h | 6 +-
> lib/gso/rte_gso.h | 6 +-
> lib/hash/rte_fbk_hash.h | 8 +-
> lib/hash/rte_hash_crc.h | 8 +-
> lib/hash/rte_jhash.h | 8 +-
> lib/hash/rte_thash.h | 8 +-
> lib/hash/rte_thash_gfni.h | 8 +-
> lib/ip_frag/rte_ip_frag.h | 8 +-
> lib/ipsec/rte_ipsec.h | 8 +-
> lib/log/rte_log.h | 8 +-
> lib/lpm/rte_lpm.h | 8 +-
> lib/member/rte_member.h | 8 +-
> lib/member/rte_member_sketch.h | 6 +-
> lib/member/rte_member_sketch_avx512.h | 8 +-
> lib/member/rte_member_x86.h | 4 +-
> lib/member/rte_xxh64_avx512.h | 6 +-
> lib/mempool/mempool_trace.h | 8 +-
> lib/mempool/rte_mempool_trace_fp.h | 4 +-
> lib/meter/rte_meter.h | 8 +-
> lib/mldev/mldev_utils.h | 8 +-
> lib/mldev/rte_mldev_core.h | 8 -
> lib/mldev/rte_mldev_pmd.h | 8 +-
> lib/net/rte_dtls.h | 8 -
> lib/net/rte_ecpri.h | 8 -
> lib/net/rte_esp.h | 8 -
> lib/net/rte_ether.h | 8 +-
> lib/net/rte_geneve.h | 8 -
> lib/net/rte_gre.h | 8 -
> lib/net/rte_gtp.h | 8 -
> lib/net/rte_higig.h | 8 -
> lib/net/rte_ib.h | 8 -
> lib/net/rte_icmp.h | 8 -
> lib/net/rte_l2tpv2.h | 8 -
> lib/net/rte_macsec.h | 8 -
> lib/net/rte_mpls.h | 8 -
> lib/net/rte_net.h | 8 +-
> lib/net/rte_pdcp_hdr.h | 8 -
> lib/net/rte_ppp.h | 8 -
> lib/net/rte_sctp.h | 8 -
> lib/net/rte_tcp.h | 8 -
> lib/net/rte_tls.h | 8 -
> lib/net/rte_udp.h | 8 -
> lib/net/rte_vxlan.h | 10 -
> lib/node/rte_node_eth_api.h | 8 +-
> lib/node/rte_node_ip4_api.h | 8 +-
> lib/node/rte_node_ip6_api.h | 6 +-
> lib/node/rte_node_udp4_input_api.h | 8 +-
> lib/pci/rte_pci.h | 8 +-
> lib/pdcp/rte_pdcp.h | 8 +-
> lib/pipeline/rte_pipeline.h | 8 +-
> lib/pipeline/rte_port_in_action.h | 8 +-
> lib/pipeline/rte_swx_ctl.h | 8 +-
> lib/pipeline/rte_swx_extern.h | 8 -
> lib/pipeline/rte_swx_ipsec.h | 8 +-
> lib/pipeline/rte_swx_pipeline.h | 8 +-
> lib/pipeline/rte_swx_pipeline_spec.h | 8 +-
> lib/pipeline/rte_table_action.h | 8 +-
> lib/port/rte_port.h | 8 -
> lib/port/rte_port_ethdev.h | 8 +-
> lib/port/rte_port_eventdev.h | 8 +-
> lib/port/rte_port_fd.h | 8 +-
> lib/port/rte_port_frag.h | 8 +-
> lib/port/rte_port_ras.h | 8 +-
> lib/port/rte_port_ring.h | 8 +-
> lib/port/rte_port_sched.h | 8 +-
> lib/port/rte_port_source_sink.h | 8 +-
> lib/port/rte_port_sym_crypto.h | 8 +-
> lib/port/rte_swx_port.h | 8 -
> lib/port/rte_swx_port_ethdev.h | 8 +-
> lib/port/rte_swx_port_fd.h | 8 +-
> lib/port/rte_swx_port_ring.h | 8 +-
> lib/port/rte_swx_port_source_sink.h | 8 +-
> lib/rawdev/rte_rawdev.h | 6 +-
> lib/rawdev/rte_rawdev_pmd.h | 8 +-
> lib/rcu/rte_rcu_qsbr.h | 8 +-
> lib/regexdev/rte_regexdev.h | 8 +-
> lib/ring/rte_ring.h | 6 +-
> lib/ring/rte_ring_core.h | 8 -
> lib/ring/rte_ring_elem.h | 8 +-
> lib/ring/rte_ring_hts.h | 4 +-
> lib/ring/rte_ring_peek.h | 4 +-
> lib/ring/rte_ring_peek_zc.h | 4 +-
> lib/ring/rte_ring_rts.h | 4 +-
> lib/sched/rte_approx.h | 8 +-
> lib/sched/rte_pie.h | 8 +-
> lib/sched/rte_red.h | 8 +-
> lib/sched/rte_sched.h | 8 +-
> lib/sched/rte_sched_common.h | 6 +-
> lib/security/rte_security.h | 8 +-
> lib/security/rte_security_driver.h | 6 +-
> lib/stack/rte_stack.h | 8 +-
> lib/table/rte_lru.h | 8 -
> lib/table/rte_lru_arm64.h | 8 +-
> lib/table/rte_lru_x86.h | 8 -
> lib/table/rte_swx_hash_func.h | 8 -
> lib/table/rte_swx_keycmp.h | 8 +-
> lib/table/rte_swx_table.h | 8 -
> lib/table/rte_swx_table_em.h | 8 +-
> lib/table/rte_swx_table_learner.h | 8 +-
> lib/table/rte_swx_table_selector.h | 8 +-
> lib/table/rte_swx_table_wm.h | 8 +-
> lib/table/rte_table.h | 8 -
> lib/table/rte_table_acl.h | 8 +-
> lib/table/rte_table_array.h | 8 +-
> lib/table/rte_table_hash.h | 8 +-
> lib/table/rte_table_hash_cuckoo.h | 8 +-
> lib/table/rte_table_hash_func.h | 24 +-
> lib/table/rte_table_lpm.h | 8 +-
> lib/table/rte_table_lpm_ipv6.h | 8 +-
> lib/table/rte_table_stub.h | 8 +-
> lib/telemetry/rte_telemetry.h | 8 +-
> lib/vhost/rte_vdpa.h | 8 +-
> lib/vhost/rte_vhost.h | 8 +-
> lib/vhost/rte_vhost_async.h | 8 +-
> lib/vhost/rte_vhost_crypto.h | 4 +-
> lib/vhost/vdpa_driver.h | 8 +-
> 311 files changed, 2257 insertions(+), 1362 deletions(-)
> create mode 100755 buildtools/chkincs/chkextern.py

There are still unresolved comments on the first patch of the series.
However, I preferred to postpone this subject so that we can get the
headers cleanup and the new bitops API in rc1.

I skipped this first patch and dropped the check added by 1ee492bdc4ff
("buildtools/chkincs: check missing C++ guards").

Thanks for the big cleanup on DPDK headers, let's finish the work on
the headers check in rc2.

Series applied.

--
David Marchand