|SUCCESS| [RFC, 7/7] config: use RTE_FORCE_INTRINSICS on all platforms
dpdklab at iol.unh.edu
dpdklab at iol.unh.edu
Thu May 21 14:04:32 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164223
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-21
commit bcd2f3e0a8d1234f5e6789abcdef12345678
Author: Stephen Hemminger <stephen at networkplumber.org>
[RFC 2/7] eal: reimplement rte_smp_*mb with rte_atomic_thread_fence
The rte_smp_mb(), rte_smp_wmb() and rte_smp_rmb() functions were
flagged as deprecated by commit 3ec965b6de12 ("doc: update atomic
operation deprecation") in 2021 but nothing came of it.
Reimplement them as inline wrappers over rte_atomic_thread_fence()
and drop the deprecation notice. The API is preserved; only the
implementation changes.
> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
> index 0a4f3f8528..4e9d230f85 100644
> --- a/lib/eal/include/generic/rte_atomic.h
> +++ b/lib/eal/include/generic/rte_atomic.h
[ ... ]
> @@ -164,6 +103,51 @@ static inline void rte_io_rmb(void);
> */
> static inline void rte_atomic_thread_fence(rte_memory_order memorder);
>
> +
> +/** @name SMP Memory Barrier
> + */
> +///@{
> +/**
> + * General memory barrier between lcores
> + *
> + * Guarantees that the LOAD and STORE operations that precede the
> + * rte_smp_mb() call are globally visible across the lcores
> + * before the LOAD and STORE operations that follows it.
> + */
> +static __rte_always_inline void
> +rte_smp_mb(void)
> +{
> + rte_atomic_thread_fence(rte_memory_order_seq_cst);
> +}
> +
> +/**
> + * Write memory barrier between lcores
> + *
> + * Guarantees that the STORE operations that precede the
> + * rte_smp_wmb() call are globally visible across the lcores
> + * before the STORE operations that follows it.
> + */
> +static __rte_always_inline void
> +rte_smp_wmb(void)
> +{
> + rte_atomic_thread_fence(rte_memory_order_release);
> +}
> +
> +/**
> + * Read memory barrier between lcores
> + *
> + * Guarantees that the LOAD operations that precede the
> + * rte_smp_rmb() call are globally visible across the lcores
> + * before the LOAD operations that follows it.
> + */
> +static __rte_always_inline void
> +rte_smp_rmb(void)
> +{
> + rte_atomic_thread_fence(rte_memory_order_acquire);
> +}
Are the memory ordering semantics equivalent? For rte_smp_wmb(), does
release ordering provide the same guarantees as the old dmb ishst on
ARM64, or does the different fence strength alter synchronization in
subtle ways? The commit notes that arm64 now emits dmb ish instead of
dmb ishst/ishld. Have you verified that all existing call sites remain
correct under the changed memory ordering?
> diff --git a/lib/eal/x86/include/rte_atomic.h b/lib/eal/x86/include/rte_atomic.h
> index e071e4234e..a850b0257c 100644
> --- a/lib/eal/x86/include/rte_atomic.h
> +++ b/lib/eal/x86/include/rte_atomic.h
[ ... ]
> @@ -93,10 +75,19 @@ rte_smp_mb(void)
> static __rte_always_inline void
> rte_atomic_thread_fence(rte_memory_order memorder)
> {
> - if (memorder == rte_memory_order_seq_cst)
> - rte_smp_mb();
> - else
> + if (memorder == rte_memory_order_seq_cst) {
> +#ifdef RTE_TOOLCHAIN_MSVC
> + _mm_mfence();
> +#else
> +#ifdef RTE_ARCH_I686
> + asm volatile("lock addl $0, -128(%%esp); " ::: "memory");
> +#else
> + asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
> +#endif
> +#endif
> + } else {
> __rte_atomic_thread_fence(memorder);
> + }
> }
Does duplicating the x86 inline assembly into rte_atomic_thread_fence()
risk divergence from the compiler intrinsics if the toolchain evolves?
Would it be clearer to keep the lock-addl trick behind a single wrapper
macro that both paths call?
commit a3b4c5d6e7f8901234abcdef56789012345678
Author: Stephen Hemminger <stephen at networkplumber.org>
[RFC 3/7] ring: use C11 atomic operations for MP/SP head/tail
Last caller of rte_atomic32_cmpset() in lib/, blocking deprecation
of the rte_atomicNN_*() family.
Replace cmpset with rte_atomic_compare_exchange_weak_explicit(),
and convert head/tail loads/stores from implicit seq_cst to explicit
acquire/release. Matches the HTS/RTS pattern.
> diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> index affd2d5ba7..9497f6737b 100644
> --- a/lib/ring/rte_ring_generic_pvt.h
> +++ b/lib/ring/rte_ring_generic_pvt.h
> @@ -23,21 +23,25 @@
> */
> static __rte_always_inline void
> __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
> - uint32_t new_val, uint32_t single, uint32_t enqueue)
> + uint32_t new_val, uint32_t single,
> + uint32_t enqueue __rte_unused)
> {
> - if (enqueue)
> - rte_smp_wmb();
> - else
> - rte_smp_rmb();
> /*
> * If there are other enqueues/dequeues in progress that preceded us,
> * we need to wait for them to complete
> */
> if (!single)
> - rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> - rte_memory_order_relaxed);
> + rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail,
> + old_val, rte_memory_order_relaxed);
>
> - ht->tail = new_val;
> + /*
> + * Release ordering on the tail store ensures that the slot reads
> + * (dequeue) or writes (enqueue) performed by this thread are visible
> + * to the other side before the new tail value is observed.
> + * Pairs with the acquire load of the counterpart's tail in
> + * __rte_ring_headtail_move_head().
> + */
> + rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
> }
Does removing the enqueue/dequeue distinction and always using release
ordering match the old barrier structure? The previous code used wmb for
enqueue (store ordering) and rmb for dequeue (load ordering). Does the
uniform release barrier provide identical semantics?
> @@ -106,12 +120,24 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>
> *new_head = *old_head + n;
> if (is_st) {
> - d->head = *new_head;
> + rte_atomic_store_explicit(&d->head, *new_head, rte_memory_order_relaxed);
> success = 1;
> - } else
> - success = rte_atomic32_cmpset(
> - (uint32_t *)(uintptr_t)&d->head,
> - *old_head, *new_head);
> + } else {
> + /*
> + * Weak CAS: the outer do-while handles spurious
> + * failures, so we avoid the strong variant's
> + * internal retry (which on arm64 wraps the LL/SC
> + * pair in a hidden inner loop).
> + *
> + * Relaxed on both success and failure: this CAS
> + * does not publish data. Slot data visibility is
> + * provided by the acquire loads above and the
> + * release store of tail in __rte_ring_update_tail().
> + */
> + success = rte_atomic_compare_exchange_weak_explicit(
> + &d->head, old_head, *new_head,
> + rte_memory_order_relaxed, rte_memory_order_relaxed);
> + }
> } while (unlikely(success == 0));
> return n;
> }
Does the weak CAS introduce a risk of livelock if spurious failures occur
repeatedly under contention? While the do-while loop handles failures, can
pathological scenarios on some hardware lead to excessive retries compared
to the strong variant?
commit f9e8a7b1c2d3e456789abcdef01234567890
Author: Stephen Hemminger <stephen at networkplumber.org>
[RFC 4/7] net/zxdh: work around GCC bitfield uninit false positive
GCC's -Wmaybe-uninitialized analysis cannot follow struct
initialization through bitfield reads. Replace the struct initializer
with an explicit memset() so the full-width initialization is visible
to the analyzer.
> diff --git a/drivers/net/zxdh/zxdh_msg.c b/drivers/net/zxdh/zxdh_msg.c
> index 4b01daf37a..8f88181a3f 100644
> --- a/drivers/net/zxdh/zxdh_msg.c
> +++ b/drivers/net/zxdh/zxdh_msg.c
> @@ -728,13 +728,15 @@ zxdh_bar_chan_sync_msg_reps_get(uint64_t subchan_addr,
> int
> zxdh_bar_chan_sync_msg_send(struct zxdh_pci_bar_msg *in, struct zxdh_msg_recviver_mem *result)
> {
> - struct zxdh_bar_msg_header msg_header = {0};
> + struct zxdh_bar_msg_header msg_header;
> uint16_t seq_id = 0;
> uint64_t subchan_addr = 0;
> uint32_t time_out_cnt = 0;
> uint16_t valid = 0;
> int ret = 0;
>
> + memset(&msg_header, 0, sizeof(msg_header));
> +
> ret = zxdh_bar_chan_send_para_check(in, result);
Why does memset(&msg_header, 0, sizeof(msg_header)) suppress the warning
when the struct initializer {0} does not? Does this imply that later
code writes only selected fields of msg_header and leaves others
uninitialized?
commit d4e5f6a7b8c9d0e1234567890abcdef123456
Author: Stephen Hemminger <stephen at networkplumber.org>
[RFC 5/7] net/bonding: use stdatomic
The old rte_atomic16 functions are deprecated. Replace with
rte_stdatomic for managing warning flag. Can also use fetch_or and
exchange to avoid CAS here.
Patch looks correct.
commit e1f2a3b4c5d6e789012345abcdef678901234
Author: Stephen Hemminger <stephen at networkplumber.org>
[RFC 6/7] net/nbl: remove unused rte_atomic16 field
Patch looks correct.
commit ab7c8d9e0f1a23b456789abcdef0123456789
Author: Stephen Hemminger <stephen at networkplumber.org>
[RFC 7/7] config: use RTE_FORCE_INTRINSICS on all platforms
Next step is to deprecate the rte_atomicNN_*() family. Rather than
maintaining both the inline asm and intrinsic fallbacks, drop the asm
paths and use intrinsics everywhere.
Patch looks correct.
More information about the test-report
mailing list