|SUCCESS| [v3,27/27] eal: mark rte_atomicNN as deprecated
dpdklab at iol.unh.edu
dpdklab at iol.unh.edu
Sat May 23 23:48:21 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164331
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-23
# Code Review: DPDK Atomic Operations Modernization
This is a large patch series converting deprecated `rte_atomic*` APIs to C11 stdatomic. I'll focus on the most impactful correctness, style, and API issues.
---
## Patch 1/27: eal: use intrinsics for rte_atomic on all platforms
**git sha:** (from series header)
**Author:** Stephen Hemminger <stephen at networkplumber.org>
**Subject:** eal: use intrinsics for rte_atomic on all platforms
This commit removes inline assembly implementations of atomic operations in favor of compiler intrinsics across all architectures.
```diff
--- a/drivers/bus/dpaa/base/qbman/qman.c
+++ b/drivers/bus/dpaa/base/qbman/qman.c
@@ -11,6 +11,7 @@
#include <rte_eventdev.h>
#include <rte_byteorder.h>
#include <rte_dpaa_logs.h>
+#include <rte_stdatomic.h>
[ ... ]
--- a/lib/eal/include/generic/rte_atomic.h
+++ b/lib/eal/include/generic/rte_atomic.h
@@ -187,13 +187,11 @@ static inline void rte_atomic_thread_fence(rte_memory_order memorder);
static inline int
rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src);
-#ifdef RTE_FORCE_INTRINSICS
static inline int
rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
{
return __sync_bool_compare_and_swap(dst, exp, src);
}
-#endif
```
Does this removal of the RTE_FORCE_INTRINSICS guard break compatibility with any toolchain configurations that previously relied on assembly fallbacks?
```diff
@@ -472,17 +456,11 @@ static inline void rte_atomic16_clear(rte_atomic16_t *v)
static inline int
rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src);
-#ifdef RTE_FORCE_INTRINSICS
static inline int
rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
{
return __sync_bool_compare_and_swap(dst, exp, src);
}
-#endif
```
Does the unconditional use of __sync builtins cause issues on toolchains where these were previously opt-in?
---
## Patch 2/27: eal: reimplement rte_smp_*mb with rte_atomic_thread_fence
**git sha:** (from series)
**Author:** Stephen Hemminger <stephen at networkplumber.org>
**Subject:** eal: reimplement rte_smp_*mb with rte_atomic_thread_fence
```diff
--- a/lib/eal/include/generic/rte_atomic.h
+++ b/lib/eal/include/generic/rte_atomic.h
@@ -164,6 +134,24 @@ static inline void rte_atomic_thread_fence(rte_memory_order memorder);
*/
static inline void rte_atomic_thread_fence(rte_memory_order memorder);
+static __rte_always_inline void
+rte_smp_mb(void)
+{
+ rte_atomic_thread_fence(rte_memory_order_seq_cst);
+}
+
+static __rte_always_inline void
+rte_smp_wmb(void)
+{
+ rte_atomic_thread_fence(rte_memory_order_release);
+}
+
+static __rte_always_inline void
+rte_smp_rmb(void)
+{
+ rte_atomic_thread_fence(rte_memory_order_acquire);
+}
```
The commit message claims "Generated code is unchanged on x86" but on ARM the message notes that release/acquire now emit "dmb ish" instead of "dmb ishst/ishld". Does this weaker barrier ordering on ARM preserve correctness in all existing use sites of rte_smp_rmb/wmb?
---
## Patch 3/27: ring: use compare-and-swap wrapper
**git sha:** (from series)
**Author:** Stephen Hemminger <stephen at networkplumber.org>
**Subject:** ring: use compare-and-swap wrapper
```diff
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -18,6 +18,30 @@
* For more information please refer to <rte_ring.h>.
*/
+/**
+ * @internal optimized version of compare exchange
+ *
+ * The C11 builtin's failure-writeback semantic generates worse code on x86.
+ * Unlike rte_atomic_compare_exchange_*_explicit(), this wrapper does NOT
+ * write the actual value back to a pointer on failure. Callers in a retry
+ * loop must reload the expected value explicitly on the next iteration.
+ *
+ * Full memory barrier, equivalent to rte_memory_order_seq_cst on both
+ * success and failure.
+ */
+static __rte_always_inline bool
+__rte_ring_compare_and_swap(volatile uint32_t *dst,
+ uint32_t expected, uint32_t desired)
+{
+#if defined(RTE_TOOLCHAIN_MSVC)
+ return _InterlockedCompareExchange((volatile long *)dst,
+ (long)desired, (long)expected)
+ == (long)expected;
+#else
+ return __sync_bool_compare_and_swap(dst, expected, desired);
+#endif
+}
```
Does calling __sync_bool_compare_and_swap directly bypass the deprecation of rte_atomic APIs? Should this instead use __atomic_compare_exchange_n with seq_cst ordering?
---
## Patch 17/27: bus/vmbus: convert from rte_atomic to stdatomic
**git sha:** (from series)
**Author:** Stephen Hemminger <stephen at networkplumber.org>
**Subject:** bus/vmbus: convert from rte_atomic to stdatomic
```diff
--- a/drivers/bus/vmbus/vmbus_bufring.c
+++ b/drivers/bus/vmbus/vmbus_bufring.c
@@ -157,12 +159,15 @@ vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
/* The region reserved should match region used */
RTE_ASSERT(windex == next_windex);
- /* Ensure that data is available before updating host index */
- rte_smp_wmb();
+ /* Wait for previous producer to publish their windex update */
+ rte_wait_until_equal_32(&vbr->windex, old_windex, rte_memory_order_relaxed);
- /* Checkin for our reservation. wait for our turn to update host */
- while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
- rte_pause();
+ /* Publish our windex update; prior data writes ordered via release.
+ * windex is 4-byte aligned in practice (struct is page-aligned, windex
+ * at offset 0); cast launders the packed-struct alignment-1 attribute.
+ */
+ rte_atomic_store_explicit((volatile __rte_atomic uint32_t *)(uintptr_t)&vbr->windex,
+ next_windex, rte_memory_order_release);
```
Does this code correctly synchronize? The old code used rte_smp_wmb() before a seq_cst CAS-spin, but the new code replaces this with a relaxed wait followed by a release store. Can the relaxed wait see stale data on weakly-ordered architectures?
---
## Patch 23/27: net/txgbe: replace rte_atomic32 with stdatomic
**git sha:** (from series)
**Author:** Stephen Hemminger <stephen at networkplumber.org>
**Subject:** net/txgbe: replace rte_atomic32 with stdatomic
```diff
--- a/drivers/net/txgbe/base/txgbe_mng.c
+++ b/drivers/net/txgbe/base/txgbe_mng.c
@@ -185,7 +185,7 @@ txgbe_host_interface_command_aml(struct txgbe_hw *hw, u32 *buffer,
}
/* try to get lock */
- while (rte_atomic32_test_and_set(&hw->swfw_busy)) {
+ while (rte_atomic_exchange_explicit(&hw->swfw_busy, true, rte_memory_order_acquire)) {
```
The commit message notes "the code for the AML spinlock had a bug because old rte_atomic32_test_set return value was not what the code expected." Does this patch correctly invert the logic? The old code looped while test_and_set succeeded (returned 1), the new code loops while exchange returns true. Is the new logic correct?
---
## Patch 24/27: net/vhost: use stdatomic instead of rte_atomic32
**git sha:** (from series)
**Author:** Stephen Hemminger <stephen at networkplumber.org>
**Subject:** net/vhost: use stdatomic instead of rte_atomic32
```diff
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -406,12 +406,19 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
uint16_t i, nb_rx = 0;
uint16_t nb_receive = nb_bufs;
- if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ /* Fast-path early exit; racy load is fine here -- if we miss a
+ * transition we get caught by the seq_cst check below.
+ */
+ if (unlikely(rte_atomic_load_explicit(&r->allow_queuing, rte_memory_order_relaxed) == 0))
return 0;
- rte_atomic32_set(&r->while_queuing, 1);
-
- if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+ /* Announce presence, then re-check. The store and the following
+ * load MUST both be seq_cst so they are totally ordered with the
+ * control thread's store-to-allow_queuing / load-of-while_queuing
+ * pair. Anything weaker permits both sides to miss each other.
+ */
+ rte_atomic_store_explicit(&r->while_queuing, 1, rte_memory_order_seq_cst);
+ if (unlikely(rte_atomic_load_explicit(&r->allow_queuing, rte_memory_order_seq_cst) == 0))
goto out;
```
Does the seq_cst ordering here correctly implement the intended Dekker-style handshake on all architectures? The comment claims "anything weaker permits both sides to miss each other" - is this accurate given that one side uses a plain volatile load in the fast path?
---
## General Style Issues
Throughout the series, several patches have inconsistent formatting:
1. Some atomic operations retain unnecessary parentheses around single-argument macro parameters
2. Indentation occasionally exceeds 79 characters in new code
3. Some casts to `(volatile __rte_atomic uint32_t *)` could be simplified
Would reformatting these areas improve consistency with DPDK style guidelines?
---
More information about the test-report
mailing list