|SUCCESS| [v3,27/27] eal: mark rte_atomicNN as deprecated

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Sat May 23 23:48:21 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164331

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-23

# Code Review: DPDK Atomic Operations Modernization

This is a large patch series converting deprecated `rte_atomic*` APIs to C11 stdatomic. I'll focus on the most impactful correctness, style, and API issues.

---

## Patch 1/27: eal: use intrinsics for rte_atomic on all platforms

**git sha:** (from series header)  
**Author:** Stephen Hemminger <stephen at networkplumber.org>  
**Subject:** eal: use intrinsics for rte_atomic on all platforms

This commit removes inline assembly implementations of atomic operations in favor of compiler intrinsics across all architectures.

```diff
--- a/drivers/bus/dpaa/base/qbman/qman.c
+++ b/drivers/bus/dpaa/base/qbman/qman.c
@@ -11,6 +11,7 @@
 #include <rte_eventdev.h>
 #include <rte_byteorder.h>
 #include <rte_dpaa_logs.h>
+#include <rte_stdatomic.h>

[ ... ]

--- a/lib/eal/include/generic/rte_atomic.h
+++ b/lib/eal/include/generic/rte_atomic.h
@@ -187,13 +187,11 @@ static inline void rte_atomic_thread_fence(rte_memory_order memorder);
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src);

-#ifdef RTE_FORCE_INTRINSICS
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
 {
 	return __sync_bool_compare_and_swap(dst, exp, src);
 }
-#endif
```

Does this removal of the RTE_FORCE_INTRINSICS guard break compatibility with any toolchain configurations that previously relied on assembly fallbacks?

```diff
@@ -472,17 +456,11 @@ static inline void rte_atomic16_clear(rte_atomic16_t *v)
 static inline int
 rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src);

-#ifdef RTE_FORCE_INTRINSICS
 static inline int
 rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
 {
 	return __sync_bool_compare_and_swap(dst, exp, src);
 }
-#endif
```

Does the unconditional use of __sync builtins cause issues on toolchains where these were previously opt-in?

---

## Patch 2/27: eal: reimplement rte_smp_*mb with rte_atomic_thread_fence

**git sha:** (from series)  
**Author:** Stephen Hemminger <stephen at networkplumber.org>  
**Subject:** eal: reimplement rte_smp_*mb with rte_atomic_thread_fence

```diff
--- a/lib/eal/include/generic/rte_atomic.h
+++ b/lib/eal/include/generic/rte_atomic.h
@@ -164,6 +134,24 @@ static inline void rte_atomic_thread_fence(rte_memory_order memorder);
  */
 static inline void rte_atomic_thread_fence(rte_memory_order memorder);

+static __rte_always_inline void
+rte_smp_mb(void)
+{
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
+}
+
+static __rte_always_inline void
+rte_smp_wmb(void)
+{
+	rte_atomic_thread_fence(rte_memory_order_release);
+}
+
+static __rte_always_inline void
+rte_smp_rmb(void)
+{
+	rte_atomic_thread_fence(rte_memory_order_acquire);
+}
```

The commit message claims "Generated code is unchanged on x86" but on ARM the message notes that release/acquire now emit "dmb ish" instead of "dmb ishst/ishld". Does this weaker barrier ordering on ARM preserve correctness in all existing use sites of rte_smp_rmb/wmb?

---

## Patch 3/27: ring: use compare-and-swap wrapper

**git sha:** (from series)  
**Author:** Stephen Hemminger <stephen at networkplumber.org>  
**Subject:** ring: use compare-and-swap wrapper

```diff
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -18,6 +18,30 @@
  * For more information please refer to <rte_ring.h>.
  */

+/**
+ * @internal optimized version of compare exchange
+ *
+ * The C11 builtin's failure-writeback semantic generates worse code on x86.
+ * Unlike rte_atomic_compare_exchange_*_explicit(), this wrapper does NOT
+ * write the actual value back to a pointer on failure. Callers in a retry
+ * loop must reload the expected value explicitly on the next iteration.
+ *
+ * Full memory barrier, equivalent to rte_memory_order_seq_cst on both
+ * success and failure.
+ */
+static __rte_always_inline bool
+__rte_ring_compare_and_swap(volatile uint32_t *dst,
+			    uint32_t expected, uint32_t desired)
+{
+#if defined(RTE_TOOLCHAIN_MSVC)
+	return _InterlockedCompareExchange((volatile long *)dst,
+					   (long)desired, (long)expected)
+		== (long)expected;
+#else
+	return __sync_bool_compare_and_swap(dst, expected, desired);
+#endif
+}
```

Does calling __sync_bool_compare_and_swap directly bypass the deprecation of rte_atomic APIs? Should this instead use __atomic_compare_exchange_n with seq_cst ordering?

---

## Patch 17/27: bus/vmbus: convert from rte_atomic to stdatomic

**git sha:** (from series)  
**Author:** Stephen Hemminger <stephen at networkplumber.org>  
**Subject:** bus/vmbus: convert from rte_atomic to stdatomic

```diff
--- a/drivers/bus/vmbus/vmbus_bufring.c
+++ b/drivers/bus/vmbus/vmbus_bufring.c
@@ -157,12 +159,15 @@ vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
 	/* The region reserved should match region used */
 	RTE_ASSERT(windex == next_windex);

-	/* Ensure that data is available before updating host index */
-	rte_smp_wmb();
+	/* Wait for previous producer to publish their windex update */
+	rte_wait_until_equal_32(&vbr->windex, old_windex, rte_memory_order_relaxed);

-	/* Checkin for our reservation. wait for our turn to update host */
-	while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
-		rte_pause();
+	/* Publish our windex update; prior data writes ordered via release.
+	 * windex is 4-byte aligned in practice (struct is page-aligned, windex
+	 * at offset 0); cast launders the packed-struct alignment-1 attribute.
+	 */
+	rte_atomic_store_explicit((volatile __rte_atomic uint32_t *)(uintptr_t)&vbr->windex,
+				  next_windex, rte_memory_order_release);
```

Does this code correctly synchronize? The old code used rte_smp_wmb() before a seq_cst CAS-spin, but the new code replaces this with a relaxed wait followed by a release store. Can the relaxed wait see stale data on weakly-ordered architectures?

---

## Patch 23/27: net/txgbe: replace rte_atomic32 with stdatomic

**git sha:** (from series)  
**Author:** Stephen Hemminger <stephen at networkplumber.org>  
**Subject:** net/txgbe: replace rte_atomic32 with stdatomic

```diff
--- a/drivers/net/txgbe/base/txgbe_mng.c
+++ b/drivers/net/txgbe/base/txgbe_mng.c
@@ -185,7 +185,7 @@ txgbe_host_interface_command_aml(struct txgbe_hw *hw, u32 *buffer,
 	}

 	/* try to get lock */
-	while (rte_atomic32_test_and_set(&hw->swfw_busy)) {
+	while (rte_atomic_exchange_explicit(&hw->swfw_busy, true, rte_memory_order_acquire)) {
```

The commit message notes "the code for the AML spinlock had a bug because old rte_atomic32_test_set return value was not what the code expected." Does this patch correctly invert the logic? The old code looped while test_and_set succeeded (returned 1), the new code loops while exchange returns true. Is the new logic correct?

---

## Patch 24/27: net/vhost: use stdatomic instead of rte_atomic32

**git sha:** (from series)  
**Author:** Stephen Hemminger <stephen at networkplumber.org>  
**Subject:** net/vhost: use stdatomic instead of rte_atomic32

```diff
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -406,12 +406,19 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	uint16_t i, nb_rx = 0;
 	uint16_t nb_receive = nb_bufs;

-	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+	/* Fast-path early exit; racy load is fine here -- if we miss a
+	 * transition we get caught by the seq_cst check below.
+	 */
+	if (unlikely(rte_atomic_load_explicit(&r->allow_queuing, rte_memory_order_relaxed) == 0))
 		return 0;

-	rte_atomic32_set(&r->while_queuing, 1);
-
-	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+	/* Announce presence, then re-check. The store and the following
+	 * load MUST both be seq_cst so they are totally ordered with the
+	 * control thread's store-to-allow_queuing / load-of-while_queuing
+	 * pair. Anything weaker permits both sides to miss each other.
+	 */
+	rte_atomic_store_explicit(&r->while_queuing, 1, rte_memory_order_seq_cst);
+	if (unlikely(rte_atomic_load_explicit(&r->allow_queuing, rte_memory_order_seq_cst) == 0))
 		goto out;
```

Does the seq_cst ordering here correctly implement the intended Dekker-style handshake on all architectures? The comment claims "anything weaker permits both sides to miss each other" - is this accurate given that one side uses a plain volatile load in the fast path?

---

## General Style Issues

Throughout the series, several patches have inconsistent formatting:

1. Some atomic operations retain unnecessary parentheses around single-argument macro parameters
2. Indentation occasionally exceeds 79 characters in new code
3. Some casts to `(volatile __rte_atomic uint32_t *)` could be simplified

Would reformatting these areas improve consistency with DPDK style guidelines?

---


More information about the test-report mailing list