|SUCCESS| [RFC, v2, 11/11] crypto/ccp: replace use of rte_atomic64 with stdatomic

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Thu May 21 22:49:01 CEST 2026
Previous message (by thread): |WARNING| pw164230-164240 [PATCH] [RFC,v2,11/11] crypto/ccp: replace
Next message (by thread): |WARNING| pw164241 [PATCH] [v11] eal/x86: optimize memcpy of small s
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164240

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-21

I'll review this patch series focusing on correctness, coding style, API requirements, and potential bugs.

commit: series.patch
Author: Stephen Hemminger <stephen at networkplumber.org>
Subject: [RFC v2 01/11] eal: use intrinsics for rte_atomic on all platforms

This commit removes inline assembly implementations of rte_atomic operations,
keeping only the intrinsics-based fallback for all platforms.

> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
> index 0a4f3f8528..292e52fade 100644
> --- a/lib/eal/include/generic/rte_atomic.h
> +++ b/lib/eal/include/generic/rte_atomic.h

[ ... ]

> @@ -209,15 +209,11 @@ rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
>   *   The original value at that location
>   */
>  static inline uint16_t
> -rte_atomic16_exchange(volatile uint16_t *dst, uint16_t val);
> -
> -#ifdef RTE_FORCE_INTRINSICS
> -static inline uint16_t
>  rte_atomic16_exchange(volatile uint16_t *dst, uint16_t val)
>  {
> -	return rte_atomic_exchange_explicit(dst, val, rte_memory_order_seq_cst);
> +	return rte_atomic_exchange_explicit((volatile __rte_atomic uint16_t *)dst,
> +		val, rte_memory_order_seq_cst);

Does this cast correctly align with the C11 atomics standard? The cast from
"volatile uint16_t *" to "volatile __rte_atomic uint16_t *" may lead to
undefined behavior per C11 7.17.6 if dst was not originally declared atomic.

> @@ -478,15 +478,11 @@ rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
>   *   The original value at that location
>   */
>  static inline uint32_t
> -rte_atomic32_exchange(volatile uint32_t *dst, uint32_t val);
> -
> -#ifdef RTE_FORCE_INTRINSICS
> -static inline uint32_t
>  rte_atomic32_exchange(volatile uint32_t *dst, uint32_t val)
>  {
> -	return rte_atomic_exchange_explicit(dst, val, rte_memory_order_seq_cst);
> +	return rte_atomic_exchange_explicit((volatile __rte_atomic uint32_t *)dst,
> +					    val, rte_memory_order_seq_cst);

Same casting concern as above for atomic correctness.

> @@ -746,15 +746,11 @@ rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
>   *   The original value at that location
>   */
>  static inline uint64_t
> -rte_atomic64_exchange(volatile uint64_t *dst, uint64_t val);
> -
> -#ifdef RTE_FORCE_INTRINSICS
> -static inline uint64_t
>  rte_atomic64_exchange(volatile uint64_t *dst, uint64_t val)
>  {
> -	return rte_atomic_exchange_explicit(dst, val, rte_memory_order_seq_cst);
> +	return rte_atomic_exchange_explicit((volatile __rte_atomic uint64_t *)dst,
> +					    val, rte_memory_order_seq_cst);

Same casting concern applies here.

---

commit: [RFC v2 02/11] eal: reimplement rte_smp_*mb with rte_atomic_thread_fence

> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
> index 292e52fade..1b04b43cbb 100644
> --- a/lib/eal/include/generic/rte_atomic.h
> +++ b/lib/eal/include/generic/rte_atomic.h

[ ... ]

> @@ -164,6 +134,24 @@ static inline void rte_io_rmb(void);
>   */
>  static inline void rte_atomic_thread_fence(rte_memory_order memorder);
>  
> +static __rte_always_inline void
> +rte_smp_mb(void)
> +{
> +	rte_atomic_thread_fence(rte_memory_order_seq_cst);
> +}
> +
> +static __rte_always_inline void
> +rte_smp_wmb(void)
> +{
> +	rte_atomic_thread_fence(rte_memory_order_release);
> +}

Does release ordering provide a full write barrier? The previous rte_smp_wmb
implementations emitted dmb ishst on ARM64. The commit message indicates dmb 
ish is now used. Can this ordering change affect store-to-store visibility 
between threads?

> +static __rte_always_inline void
> +rte_smp_rmb(void)
> +{
> +	rte_atomic_thread_fence(rte_memory_order_acquire);
> +}

Does acquire ordering prevent speculative loads? The previous rte_smp_rmb on
ARM64 emitted dmb ishld. Does the memory_order_acquire change from ishld to 
ish affect load-to-load dependencies?

---

commit: [RFC v2 03/11] ring: use C11 atomic operations for MP/SP head/tail

> diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> index affd2d5ba7..84570fd5fc 100644
> --- a/lib/ring/rte_ring_generic_pvt.h
> +++ b/lib/ring/rte_ring_generic_pvt.h

[ ... ]

> @@ -106,12 +119,24 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  
>  		*new_head = *old_head + n;
>  		if (is_st) {
> -			d->head = *new_head;
> +			rte_atomic_store_explicit(&d->head, *new_head, rte_memory_order_relaxed);

Does relaxed store ordering work for single-threaded producer/consumer? If the
ring is reconfigured or accessed by diagnostics, can this relaxed store race?

>  			success = 1;
> -		} else
> -			success = rte_atomic32_cmpset(
> -					(uint32_t *)(uintptr_t)&d->head,
> -					*old_head, *new_head);
> +		} else {
> +			/*
> +			 * Weak CAS: the outer do-while handles spurious
> +			 * failures, so we avoid the strong variant's
> +			 * internal retry (which on arm64 wraps the LL/SC
> +			 * pair in a hidden inner loop).
> +			 *
> +			 * Relaxed on both success and failure: this CAS
> +			 * does not publish data. Slot data visibility is
> +			 * provided by the acquire loads above and the
> +			 * release store of tail in __rte_ring_update_tail().
> +			 */
> +			success = rte_atomic_compare_exchange_weak_explicit(
> +				&d->head, old_head, *new_head,
> +				rte_memory_order_relaxed, rte_memory_order_relaxed);
> +		}

Does the weak CAS update *old_head on failure? The do-while loop uses *old_head
but rte_atomic_compare_exchange_weak_explicit() might not update it properly.

---

commit: [RFC v2 06/11] net/ena: replace use of rte_atomicNN

> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index ea4afbc75d..e9c484456c 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c

[ ... ]

> @@ -1249,10 +1243,7 @@ static void ena_stats_restart(struct rte_eth_dev *dev)
>  {
>  	struct ena_adapter *adapter = dev->data->dev_private;
>  
> -	rte_atomic64_init(&adapter->drv_stats->ierrors);
> -	rte_atomic64_init(&adapter->drv_stats->oerrors);
> -	rte_atomic64_init(&adapter->drv_stats->rx_nombuf);
> -	adapter->drv_stats->rx_drops = 0;
> +	memset(adapter->drv_stats, 0, sizeof(struct ena_driver_stats));

Does this code race with concurrent statistics updates from datapath threads?
The previous atomic operations ensured some ordering, but memset has no 
atomicity guarantees.

---

The review raises several questions about memory ordering semantics, atomic 
casting correctness, and potential data races that should be addressed before 
merging this refactoring.
Previous message (by thread): |WARNING| pw164230-164240 [PATCH] [RFC,v2,11/11] crypto/ccp: replace
Next message (by thread): |WARNING| pw164241 [PATCH] [v11] eal/x86: optimize memcpy of small s
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the test-report mailing list