|SUCCESS| [5/5] ring: use C11 for single thread move head

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Tue Jun 2 22:56:35 CEST 2026
Previous message (by thread): |SUCCESS| [v1] dts: report dut/NIC info during DTS run
Next message (by thread): |SUCCESS|dpdk-next-net|24162fa448| intel-Functional
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164824

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-06-02

I'll review this DPDK patch series focusing on correctness, coding style, API requirements, and guideline violations.

---

**Patch 1/5: ring: split single thread vs multi-thread cases**

git sha: 20260602171552.686349-2-stephen at networkplumber.org
Author: Stephen Hemminger <stephen at networkplumber.org>
Subject: [PATCH 1/5] ring: split single thread vs multi-thread cases

This patch splits the move head function into separate single-threaded
and multi-threaded variants for cleaner code organization.

> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index 07b6efc416..5afc14dec9 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h

[ ... ]

> @@ -72,14 +71,77 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
>   */
>  static __rte_always_inline unsigned int
> -__rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> +__rte_ring_headtail_move_head_st(struct rte_ring_headtail *d,
>  		const struct rte_ring_headtail *s, uint32_t capacity,
> -		unsigned int is_st, unsigned int n,
> +		unsigned int n,
>  		enum rte_ring_queue_behavior behavior,
>  		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
>  {
>  	uint32_t stail;
> -	int success;

Does removing this variable declaration cause an issue? The new code no
longer uses the success variable at all in the ST path.

> +	/* Single producer: only this thread writes d->head,
> +	 * so a relaxed load is sufficient.
> +	 */
> +	*old_head = rte_atomic_load_explicit(&d->head, rte_memory_order_relaxed);
> +
> +	/* Acquire pairs with the consumer's release-store of tail in __rte_ring_update_tail,
> +	 * ensuring the consumer's ring-element reads are complete before
> +	 * we observe the updated tail.
> +	 */
> +	stail = rte_atomic_load_explicit(&s->tail, rte_memory_order_acquire);

These comment lines exceed 79 characters. Can these be wrapped to comply
with coding standards?

> +	if (n > 0) {
> +		*new_head = *old_head + n;
> +		rte_atomic_store_explicit(&d->head, *new_head, rte_memory_order_relaxed);
> +	}

Does this code need a memory order constraint stronger than relaxed for
the ST case? The original code had different behavior.

> @@ -120,25 +182,21 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  			return 0;
>  
>  		*new_head = *old_head + n;
> -		if (is_st) {
> -			d->head = *new_head;
> -			success = 1;
> -		} else
> -			/* on failure, *old_head is updated */

The removed comment about *old_head being updated on failure was
informative. Does removing this make the code less clear?

> diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> index affd2d5ba7..c044b0824f 100644
> --- a/lib/ring/rte_ring_generic_pvt.h
> +++ b/lib/ring/rte_ring_generic_pvt.h

[ ... ]

> @@ -105,15 +103,70 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  			return 0;
>  
>  		*new_head = *old_head + n;
> -		if (is_st) {
> -			d->head = *new_head;
> -			success = 1;
> -		} else
> -			success = rte_atomic32_cmpset(
> -					(uint32_t *)(uintptr_t)&d->head,
> -					*old_head, *new_head);
> +		success = rte_atomic32_cmpset(
> +				(uint32_t *)(uintptr_t)&d->head,
> +				*old_head, *new_head);
>  	} while (unlikely(success == 0));

Does this condition need updating to match the new code pattern? In the
C11 version, you changed "success == 0" to "!success".

---

**Patch 2/5: ring: use GCC builtin as alternative to rte_atomic32**

git sha: 20260602171552.686349-3-stephen at networkplumber.org
Author: Stephen Hemminger <stephen at networkplumber.org>
Subject: [PATCH 2/5] ring: use GCC builtin as alternative to rte_atomic32

This patch replaces deprecated rte_atomic32 with GCC builtin atomics for
performance reasons.

> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index 5afc14dec9..8358b0f21f 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -43,7 +43,6 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
>  	 */
>  	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
>  }
> -

Is this blank line removal intentional? It seems unrelated to the atomic
changes.

> @@ -82,7 +81,7 @@ __rte_ring_headtail_move_head_st(struct rte_ring_headtail *d,
>  	/* Single producer: only this thread writes d->head,
>  	 * so a relaxed load is sufficient.
>  	 */
> -	*old_head = rte_atomic_load_explicit(&d->head, rte_memory_order_relaxed);
> +	*old_head = rte_atomic_load_explicit(&d->head,	rte_memory_order_acquire);

Why does the memory order change from relaxed to acquire? The comment
states "relaxed load is sufficient". Does this comment need updating, or
is the code change incorrect?

Also, there's a tab character before rte_memory_order_acquire. Does this
violate whitespace standards?

> diff --git a/lib/ring/rte_ring_gcc_pvt.h b/lib/ring/rte_ring_gcc_pvt.h
> new file mode 100644
> index 0000000000..9033a15647
> --- /dev/null
> +++ b/lib/ring/rte_ring_gcc_pvt.h

[ ... ]

> @@ -37,7 +35,12 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
>  		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
>  			rte_memory_order_relaxed);
>  
> -	ht->tail = new_val;
> +	/*
> +	 * R0: Establishes a synchronizing edge with load-acquire of tail at A1.
> +	 * Ensures that memory effects by this thread on ring elements array
> +	 * is observed by a different thread of the other type.
> +	 */
> +	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);

Comment line exceeds 79 characters. Can this be wrapped?

> @@ -81,10 +84,10 @@ __rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
>  
>  		*old_head = d->head;
>  
> -		/* add rmb barrier to avoid load/load reorder in weak
> +		/* add fence to avoid load/load reorder in weak
>  		 * memory model. It is noop on x86
>  		 */
> -		rte_smp_rmb();
> +		__atomic_thread_fence(__ATOMIC_ACQUIRE);

Does __atomic_thread_fence provide the same guarantees as rte_smp_rmb
on all architectures? The commit message doesn't explain this change.

> @@ -100,13 +103,15 @@ __rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
>  					0 : *entries;
>  
>  		if (n == 0)
> -			return 0;
> +			break;

Why change from return to break? Does this affect cleanup or future
code paths?

> +		success = __sync_bool_compare_and_swap(
>  				(uint32_t *)(uintptr_t)&d->head,
>  				*old_head, *new_head);
> -	} while (unlikely(success == 0));
> +	} while (unlikely(!success));

Does this maintain the same semantics? The commit message mentions
__sync_bool_compare_and_swap returns a plain bool.

---

**Patch 3/5: ring: use C11 for update_tail**

git sha: 20260602171552.686349-4-stephen at networkplumber.org
Author: Stephen Hemminger <stephen at networkplumber.org>
Subject: [PATCH 3/5] ring: use C11 for update_tail

This consolidates update_tail to use only C11 memory model.

> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index 8358b0f21f..3258829696 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -19,30 +19,6 @@
>   * For more information please refer to <rte_ring.h>.
>   */
>  
> -/**
> - * @internal This function updates tail values.
> - */
> -static __rte_always_inline void
> -__rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
> -		uint32_t new_val, uint32_t single, uint32_t enqueue)

Does removing this duplicate definition of __rte_ring_update_tail cause
any issues with the build or inlining behavior?

---

**Patch 4/5: ring: drop unused arg to update_tail**

git sha: 20260602171552.686349-5-stephen at networkplumber.org
Author: Stephen Hemminger <stephen at networkplumber.org>
Subject: [PATCH 4/5] ring: drop unused arg to update_tail

This removes the unused enqueue parameter.

> diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
> index a7ff76931b..74b5fef771 100644
> --- a/lib/ring/rte_ring_elem_pvt.h
> +++ b/lib/ring/rte_ring_elem_pvt.h
> @@ -301,10 +301,8 @@ __rte_ring_dequeue_elems(struct rte_ring *r, uint32_t cons_head,
>  
>  static __rte_always_inline void
>  __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
> -		uint32_t new_val, uint32_t single, uint32_t enqueue)
> +		       uint32_t new_val, uint32_t single)

The alignment of parameters changed. Does this match the project's
preferred indentation style?

---

**Patch 5/5: ring: use C11 for single thread move head**

git sha: 20260602171552.686349-6-stephen at networkplumber.org
Author: Stephen Hemminger <stephen at networkplumber.org>
Subject: [PATCH 5/5] ring: use C11 for single thread move head

This consolidates to use C11 for ST path.

> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index 3258829696..0ba64379fa 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -19,68 +19,6 @@
>   * For more information please refer to <rte_ring.h>.
>   */
>  
> -/**
> - * @internal This is a helper function that moves the producer/consumer head

Does removing the entire __rte_ring_headtail_move_head_st function from
this file cause duplication issues when it's added to
rte_ring_elem_pvt.h?

> diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
> index 74b5fef771..cd77343f38 100644
> --- a/lib/ring/rte_ring_elem_pvt.h
> +++ b/lib/ring/rte_ring_elem_pvt.h
> @@ -319,12 +319,74 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
>  	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
>  }
>  
> -/* Between load and load. there might be cpu reorder in weak model
> - * (powerpc/arm).
> - * There are 2 choices for the users
> - * 1.use rmb() memory barrier
> - * 2.use one-direction load_acquire/store_release barrier
> - * It depends on performance test results.

Does removing this explanatory comment reduce code maintainability? The
comment provided context about architectural considerations.

> +	/* Acquire pairs with the consumer's release-store of tail in __rte_ring_update_tail,
> +	 * ensuring the consumer's ring-element reads are complete before
> +	 * we observe the updated tail.
> +	 */

This comment line exceeds 79 characters. Can it be wrapped?

> +/*
> + * The function __rte_ring_headtail_move_head_mt has two versions
> + * based on what is most efficient on a given architecture.
> + *
> + * The C11 is preferred but on x86 GCC has 10% performance drop.
> + */

Does this comment accurately describe the current state? Is the
performance impact documented elsewhere?
Previous message (by thread): |SUCCESS| [v1] dts: report dut/NIC info during DTS run
Next message (by thread): |SUCCESS|dpdk-next-net|24162fa448| intel-Functional
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the test-report mailing list