[PATCH v4 1/2] ring: make soring to always finalize its own stage

Morten Brørup mb at smartsharesystems.com
Tue Apr 28 13:54:54 CEST 2026


> From: Konstantin Ananyev [mailto:konstantin.ananyev at huawei.com]
> Sent: Thursday, 23 April 2026 11.16
> 
> SORING internal finalize() function is MT-safe and can be called from
> multiple places: from it's own stage release(), also from 'acquire()'
> for next stage or even from consumer's 'dequeue().
> But calling finalize() from not its own stage release() function
> creates extra contention and might slow-down ring operations,
> especially
> for the cases when we have multiple threads doing acquire/release
> for the same stage.
> We can't compeletely avoid calling finalize() from all these multiple
> places, as it can in some rare cases break soring behavior.
> But we can make release() for given stage to invoke it always.
> That increases number of 'finalize()' operations done from 'release()'
> for current stage, and helps to minimize number of finalize() calls
> from
> other stages, which in turn, help to reduce the contention.
> According to the soring_stress_autotest, for multiple workers (8+)
> it reduces number of cycles spent by 1.5x-1.8x factor.
> For l3fwd-like workload it improves things by ~20%.
> For small number of workers, I didn't observe any serious change.
> Note that it doesn't introduce any changes in functionality provided.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev at huawei.com>

Good idea.
Not peeking into the tail to determine if finalize() might be omitted also makes release() cleaner.

Removing an optimization that was not really an optimization. :-)

Acked-by: Morten Brørup <mb at smartsharesystems.com>


> ---
>  lib/ring/soring.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/ring/soring.c b/lib/ring/soring.c
> index 3b90521bdb..4bc2321fb5 100644
> --- a/lib/ring/soring.c
> +++ b/lib/ring/soring.c
> @@ -37,24 +37,24 @@
>   * plus current stage index).
>   * 'release()' extracts old head value from provided ftoken and checks
> that
>   * corresponding 'state[]' contains expected values(mostly for sanity
> - * purposes).
> - * Then it marks this state[] with 'SORING_ST_FINISH' flag to indicate
> - * that given subset of objects was released.
> - * After that, it checks does old head value equals to current tail
> value?
> - * If yes, then it performs  'finalize()' operation, otherwise
> 'release()'
> - * just returns (without spinning on stage tail value).
> - * As updated state[] is shared by all threads, some other thread can
> do
> - * 'finalize()' for given stage.
> - * That allows 'release()' to avoid excessive waits on the tail value.
> + * purposes). Then it marks this state[] with 'SORING_ST_FINISH' flag
> to
> + * indicate that given subset of objects was released.
> + * After that, it calls 'finalize()'.
>   * Main purpose of 'finalize()' operation is to walk through 'state[]'
>   * from current stage tail up to its head, check state[] and move
> stage tail
>   * through elements that already are in SORING_ST_FINISH state.
>   * Along with that, corresponding state[] values are reset to zero.
> - * Note that 'finalize()' for given stage can be done from multiple
> places:
> + * Note that updated state[] is shared by all threads, so
> + * 'finalize()' for given stage can be done from multiple places:
>   * 'release()' for that stage or from 'acquire()' for next stage
>   * even from consumer's 'dequeue()' - in case given stage is the last
> one.
>   * So 'finalize()' has to be MT-safe and inside it we have to
> - * guarantee that only one thread will update state[] and stage's tail
> values.
> + * guarantee that only one thread at a time will update state[] and
> + * stage's tail values (sort of critical-section).
> + * When multiple threads trying to do finalize() for the same stage,
> + * simultaneously one thread will win the race and do all the pending
> + * updates, while others will simply return (kind of try-lock
> scenario).
> + * That allows 'release()' to avoid excessive waits on the tail value.
>   */
> 
>  #include "soring.h"
> @@ -442,7 +442,7 @@ static __rte_always_inline void
>  soring_release(struct rte_soring *r, const void *objs,
>  	const void *meta, uint32_t stage, uint32_t n, uint32_t ftoken)
>  {
> -	uint32_t idx, pos, tail;
> +	uint32_t idx, pos;
>  	struct soring_stage *stg;
>  	union soring_state st;
> 
> @@ -479,12 +479,9 @@ soring_release(struct rte_soring *r, const void
> *objs,
>  	rte_atomic_store_explicit(&r->state[idx].raw, st.raw,
>  			rte_memory_order_relaxed);
> 
> -	/* try to do finalize(), if appropriate */
> -	tail = rte_atomic_load_explicit(&stg->sht.tail.pos,
> -			rte_memory_order_relaxed);
> -	if (tail == pos)
> -		__rte_soring_stage_finalize(&stg->sht, stage, r->state, r-
> >mask,
> -				r->capacity);
> +	/* now, try to do finalize() */
> +	__rte_soring_stage_finalize(&stg->sht, stage, r->state, r->mask,
> +			r->capacity);
>  }
> 
>  /*
> --
> 2.51.0



More information about the dev mailing list