[dpdk-dev] [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores sync

Olivier Matz olivier.matz at 6wind.com
Wed Jul 28 11:56:46 CEST 2021


Hi Joyce,

On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> Convert rte_atomic usages to compiler atomic built-ins for lcores
> sync in mcslock testcases.
> 
> Signed-off-by: Joyce Kong <joyce.kong at arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> Acked-by: Stephen Hemminger <stephen at networkplumber.org>
> ---
>  app/test/test_mcslock.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
> index 80eaecc90a..52e45e7e2a 100644
> --- a/app/test/test_mcslock.c
> +++ b/app/test/test_mcslock.c
> @@ -17,7 +17,6 @@
>  #include <rte_lcore.h>
>  #include <rte_cycles.h>
>  #include <rte_mcslock.h>
> -#include <rte_atomic.h>
>  
>  #include "test.h"
>  
> @@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
>  
>  static unsigned int count;
>  
> -static rte_atomic32_t synchro;
> +static uint32_t synchro;
>  
>  static int
>  test_mcslock_per_core(__rte_unused void *arg)
> @@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
>  	rte_mcslock_t ml_perf_me;
>  
>  	/* wait synchro */
> -	while (rte_atomic32_read(&synchro) == 0)
> -		;
> +	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
>  
>  	begin = rte_get_timer_cycles();
>  	while (lcount < MAX_LOOP) {
> @@ -102,15 +100,15 @@ test_mcslock_perf(void)
>  	const unsigned int lcore = rte_lcore_id();
>  
>  	printf("\nTest with no lock on single core...\n");
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	load_loop_fn(&lock);
>  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>  			lcore, time_count[lcore]);
>  	memset(time_count, 0, sizeof(time_count));
>  
>  	printf("\nTest with lock on single core...\n");
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	lock = 1;
> -	rte_atomic32_set(&synchro, 1);

nit: is there a reason for moving this line?


>  	load_loop_fn(&lock);
>  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>  			lcore, time_count[lcore]);
> @@ -118,11 +116,11 @@ test_mcslock_perf(void)
>  
>  	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>  
> -	rte_atomic32_set(&synchro, 0);
> +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>  	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
>  
>  	/* start synchro and launch test on main */
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	load_loop_fn(&lock);

I have a more general question. Please forgive my ignorance about the
C11/C++11 atomic builtins and memory model. Neither the gcc manual nor
the C11 standard is that easy to understand :)

In all the patches of this patchset, __ATOMIC_RELAXED is used. My
understanding is that it does not add any inter-thread ordering
constraint. I suppose that in this particular case we rely on the
call to rte_eal_mp_remote_launch() being a compiler barrier, and on
the function itself being a memory barrier. This ensures that the
worker threads see synchro=0 until it is set to 1 by the main lcore.
Is that correct?
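
To make sure I read the resulting code correctly, here is a simplified
sketch of the pattern as I understand it (not the actual test code,
just the synchronization part, with names kept from the patch):

#include <stdint.h>
#include <rte_common.h>
#include <rte_launch.h>
#include <rte_pause.h>

static uint32_t synchro;

static int
worker(__rte_unused void *arg)
{
	/* relaxed: we are only guaranteed to eventually observe the
	 * value of synchro itself, not any ordering with other stores */
	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
	/* ... measured loop ... */
	return 0;
}

static void
run(void)
{
	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	/* the launch is what (I assume) guarantees that workers start
	 * with synchro == 0 and only later observe the store below */
	rte_eal_mp_remote_launch(worker, NULL, SKIP_MAIN);
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
	rte_eal_mp_wait_lcore();
}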

What is the reason for using the atomic API here? Wouldn't a plain
assignment work too? (I mean "synchro = 1;")
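
For reference, the two forms I am comparing (just a sketch to
illustrate the question, not a suggestion for the patch):

#include <stdint.h>

static uint32_t synchro;

static void
start_test(void)
{
	/* form used in the patch: relaxed atomic store */
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);

	/* plain assignment I have in mind; on the targets I am aware
	 * of I would expect the same single 32-bit store instruction */
	synchro = 1;
}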


>  
>  	rte_eal_mp_wait_lcore();
> -- 
> 2.17.1
> 

