[dpdk-dev] [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores sync

Joyce Kong Joyce.Kong at arm.com
Thu Jul 29 09:19:13 CEST 2021


Hi Olivier,

> -----Original Message-----
> From: Olivier Matz <olivier.matz at 6wind.com>
> Sent: Wednesday, July 28, 2021 5:57 PM
> To: Joyce Kong <Joyce.Kong at arm.com>
> Cc: thomas at monjalon.net; david.marchand at redhat.com;
> roretzla at linux.microsoft.com; stephen at networkplumber.org;
> andrew.rybchenko at oktetlabs.ru; harry.van.haaren at intel.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli at arm.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>; dev at dpdk.org; nd <nd at arm.com>
> Subject: Re: [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores
> sync
> 
> Hi Joyce,
> 
> On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> > Convert rte_atomic usages to compiler atomic built-ins for lcores sync
> > in mcslock testcases.
> >
> > Signed-off-by: Joyce Kong <joyce.kong at arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > Acked-by: Stephen Hemminger <stephen at networkplumber.org>
> > ---
> >  app/test/test_mcslock.c | 14 ++++++--------
> >  1 file changed, 6 insertions(+), 8 deletions(-)
> >
> > diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
> > index 80eaecc90a..52e45e7e2a 100644
> > --- a/app/test/test_mcslock.c
> > +++ b/app/test/test_mcslock.c
> > @@ -17,7 +17,6 @@
> >  #include <rte_lcore.h>
> >  #include <rte_cycles.h>
> >  #include <rte_mcslock.h>
> > -#include <rte_atomic.h>
> >
> >  #include "test.h"
> >
> > @@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
> >
> >  static unsigned int count;
> >
> > -static rte_atomic32_t synchro;
> > +static uint32_t synchro;
> >
> >  static int
> >  test_mcslock_per_core(__rte_unused void *arg)
> > @@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
> >  	rte_mcslock_t ml_perf_me;
> >
> >  	/* wait synchro */
> > -	while (rte_atomic32_read(&synchro) == 0)
> > -		;
> > +	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
> >
> >  	begin = rte_get_timer_cycles();
> >  	while (lcount < MAX_LOOP) {
> > @@ -102,15 +100,15 @@ test_mcslock_perf(void)
> >  	const unsigned int lcore = rte_lcore_id();
> >
> >  	printf("\nTest with no lock on single core...\n");
> > -	rte_atomic32_set(&synchro, 1);
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	load_loop_fn(&lock);
> >  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> >  			lcore, time_count[lcore]);
> >  	memset(time_count, 0, sizeof(time_count));
> >
> >  	printf("\nTest with lock on single core...\n");
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	lock = 1;
> > -	rte_atomic32_set(&synchro, 1);
> 
> nit: is there a reason for moving this line?

I meant to use __atomic_store_n() instead of rte_atomic32_set() to set synchro,
but mistakenly placed the store on the line above 'lock = 1'. Will fix this in the
next version.
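
For reference, the intended ordering in test_mcslock_perf() would then look
roughly like this (only a sketch of the planned correction, not the final patch):

	printf("\nTest with lock on single core...\n");
	lock = 1;
	/* keep the synchro store where rte_atomic32_set() used to be,
	 * i.e. after 'lock = 1' */
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
	load_loop_fn(&lock);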

> > 
> >  	load_loop_fn(&lock);
> >  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> >  			lcore, time_count[lcore]);
> > @@ -118,11 +116,11 @@ test_mcslock_perf(void)
> >
> >  	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
> >
> > -	rte_atomic32_set(&synchro, 0);
> > +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
> >  	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
> >
> >  	/* start synchro and launch test on main */
> > -	rte_atomic32_set(&synchro, 1);
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	load_loop_fn(&lock);
> 
> I have a more general question. Please forgive my ignorance about the
> C++11 atomic builtins and memory model. Both gcc manual and C11
> standard
> are not that easy to understand :)
> 
> In all the patches of this patchset, __ATOMIC_RELAXED is used. My
> understanding is that it does not add any inter-thread ordering constraint. I
> suppose that in this particular case, we rely on the call to
> rte_eal_mp_remote_launch() being a compiler barrier, and the function itself
> to be a memory barrier. This ensures that worker threads see synchro=0
> until it is set to 1 by the master.
> Is it correct?
> 

Yes, you are right. __ATOMIC_RELAXED introduces no ordering constraint on its own;
the worker threads synchronize with the main thread only through 'synchro', relying
on rte_eal_mp_remote_launch() for the ordering you describe.
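
Putting the pieces of the patch together, the pattern is (a restatement for
clarity, not new code):

	/* worker lcores: spin with relaxed ordering until released */
	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);

	/* main lcore: workers are launched while synchro == 0 ... */
	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
	/* ... rte_eal_mp_remote_launch() orders the test setup against the
	 * worker start, then a relaxed store releases the workers */
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
	load_loop_fn(&lock);
	rte_eal_mp_wait_lcore();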

> What is the reason for using the atomic API here? Wouldn't a plain
> assignment work too? (I mean "synchro = 1;")
> 

Here, __atomic_store_n(__ATOMIC_RELAXED) is used so that the store to 'synchro' is an
atomic access that the worker threads reliably observe after the main lcore changes it.
A plain assignment, paired with plain reads on the worker side, is a data race under the
C11 memory model, so the compiler is free to cache or optimize the accesses and cannot
guarantee that the workers see the new value.
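
As a hypothetical illustration of the difference on the worker side (this is not
code from the patch, just a contrast between the two approaches):

	/* (a) plain read of a non-atomic flag: a racy access is undefined
	 *     behaviour in C11, so the compiler may load 'synchro' once and
	 *     spin on a cached value forever */
	while (synchro == 0)
		;

	/* (b) relaxed atomic read: every iteration performs a real load, so
	 *     the relaxed store from the main lcore is eventually observed */
	while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
		rte_pause();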

> 
> >
> >  	rte_eal_mp_wait_lcore();
> > --
> > 2.17.1
> >
