[dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync

Joyce Kong Joyce.Kong at arm.com
Wed Jul 28 09:07:08 CEST 2021


> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>
> Sent: Saturday, July 24, 2021 3:52 AM
> To: Joyce Kong <Joyce.Kong at arm.com>; thomas at monjalon.net;
> david.marchand at redhat.com; roretzla at linux.microsoft.com;
> stephen at networkplumber.org; olivier.matz at 6wind.com;
> harry.van.haaren at intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>
> Cc: dev at dpdk.org; nd <nd at arm.com>
> Subject: Re: [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
> 
> On 7/20/21 6:51 AM, Joyce Kong wrote:
> > Convert rte_atomic usages to compiler atomic built-ins in rcu_perf
> > testcases.
> >
> > Signed-off-by: Joyce Kong <joyce.kong at arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > Acked-by: Stephen Hemminger <stephen at networkplumber.org>
> > ---
> >   app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
> >   1 file changed, 49 insertions(+), 49 deletions(-)
> >
> > diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
> > index 3017e71120..cf7b158d22 100644
> > --- a/app/test/test_rcu_qsbr_perf.c
> > +++ b/app/test/test_rcu_qsbr_perf.c
> > @@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
> >   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> >   static struct rte_hash *h;
> >   static char hash_name[8];
> > -static rte_atomic64_t updates, checks;
> > -static rte_atomic64_t update_cycles, check_cycles;
> > +static uint64_t updates, checks;
> > +static uint64_t update_cycles, check_cycles;
> >
> >   /* Scale down results to 1000 operations to support lower
> >    * granularity clocks.
> > @@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
> >   	}
> >
> >   	cycles = rte_rdtsc_precise() - begin;
> > -	rte_atomic64_add(&update_cycles, cycles);
> > -	rte_atomic64_add(&updates, loop_cnt);
> > +	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
> > +	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
> 
> Shouldn't __atomic_add_fetch() be used instead, since its pseudo-code is a bit
> simpler? What is the best option if the return value is not actually used?

If the return value is not used, as is the case here, the instructions generated for __atomic_fetch_add() and __atomic_add_fetch() are the same on x86 and Arm with the gcc and clang versions I have tried.
If the return value is used, __atomic_add_fetch() takes two more instructions ('mov' and 'add') than __atomic_fetch_add() in order to return the result of the calculation.
This is based on experiments at https://godbolt.org/ .
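
For reference, here is a minimal sketch of the semantic difference between the two built-ins. It is not part of the patch: the counter and the helper names are hypothetical and only serve the illustration. __atomic_fetch_add() returns the value held before the addition, __atomic_add_fetch() returns the value after it, and when the result is discarded the two have the same effect on the variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter, standing in for "updates" in the perf test. */
uint64_t counter;

/* Adds n and returns the value the counter held BEFORE the addition. */
uint64_t add_return_old(uint64_t n)
{
	return __atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

/* Adds n and returns the value the counter holds AFTER the addition. */
uint64_t add_return_new(uint64_t n)
{
	return __atomic_add_fetch(&counter, n, __ATOMIC_RELAXED);
}

/* Result discarded, as in the perf test: either built-in would do. */
void add_discard(uint64_t n)
{
	(void)__atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

/* Relaxed read of the current counter value. */
uint64_t read_counter(void)
{
	return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}

/* Walks through the three cases on a single thread. */
void demo(void)
{
	uint64_t old_val, new_val;

	__atomic_store_n(&counter, 10, __ATOMIC_RELAXED);

	old_val = add_return_old(5);	/* counter: 10 -> 15 */
	assert(old_val == 10);

	new_val = add_return_new(5);	/* counter: 15 -> 20 */
	assert(new_val == 20);

	add_discard(1);			/* counter: 20 -> 21 */
	assert(read_counter() == 21);
}
```

Calling demo() once runs all three cases. Compiling add_return_old() and add_return_new() side by side on godbolt.org is one way to compare the generated instructions for a given compiler and target.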

