[dpdk-dev] [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange

Phil Yang (Arm Technology China) Phil.Yang at arm.com
Tue Jul 9 11:27:32 CEST 2019


> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
> Sent: Friday, July 5, 2019 12:37 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> jerinj at marvell.com; Phil Yang (Arm Technology China)
> <Phil.Yang at arm.com>; dev at dpdk.org
> Cc: thomas at monjalon.net; hemant.agrawal at nxp.com; Gavin Hu (Arm
> Technology China) <Gavin.Hu at arm.com>; nd <nd at arm.com>;
> gage.eads at intel.com; nd <nd at arm.com>
> Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> 
> 

<snip>

> >> > > +#ifdef __ARM_FEATURE_ATOMICS
> >> > > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)
> >> \
> >> > > +static inline rte_int128_t                                                  \
> >> > > +cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
> >> > > +		rte_int128_t updated)                                               \
> >> > > +{                                                                           \
> >> > > +	/* caspX instructions register pair must start from even-
> >numbered
> >> > > +	 * register at operand 1.
> >> > > +	 * So, specify registers for local variables here.
> >> > > +	 */                                                                     \
> >> > > +	register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];
> \
> >> >
> >> > I understand CASP limitation on register has to be even and odd.
> >> > Is there anyway to remove explicit x0 register allocation and choose
> >> > compiler to decide the register. Some reason with optimize(03) gcc
> >> > makes correctly but not clang.
> >> >
> >> > Hardcoding to specific register makes compiler to not optimize the
> >> > stuff, especially if it is inline function.
> >>
> >> It look like the limitation fixed recently in gcc.
> >> https://patches.linaro.org/patch/147991/
> >>
> >> Not sure about old gcc and clang. ARM compiler experts may know
> >the exact
> >> status
> >>
> >We could use syntax as follows, an example is in [1]
> >static inline rte_int128_t
> >__rte_casp(rte_int128_t *dst, rte_int128_t old, rte_int128_t updated,
> >int mo)
> >{
> >		__asm__ volatile("caspl %0, %H0, %1, %H1, [%2]"
> >				 : "+r" (old)
> >				 : "r" (updated), "r" (dst)
> >				 : "memory");
> >	return old;
> >}
> 
> We have used this format for mempool/octeontx2 but clang wasn't too
> happy.
> 
> dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:151:15: error:
> value size does not match register size specified by the constraint and
> modifier [-Werror,-Wasm-operand-widths]
>                 [t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2),
>                             ^
> dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:82:9: note: use
> constraint modifier "w"
>                 "casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
> 
> Had to change it to hand coded asm
> 
> http://patches.dpdk.org/patch/56110/

Hi Jerin,

The update from the compiler team is 'the LSE CASP fix has not been backported to older GCC branches'.
So, currently, this seems the only approach works for all versions of GCC and Clang. 
I think we can add another optimization patch for this once all the compilers were ready. 

Thanks,
Phil
> 
> >
> >[1] https://godbolt.org/z/EUJnuG


More information about the dev mailing list