[dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)

Eads, Gage gage.eads at intel.com
Tue Jan 22 23:25:41 CET 2019



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
> Sent: Tuesday, January 22, 2019 2:31 PM
> To: Eads, Gage <gage.eads at intel.com>; dev at dpdk.org
> Cc: olivier.matz at 6wind.com; arybchenko at solarflare.com; Richardson, Bruce
> <bruce.richardson at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; nd <nd at arm.com>;
> chaozhu at linux.vnet.ibm.com; jerinj at marvell.com; hemant.agrawal at nxp.com;
> nd <nd at arm.com>
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only)
> 
> Added other platform owners.
> 
> > > > > > @@ -208,4 +209,25 @@ static inline void
> > > > > > rte_atomic64_clear(rte_atomic64_t
> > > > > > *v)  }  #endif
> > > > > >
> > > > > > +static inline int
> > > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp,
> > > > > > +uint64_t
> > > > > > +*src) {
> > > > > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src'
> > > > > should be pointers to 128b (__int128)? Or we could define our
> > > > > own data
> > > > type.
> > > >
> > > > I agree, I'm not a big fan of the 64b pointers here. I avoided
> > > > __int128 originally because it fails to compile with -pedantic,
> > > > but on second thought (and with your suggestion of a separate data
> > > > type), we can resolve that with this typedef:
> > > >
> > > > typedef struct {
> > > >         RTE_STD_C11 __int128 val;
> > > > } rte_int128_t;
> > > ok
> > >
> > > >
> > > > > Since, it is a new API, can we define it with memory orderings
> > > > > which will be more conducive to relaxed memory ordering based
> > architectures?
> > > > > You can refer to [1] and [2] for guidance.
> > > >
> > > > I certainly see the value in controlling the operation's memory
> > > > ordering, like in the __atomic intrinsics, but I'm not sure this
> > > > patchset is the right place to address that. I see that work going
> > > > a couple
> > > ways:
> > > > 1. Expand the existing rte_atomicN_* interfaces with additional
> > > > arguments. In that case, I'd prefer this be done in a separate
> > > > patchset that addresses all the atomic operations, not just
> > > > cmpset, so the interface changes are chosen according to the needs
> > > > of the full set of atomic operations. If this approach is taken
> > > > then there's no need to solve this while rte_atomic128_cmpset is
> > > > experimental, since all the
> > > other functions are non-experimental anyway.
> > > >
> > > > - Or -
> > > >
> > > > 2. Don't modify the existing rte_atomicN_* interfaces (or their
> > > > strongly ordered behavior), and instead create new versions of
> > > > them that take additional arguments. In this case, we can
> > > > implement
> > > > rte_atomic128_cmpset() as is and create a more flexible version in
> > > > a later
> > > patchset.
> > > >
> > > > Either way, I think the current interface (w.r.t. memory ordering
> > > > options) can work and still leaves us in a good position for
> > > > future
> > > changes/improvements.
> > > >
> > > I do not see the need to modify/extend the existing rte_atomicN_*
> > > APIs as the corresponding __atomic intrinsics serve as replacements.
> > > I expect that at some point, DPDK code base will not be using
> > rte_atomicN_* APIs.
> > > However, __atomic intrinsics do not support 128b wide parameters.
> > > Hence
> >
> > I don't think that's correct. From the GCC docs:
> >
> > "16-byte integral types are also allowed if `__int128' (see __int128)
> > is supported by the architecture."
> >
> > This works with x86 -64 -- I assume aarch64 also, but haven't confirmed.
> >
> > Source: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic-
> > Builtins.html
> (following is based on my reading, not based on experiments) My understanding
> is that the __atomic_load_n/store_n were implemented using a compare-and-
> swap HW instruction (due to lack of HW 128b atomic load and store
> instructions). This introduced the side effect or store/load respectively. Where
> as the user does not expect such side effects. As suggested in [1], it looks like
> GCC delegated the implementation to libatomic which 'it seems' uses locks to
> implement 128b __atomic intrinsics (needs to be verified)
>
> If __atomic intrinsics, for any of the supported platforms, do not have an
> optimal implementation, I prefer to add a DPDK API as an abstraction. A given
> platform can choose to implement such an API using __atomic intrinsics if it
> wants. The DPDK API can be similar to __atomic_compare_exchange_n.
> 

Certainly. From the linked discussions, I see how this would affect the design of (hypothetical functions) rte_atomic128_read() and rte_atomic128_set(), but I don't see anything that suggests (for the architectures being discussed) that __atomic_compare_exchange_n is suboptimal.

> [1] https://patchwork.ozlabs.org/patch/721686/
> [2] https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
> 
> >
> > > DPDK needs to write its own. Since this is the first API in that
> > > regard, I prefer that we start with a signature that resembles
> > > __atomic intrinsics which have been proven to provide best
> > > flexibility for
> > all the platforms supported by DPDK.


More information about the dev mailing list