[dpdk-dev] Arm roadmap for 20.05

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Tue Mar 24 22:41:45 CET 2020


<snip>
(apologies Morten - I missed your response, consolidating the discussion in this thread)

+ Intel x86 and IBM POWER maintainers

> 
> > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05
> > >>>>>
> > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote:
> > >>>>>> Hello,
> > >>>>>> 	Following are the work items planned for 20.05:
> > >>>>>>
> > >>>>>> 1) Use C11 atomic APIs in timer library
> > >>>>>> 2) Use C11 atomic APIs in service cores
> > >>>>>> 3) Use C11 atomics in VirtIO split ring
> > >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm
> > >>>>>> platforms
> > >>>>>> 5) RCU defer API
> > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases
> > >>>>>>
> > >>>>>> Thank you,
> > >>>>>> Honnappa
> > >>>>> Maybe you should have a look at legacy DPDK atomics as well?
> > >>>>> Avoiding a full barrier for the add operation, for example.
> > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not
> > >>>> take memory order as a parameter. So, it is difficult to change the
> > >>>> implementation for those APIs. For ex: the add operation could take
> > >>>> a RELEASE or RELAXED order depending on the use case.
> > >>>> So, the proposal is to deprecate the rte_atomic APIs and use C11
> > >>>> APIs directly. The proposal is here:
> > >>>> https://patches.dpdk.org/cover/66745/
> > >>> Even though rte_atomic lacks the flexibility of C11 atomics, there
> > >>> might still be areas of improvement. Such improvements will have
> > >>> an instant effect, as opposed to waiting for all the rte_atomic
> > >>> users to change.
> > >>>
> > >>>
> > >>> The rte_atomic API leaves ordering unspecified, unfortunately. In
> > >>> the Linux kernel, from which DPDK seems to borrow much of the
> > >>> atomics and memory order related semantics, an atomic add doesn't
> > >>> imply any memory barriers. The current
> > >>> __sync_fetch_and_add()-based implementation implies a full barrier
> > >>> (ldadd+dmb) or release (ldaddal, on v8.1-a). If you used C11
> > >>> atomics to implement rte_atomic on Arm, you could use a relaxed
> > >>> memory order on rte_atomic*_add() (assuming you agree those are
> > >>> the implicit semantics of the legacy API) and just get an ldadd
> > >>> instruction. An alternative would be to implement the same thing
> > >>> in assembler, of course.
> > >>>
> > >>>
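To make the ordering difference above concrete, here is a minimal sketch in C. The helper names (counter_add_full, counter_add_relaxed, counter_read) are hypothetical and are not the DPDK rte_atomic implementation; the sketch only assumes the GCC/Clang __sync and __atomic built-ins. The first helper always implies a full barrier, while the second takes an explicit memory order and imposes no ordering on surrounding accesses. The exact instructions emitted depend on the compiler and target.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the DPDK implementation. */

/* Legacy style: the __sync built-ins always imply a full barrier. */
static inline void counter_add_full(int32_t *v, int32_t inc)
{
	__sync_fetch_and_add(v, inc);
}

/* C11 style: the add is still atomic, but the relaxed order places no
 * ordering constraint on the loads and stores around it. */
static inline void counter_add_relaxed(int32_t *v, int32_t inc)
{
	__atomic_fetch_add(v, inc, __ATOMIC_RELAXED);
}

/* Relaxed read of the counter value. */
static inline int32_t counter_read(const int32_t *v)
{
	return __atomic_load_n(v, __ATOMIC_RELAXED);
}
```

Both helpers give the same result for a plain statistics counter; they differ only in the ordering guarantees (and hence the barriers) that come with the add.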
> > >> Another approach might be to just scrap all of the intrinsics and
> > >> inline assembler used for all the functions in rte_atomic, on all
> > >> architectures, and use C11 atomics instead.
> > > > Yes, this is the approach we are taking. But, it does not solve the
> > > > use of rte_atomic APIs in the applications.
> >
> > Agreed.
> >
> >
> > Another question. "C11 atomics" here seems to mean using GCC
> > intrinsics, normally used to implement C11 atomics, not C11 atomics
> > (i.e. <stdatomic.h>).
> > What is the reason for directly calling the intrinsics, rather than
> > using the standard API?
> I did not know they existed for C. Looking at them, they look like just
> wrappers around the intrinsics. The advantage seems to be the type check
> enforced by the compiler. i.e. if a variable is defined of type '_Atomic', the
> compiler should not allow any non-atomic operations on them. Anything else?
> I will explore this further.
I see some issues reported for the Intel ICC compiler [1], but they seem to have been fixed in the latest versions [2]. Please check.

[1] https://software.intel.com/en-us/forums/intel-c-compiler/topic/681815
[2] https://software.intel.com/en-us/articles/c11-support-in-intel-c-compiler
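For reference, a minimal sketch of what the <stdatomic.h> form looks like. The struct and function names (mailbox, mailbox_publish, mailbox_try_read) are made up for illustration and do not come from DPDK. The flag is declared with an atomic type, so every access to it is atomic; in C11 even a plain read or write of such an object is a sequentially consistent atomic operation, and the _explicit variants are needed to request a weaker order.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox, for illustration only. */
struct mailbox {
	uint32_t payload;   /* plain data, protected by 'ready' */
	atomic_bool ready;  /* atomic type: all accesses are atomic */
};

/* Writer: store the payload, then publish it with release order. */
static inline void mailbox_publish(struct mailbox *m, uint32_t value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so the
 * payload is visible once 'ready' is observed as true. */
static inline bool mailbox_try_read(struct mailbox *m, uint32_t *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}
```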

> 
> >
> >
> > With this in mind, wouldn't it be better to extend <rte_atomic.h> with
> > functions that take a memory ordering parameter? And properly document
> > the memory ordering for the functions already in this API, and maybe
> > deprecate some functions in favor of others, more C11-like, functions?
> I would prefer to use what the language provides rather than creating DPDK's
> own, which will be just wrappers on top of what C provides. If we follow the
> existing model of rte_atomic APIs, we will be creating these for every size of
> the parameter (rte_atomic8/16/32/64_xxx). This results in more code to
> maintain.
> 
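A small sketch of the point about per-size wrappers. The GCC/Clang __atomic built-ins are type-generic over 1, 2, 4 and 8 byte integer objects, so the same call works for each field below without a separate function per width. The struct and function names (port_stats, stats_record_burst) are hypothetical and not taken from DPDK.

```c
#include <stdint.h>

/* Hypothetical statistics block with fields of four different widths. */
struct port_stats {
	uint8_t  flags;
	uint16_t bursts;
	uint32_t drops;
	uint64_t packets;
};

/* One built-in family covers every width; the operand type selects the
 * size, so no 8/16/32/64 specific wrappers are needed. */
static inline void stats_record_burst(struct port_stats *s,
				      uint16_t nb_rx, uint32_t nb_drop)
{
	__atomic_fetch_or(&s->flags, 0x1, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->bursts, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->drops, nb_drop, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->packets, nb_rx, __ATOMIC_RELAXED);
}
```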
> > If not, assuming <stdatomic.h> can't be used, wouldn't it be better if
> > we added a <rte_stdatomic.h>, which mimics the standard API, maybe
> > with some DPDK tweaks, plus potentially with DPDK-specific extensions
> > as well?
> What kind of extensions are you thinking about?
> 
> >
> >
> > Directly accessing intrinsics will lead to things like
> > __atomic_add_ifless() (already in DPDK code base), when people need to
> > extend the API. This looks very much like a GCC built-in function, but is not.
> I think the DPDK code should not be using symbols that will potentially collide
> with language/library symbols.
> Luckily, in this case, it is internal to a PMD which can be changed.
> It also contains more symbols that come close to colliding with those in
> 'stdatomic.h'.
> 
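As an illustration of the naming concern, here is a sketch of an "add if less" helper that stays out of the compiler's namespace. Identifiers starting with a double underscore are reserved for the implementation in C, so the sketch uses a hypothetical pmd_ prefix instead. The semantics are an assumption made for this example (add inc only while the current value is below limit); they are not taken from the existing PMD code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically add 'inc' to '*v' only if the current
 * value is below 'limit'. Returns true if the add was performed. */
static inline bool pmd_atomic32_add_if_less(uint32_t *v, uint32_t limit,
					    uint32_t inc)
{
	uint32_t cur = __atomic_load_n(v, __ATOMIC_RELAXED);

	do {
		if (cur >= limit)
			return false;
		/* On failure 'cur' is refreshed with the latest value and
		 * the limit check is repeated. */
	} while (!__atomic_compare_exchange_n(v, &cur, cur + inc,
					      true /* weak */,
					      __ATOMIC_ACQUIRE,
					      __ATOMIC_RELAXED));
	return true;
}
```

The acquire order on success is only one possible choice; the right order depends on what the counter protects.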
> >
> >
> > Sorry for hijacking the ARM roadmap thread.
> No problem. I am glad we are having these important discussions.
> 
> >


