[dpdk-dev] Arm roadmap for 20.05

Chen, Zhaoyan zhaoyan.chen at intel.com
Thu Apr 9 03:25:45 CEST 2020


> -----Original Message-----
> From: dev <dev-bounces at dpdk.org> On Behalf Of Honnappa Nagarahalli
> Sent: Tuesday, April 7, 2020 1:15 PM
> To: Mattias Rönnblom <mattias.ronnblom at ericsson.com>; dev at dpdk.org;
> thomas at monjalon.net; david.marchand at redhat.com; Morten Brørup
> <mb at smartsharesystems.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Richardson, Bruce
> <bruce.richardson at intel.com>; Van Haaren, Harry
> <harry.van.haaren at intel.com>; David Christensen
> <drc at linux.vnet.ibm.com>; Phil Yang <Phil.Yang at arm.com>
> Cc: Song Zhu <Song.Zhu at arm.com>; Gavin Hu <Gavin.Hu at arm.com>; Jeff
> Brownlee <Jeff.Brownlee at arm.com>; Philippe Robin
> <Philippe.Robin at arm.com>; Pravin Kantak <Pravin.Kantak at arm.com>; nd
> <nd at arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> Subject: Re: [dpdk-dev] Arm roadmap for 20.05
> 
> <snip>
> 
> > Subject: RE: [dpdk-dev] Arm roadmap for 20.05
> >
> > <snip>
> > (apologies Morten - I missed your response, consolidating the
> > discussion in this thread)
> >
> > + Intel x86 and IBM POWER maintainers
> >
> > >
> > > > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05
> > > > >>>>>
> > > > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote:
> > > > >>>>>> Hello,
> > > > >>>>>> Following are the work items planned for 20.05:
> > > > >>>>>>
> > > > >>>>>> 1) Use C11 atomic APIs in timer library
> > > > >>>>>> 2) Use C11 atomic APIs in service cores
> > > > >>>>>> 3) Use C11 atomics in VirtIO split ring
> > > > >>>>>> 4) Performance optimizations in i40e and MLX drivers for
> > > > >>>>>> Arm platforms
> > > > >>>>>> 5) RCU defer API
> > > > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test
> > > > >>>>>> cases
> > > > >>>>>>
> > > > >>>>>> Thank you,
> > > > >>>>>> Honnappa
> > > > >>>>> Maybe you should have a look at legacy DPDK atomics as well?
> > > > >>>>> Avoiding a full barrier for the add operation, for example.
> > > > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not
> > > > >>>> take memory order as a parameter. So, it is difficult to change the
> > > > >>>> implementation for those APIs. For example, the add operation could
> > > > >>>> take a RELEASE or RELAXED order depending on the use case.
> > > > >>>> So, the proposal is to deprecate the rte_atomic APIs and use
> > > > >>>> C11 APIs directly. The proposal is here:
> > > > >>>> https://patches.dpdk.org/cover/66745/
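To illustrate why the required order for "add" depends on the use case, here is a minimal sketch using the GCC/clang `__atomic` built-ins that this thread calls "C11 atomics". The variables and functions (`stats_pkts`, `published`, `payload`, `count_packet()`, `publish()`, `try_consume()`) are hypothetical and only show two contrasting uses of the same operation.

```c
#include <stdint.h>

/* Hypothetical shared state, for illustration only. */
static uint64_t stats_pkts; /* statistics counter: no ordering needed */
static uint32_t payload;    /* data handed from producer to consumer */
static uint32_t published;  /* count of published items, guards 'payload' */

/* Statistics: only atomicity matters, so RELAXED is enough. */
static inline void count_packet(void)
{
	__atomic_fetch_add(&stats_pkts, 1, __ATOMIC_RELAXED);
}

/* Producer: the write to 'payload' must become visible before the
 * incremented count does, so this add needs RELEASE. */
static inline void publish(uint32_t value)
{
	payload = value;
	__atomic_fetch_add(&published, 1, __ATOMIC_RELEASE);
}

/* Consumer: the ACQUIRE load pairs with the RELEASE add above. */
static inline int try_consume(uint32_t *out)
{
	if (__atomic_load_n(&published, __ATOMIC_ACQUIRE) == 0)
		return 0;
	*out = payload;
	return 1;
}
```

A single rte_atomic*_add() without an order parameter has to pick one behaviour for both callers.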
> > > > >>> Even though rte_atomic lacks the flexibility of C11 atomics,
> > > > >>> there might still be areas for improvement. Such improvements
> > > > >>> will have an instant effect, as opposed to waiting for all the
> > > > >>> rte_atomic users to change.
> > > > >>>
> > > > >>>
> > > > >>> The rte_atomic API leaves ordering unspecified, unfortunately.
> > > > >>> In the Linux kernel, from which DPDK seems to borrow much of
> > > > >>> the atomics and memory order related semantics, an atomic add
> > > > >>> doesn't imply any memory barriers. The current
> > > > >>> __sync_fetch_and_add()-based implementation implies a full
> > > > >>> barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If you used
> > > > >>> C11 atomics to implement rte_atomic on ARM, you could use a
> > > > >>> relaxed memory order for rte_atomic*_add() (assuming you agree
> > > > >>> those are the implicit semantics of the legacy API) and just get
> > > > >>> an ldadd instruction. An alternative would be to implement the
> > > > >>> same thing in assembler, of course.
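A minimal sketch of the suggestion, assuming a counter type laid out like `rte_atomic32_t` (a struct wrapping a `volatile int32_t cnt`). `my_atomic32_t` and both functions are hypothetical names, not DPDK's implementation; the instructions actually emitted depend on the compiler version and target flags (for example, whether LSE atomics are enabled).

```c
#include <stdint.h>

/* Hypothetical stand-in shaped like rte_atomic32_t. */
typedef struct {
	volatile int32_t cnt;
} my_atomic32_t;

/* Legacy style: the __sync built-ins are documented as full barriers. */
static inline void my_atomic32_add_sync(my_atomic32_t *v, int32_t inc)
{
	__sync_fetch_and_add(&v->cnt, inc);
}

/* Suggested alternative: atomicity only, no ordering guarantees. */
static inline void my_atomic32_add_relaxed(my_atomic32_t *v, int32_t inc)
{
	__atomic_fetch_add(&v->cnt, inc, __ATOMIC_RELAXED);
}
```

Both variants produce the same final count; they differ only in the ordering they impose on surrounding memory accesses.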
> > > > >>>
> > > > >>>
> > > > >> Another approach might be to just scrap all of the intrinsics
> > > > >> and inline assembler used for all the functions in rte_atomic,
> > > > >> on all architectures, and use C11 atomics instead.
> > > > > Yes, this is the approach we are taking. But it does not address
> > > > > the use of rte_atomic APIs in applications.
> > > >
> > > > Agreed.
> > > >
> > > >
> > > > Another question. "C11 atomics" here seems to mean using GCC
> > > > intrinsics, normally used to implement C11 atomics, not C11
> > > > atomics (i.e. <stdatomic.h>).
> > > > What is the reason for directly calling the intrinsics, rather than
> > > > using the standard API?
> > > I did not know they existed for C. Looking at them, they look like
> > > just wrappers around the intrinsics. The advantage seems to be the
> > > type check enforced by the compiler, i.e. if a variable is declared
> > > with type '_Atomic', the compiler should not allow any non-atomic
> > > operations on it. Anything else?
> > > I will explore this further.
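For comparison, here is the same relaxed increment written with <stdatomic.h> and with the built-ins; the counters and function names are hypothetical. Note that on an `_Atomic` object a plain `++` is not rejected: the C11 standard defines it as an atomic, sequentially consistent read-modify-write, which is stronger than a relaxed increment.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Standard C11 API: the object itself has an _Atomic type. */
static _Atomic uint32_t std_counter;

static inline void std_inc(void)
{
	atomic_fetch_add_explicit(&std_counter, 1, memory_order_relaxed);
}

/* GCC/clang built-ins: operate on a plain (non-_Atomic) object. */
static uint32_t builtin_counter;

static inline void builtin_inc(void)
{
	__atomic_fetch_add(&builtin_counter, 1, __ATOMIC_RELAXED);
}

/* Allowed on an _Atomic object: an atomic seq_cst read-modify-write,
 * not a data race. */
static inline void std_inc_seq_cst(void)
{
	std_counter++;
}
```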
> > I see some issues expressed for Intel ICC compiler [1], but they seem
> > to have been fixed in the latest versions [2]. Please check.
> >
> > [1] https://software.intel.com/en-us/forums/intel-c-compiler/topic/681815
> > [2] https://software.intel.com/en-us/articles/c11-support-in-intel-c-compiler
> >
> I looked into this some more. The built-ins are supported in GCC from 4.7
> and in clang from 3.1. The stdatomic.h is supported in GCC from 4.9 and in
> clang from 3.6.
> 
> I see that Intel Compilation CI has 3 configurations that use GCC 4.8.5 and
> Clang 3.4.2. Any reasoning for using these? Can these be upgraded?
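The version thresholds above could be turned into a compile-time check along these lines. This is a hypothetical sketch: `HAVE_STDATOMIC_H` and `have_stdatomic_h()` are illustrative names, the thresholds are only the ones quoted in this message (GCC 4.9, clang 3.6), and Apple's clang uses its own version numbering, so they would not apply there as-is.

```c
/* clang also defines __GNUC__, so test for clang first. */
#if defined(__clang__)
#  if __clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 6)
#    define HAVE_STDATOMIC_H 1
#  endif
#elif defined(__GNUC__)
#  if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9)
#    define HAVE_STDATOMIC_H 1
#  endif
#endif

#ifndef HAVE_STDATOMIC_H
#  define HAVE_STDATOMIC_H 0
#endif

#if HAVE_STDATOMIC_H
#include <stdatomic.h>
#endif

/* Reports at run time which path was selected at compile time. */
static inline int have_stdatomic_h(void)
{
	return HAVE_STDATOMIC_H;
}
```

With GCC 4.8.5 or clang 3.4.2, as in the CI configurations mentioned, this would select the built-in path.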

[Chen, Zhaoyan] It's associated with the CentOS 7.7/RHEL 7.7 distros, which are
still within their support lifecycle. CentOS 7.7/RHEL 7.7 was released as recently as 19.08.

> 
> > >
> > > >
> > > >
> > > > With this in mind, wouldn't it be better to extend <rte_atomic.h>
> > > > with functions that take a memory ordering parameter, properly
> > > > document the memory ordering for the functions already in this
> > > > API, and maybe deprecate some functions in favor of other, more
> > > > C11-like, functions?
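A sketch of what the first option might look like: an <rte_atomic.h>-style function that takes the memory order explicitly, plus an existing-style function whose ordering is now documented. `example_atomic32_t` and both functions are hypothetical names, and choosing sequential consistency for the legacy-style call is an assumption made only for illustration.

```c
#include <stdint.h>

/* Hypothetical counter type, shaped like rte_atomic32_t. */
typedef struct {
	volatile int32_t cnt;
} example_atomic32_t;

/**
 * Atomically add 'inc' to the counter and return the previous value.
 *
 * @param memorder
 *   Any __ATOMIC_* order. It should be a compile-time constant after
 *   inlining; GCC treats a run-time value as __ATOMIC_SEQ_CST.
 */
static inline int32_t
example_atomic32_fetch_add_explicit(example_atomic32_t *v, int32_t inc,
				    int memorder)
{
	return __atomic_fetch_add(&v->cnt, inc, memorder);
}

/* Existing-style API with its ordering documented:
 * sequentially consistent (hypothetical choice). */
static inline void
example_atomic32_add(example_atomic32_t *v, int32_t inc)
{
	example_atomic32_fetch_add_explicit(v, inc, __ATOMIC_SEQ_CST);
}
```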
> > > I would prefer to use what the language provides rather than
> > > creating DPDK's own, which will be just wrappers on top of what C
> > > provides. If we follow the existing model of rte_atomic APIs, we
> > > will be creating these for every size of the parameter
> > > (rte_atomic8/16/32/64_xxx). This results in more code to maintain.
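To illustrate the maintenance argument: a width-specific API needs one function per operand size, whereas one generic built-in spelling covers every width, with the compiler selecting the operation size from the pointer type. `bump_all()` is a hypothetical helper.

```c
#include <stdint.h>

/* One generic built-in handles 8-, 16-, 32- and 64-bit objects. */
static inline void
bump_all(uint8_t *a, uint16_t *b, uint32_t *c, uint64_t *d)
{
	__atomic_fetch_add(a, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(b, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(c, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(d, 1, __ATOMIC_RELAXED);
}
```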
> > >
> > > > If not, assuming <stdatomic.h> can't be used, wouldn't it be
> > > > better if we added a <rte_stdatomic.h>, which mimics the standard
> > > > API, maybe with some DPDK tweaks, plus potentially with
> > > > DPDK-specific extensions as well?
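A minimal sketch of the <rte_stdatomic.h> idea, assuming such a header maps standard-like names onto the GCC/clang built-ins when <stdatomic.h> cannot be used. All `example_` names are hypothetical; they only mirror the shape of `atomic_load_explicit()`, `atomic_store_explicit()` and `atomic_fetch_add_explicit()`.

```c
#include <stdint.h>

/* Standard-like memory order names mapped onto the built-in constants. */
#define example_memory_order_relaxed __ATOMIC_RELAXED
#define example_memory_order_acquire __ATOMIC_ACQUIRE
#define example_memory_order_release __ATOMIC_RELEASE
#define example_memory_order_seq_cst __ATOMIC_SEQ_CST

/* Operations mirroring the <stdatomic.h> *_explicit functions. */
#define example_atomic_load_explicit(ptr, mo) \
	__atomic_load_n((ptr), (mo))
#define example_atomic_store_explicit(ptr, val, mo) \
	__atomic_store_n((ptr), (val), (mo))
#define example_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add((ptr), (val), (mo))
```

Because callers would only see the standard-like names, the mapping could later point at <stdatomic.h> without changing call sites, at the cost of the objects possibly needing `_Atomic` types on compilers that enforce it.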
> > > What kind of extensions are you thinking about?
> > >
> > > >
> > > >
> > > > Directly accessing intrinsics will lead to things like
> > > > __atomic_add_ifless() (already in DPDK code base), when people
> > > > need to extend the API. This very much looks like a GCC built-in
> > > > function, but is not.
> > > I think the DPDK code should not be using symbols that will
> > > potentially collide with language/library symbols.
> > > Luckily, in this case, it is internal to a PMD, so it can be changed.
> > > It also contains more symbols which come close to colliding
> > > with 'stdatomic.h'.
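In C, identifiers beginning with two underscores are reserved for the implementation, which is why names like `__atomic_add_ifless()` are risky. Below is a sketch of a conditional-add helper placed under a project prefix instead. `pmd_atomic_add_if_below()` and its semantics (add only if the result stays below a limit) are hypothetical and are not claimed to match the existing PMD function; relaxed ordering is likewise an illustrative choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically add 'inc' to *p only if the new value
 * would be less than 'limit'. Returns true if the add was performed. */
static inline bool
pmd_atomic_add_if_below(uint32_t *p, uint32_t inc, uint32_t limit)
{
	uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);

	do {
		/* Check old + inc < limit without overflowing. */
		if (old >= limit || inc >= limit - old)
			return false;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!__atomic_compare_exchange_n(p, &old, old + inc,
					      true /* weak */,
					      __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
	return true;
}
```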
> > >
> > > >
> > > >
> > > > Sorry for hijacking the ARM roadmap thread.
> > > No problem. I am glad we are having these important discussions.
> > >
> > > >
> >


