[dpdk-dev] Arm roadmap for 20.05

Mattias Rönnblom mattias.ronnblom at ericsson.com
Tue Apr 7 21:10:38 CEST 2020



On 2020-03-24 19:53, Honnappa Nagarahalli wrote:
> <snip>
> 
>>>>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05
>>>>>>>
>>>>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote:
>>>>>>>> Hello,
>>>>>>>> 	Following are the work items planned for 20.05:
>>>>>>>>
>>>>>>>> 1) Use C11 atomic APIs in timer library
>>>>>>>> 2) Use C11 atomic APIs in service cores
>>>>>>>> 3) Use C11 atomics in VirtIO split ring
>>>>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm
>>>>>>>> platforms
>>>>>>>> 5) RCU defer API
>>>>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Honnappa
>>>>>>> Maybe you should have a look at legacy DPDK atomics as well?
>>>>>>> Avoiding a full barrier for the add operation, for example.
>>>>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not
>>>>>> take memory order as a parameter. So, it is difficult to change the
>>>>>> implementation for those APIs. For ex: the add operation could take a
>>>>>> RELEASE or RELAXED order depending on the use case.
>>>>>> So, the proposal is to deprecate the rte_atomic APIs and use C11
>>>>>> APIs directly. The proposal is here:
>>>>>> https://patches.dpdk.org/cover/66745/
>>>>> Even though rte_atomic lacks the flexibility of C11 atomics, there
>>>>> might still be areas of improvement. Such improvements will have an
>>>>> instant effect, as opposed to waiting for all the rte_atomic users to change.
>>>>>
>>>>>
>>>>> The rte_atomic API leaves ordering unspecified, unfortunately. In
>>>>> the Linux kernel, from which DPDK seems to borrow much of the
>>>>> atomics and memory order related semantics, an atomic add doesn't
>>>>> imply any memory barriers. The current __sync_fetch_and_add()-based
>>>>> implementation implies a full barrier (ldadd+dmb) or release
>>>>> (ldaddal, on v8.1-a). If you used C11 atomics to implement
>>>>> rte_atomic on ARM, you could use a relaxed memory order on
>>>>> rte_atomic*_add() (assuming you agree those are the implicit
>>>>> semantics of the legacy API) and just get an ldadd instruction. An
>>>>> alternative would be to implement the same thing in assembler, of course.
>>>>>
>>>>>
>>>> Another approach might be to just scrap all of the intrinsics and
>>>> inline assembler used for all the functions in rte_atomic, on all
>>>> architectures, and use C11 atomics instead.
>>> Yes, this is the approach we are taking. But, it does not solve the use of
>>> rte_atomic APIs in the applications.
>>
>> Agreed.
>>
>>
>> Another question. "C11 atomics" here seems to mean using GCC intrinsics,
>> normally used to implement C11 atomics, not C11 atomics (i.e. <stdatomic.h>).
>> What is the reason for directly calling the intrinsics, rather than using the
>> standard API?
> I did not know they existed for C. Looking at them, they look like just wrappers around the intrinsics. The advantage seems to be the type check enforced by the compiler, i.e. if a variable is defined with type '_Atomic', the compiler should not allow any non-atomic operations on it. Anything else?
> I will explore this further.
> 

That's the only difference I know of. My initial impression was that the 
type checking was more of a bug than a feature in this case, but I might 
well be wrong. I have very little practical experience with the 
<stdatomic.h> constructs.
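
To make the difference concrete, here is a minimal sketch of the same
relaxed add written both ways. The variable and function names are made
up for illustration and are not taken from DPDK.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Plain integer: the GCC/Clang built-in accepts it as-is. */
static uint32_t plain_counter;

/* _Atomic-qualified integer, which is what <stdatomic.h> expects. */
static _Atomic uint32_t std_counter;

/* Built-in form: works on an ordinary object, order chosen per call. */
static inline uint32_t
builtin_add_relaxed(uint32_t inc)
{
	return __atomic_fetch_add(&plain_counter, inc, __ATOMIC_RELAXED);
}

/* Standard form: the operand must have an atomic type. */
static inline uint32_t
std_add_relaxed(uint32_t inc)
{
	return atomic_fetch_add_explicit(&std_counter, inc,
					 memory_order_relaxed);
}
```

Both functions return the value held before the add. With the standard
form, the atomicity is part of the variable's type, so existing fields
declared as plain integers would need their declarations changed before
the standard API could be applied to them.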

>>
>>
>> With this in mind, wouldn't it be better to extend <rte_atomic.h> with functions
>> that take a memory ordering parameter? And properly document the memory
>> ordering for the functions already in this API, and maybe deprecate some
>> functions in favor of other, more C11-like, functions?
> I would prefer to use what the language provides rather than creating DPDK's own, which will be just wrappers on top of what C provides. If we follow the existing model of rte_atomic APIs, we will be creating these for every size of the parameter (rte_atomic8/16/32/64_xxx). This results in more code to maintain.
> 
>> If not, assuming <stdatomic.h> can't be used, wouldn't it be better if we added
>> a <rte_stdatomic.h>, which mimics the standard API, maybe with some DPDK
>> tweaks, plus potentially with DPDK-specific extensions as well?
> What kind of extensions are you thinking about?
> 

It's difficult to make predictions, especially about the future.

__atomic_add_ifless() is an example already in the code base.
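
As an illustration of the kind of extension meant here, below is a rough
sketch of an "add if less" helper built as a compare-and-swap loop on
top of the GCC built-ins. The name example_add_if_less, its signature,
its return value and the relaxed ordering are all assumptions made for
this sketch; it is not the code from the PMD.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: add 'inc' to '*v' only while '*v' is below 'limit'.
 * Returns the value observed before the attempt.
 */
static inline int32_t
example_add_if_less(int32_t *v, int32_t inc, int32_t limit)
{
	int32_t old = __atomic_load_n(v, __ATOMIC_RELAXED);

	while (old < limit) {
		/* On failure, 'old' is refreshed with the current value. */
		if (__atomic_compare_exchange_n(v, &old, old + inc,
						true /* weak */,
						__ATOMIC_RELAXED,
						__ATOMIC_RELAXED))
			break;
	}
	return old;
}
```

A caller can compare the returned value against the limit to tell
whether the add took place.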

From what I understand, there was something missing in C11 for an 
efficient RCU implementation in the Linux kernel. "Consume load", if I 
recall correctly.

A middle ground could be to make <rte_atomic.h> obsolete, but then 
introduce a <rte_atomic11.h>, which would be a thin wrapper, in the same 
manner as <stdatomic.h>.
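
A rough sketch of what such a thin wrapper could look like follows. Every
name in it (RTE_MO_*, rte_atomic11_*) is hypothetical and exists only
for this example; nothing here is an existing DPDK API.

```c
/* rte_atomic11.h: hypothetical header, named after the suggestion above. */
#include <stdint.h>

/* Memory orders, mapped one-to-one onto the compiler's constants. */
#define RTE_MO_RELAXED __ATOMIC_RELAXED
#define RTE_MO_ACQUIRE __ATOMIC_ACQUIRE
#define RTE_MO_RELEASE __ATOMIC_RELEASE
#define RTE_MO_SEQ_CST __ATOMIC_SEQ_CST

/* Type-generic wrappers: one macro covers 8/16/32/64-bit operands. */
#define rte_atomic11_load(ptr, mo) \
	__atomic_load_n((ptr), (mo))
#define rte_atomic11_store(ptr, val, mo) \
	__atomic_store_n((ptr), (val), (mo))
#define rte_atomic11_fetch_add(ptr, val, mo) \
	__atomic_fetch_add((ptr), (val), (mo))
```

Since the built-ins are type-generic, a wrapper of this shape would not
need one function per operand size, and DPDK-specific additions would
get an rte_ prefix rather than a name that looks like a compiler
built-in.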

>>
>>
>> Directly accessing intrinsics will lead to things like
>> __atomic_add_ifless() (already in DPDK code base), when people need to
>> extend the API. This very much looks like a GCC built-in function, but is not.
> I think the DPDK code should not be using symbols that will potentially collide with language/library symbols.
> Luckily, in this case, it is internal to a PMD which can be changed.
> It also contains more symbols that come close to colliding with 'stdatomic.h'.
> 
>>
>>
>> Sorry for hijacking the ARM roadmap thread.
> No problem. I am glad we are having these important discussions.
> 
>>
> 

