[dpdk-dev] Arm roadmap for 20.05

Mattias Rönnblom mattias.ronnblom at ericsson.com
Mon Mar 23 18:34:32 CET 2020


On 2020-03-23 18:14, Honnappa Nagarahalli wrote:
> <snip>
>
>>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05
>>>>>
>>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote:
>>>>>> Hello,
>>>>>> 	Following are the work items planned for 20.05:
>>>>>>
>>>>>> 1) Use C11 atomic APIs in timer library
>>>>>> 2) Use C11 atomic APIs in service cores
>>>>>> 3) Use C11 atomics in VirtIO split ring
>>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm
>>>>>> platforms
>>>>>> 5) RCU defer API
>>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases
>>>>>>
>>>>>> Thank you,
>>>>>> Honnappa
>>>>> Maybe you should have a look at legacy DPDK atomics as well?
>>>>> Avoiding a full barrier for the add operation, for example.
>>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not take
>>>> memory order as a parameter. So, it is difficult to change the
>>>> implementation for those APIs. For ex: the add operation could take a
>>>> RELEASE or RELAXED order depending on the use case.
>>>> So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs
>>>> directly. The proposal is here:
>>>> https://patches.dpdk.org/cover/66745/
>>> Even though rte_atomic lacks the flexibility of C11 atomics, there
>>> might still be areas of improvement. Such improvements will have an
>>> instant effect, as opposed to waiting for all the rte_atomic users to change.
>>>
>>>
>>> The rte_atomic API leaves ordering unspecified, unfortunately. In the
>>> Linux kernel, from which DPDK seems to borrow much of the atomics and
>>> memory order related semantics, an atomic add doesn't imply any memory
>>> barriers. The current __sync_fetch_and_add()-based implementation
>>> implies a full barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If
>>> you would use C11 atomics to implement rte_atomic in ARM, you could
>>> use a relaxed memory order on rte_atomic*_add() (assuming you agree
>>> those are the implicit semantics of the legacy API) and just get an
>>> ldadd instruction. An alternative would be to implement the same thing
>>> in assembler, of course.
>>>
>>>
>> Another approach might be to just scrap all of the intrinsics and inline
>> assembler used for all the functions in rte_atomic, on all architectures, and
>> use C11 atomics instead.
> Yes, this is the approach we are taking. But, it does not solve the use of rte_atomic APIs in the applications.	

Agreed.


Another question. "C11 atomics" here seems to mean using the GCC
intrinsics that are normally used to implement C11 atomics, not C11
atomics proper (i.e. <stdatomic.h>). What is the reason for calling the
intrinsics directly, rather than using the standard API?
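
To make the distinction concrete, here is a minimal sketch of the same
relaxed add written both ways. The two function names are made up for
illustration and are not part of DPDK; only __atomic_fetch_add() and
atomic_fetch_add_explicit() are real.

```c
#include <stdatomic.h>
#include <stdint.h>

/* GCC/Clang built-in: works on a plain integer object. */
static inline uint32_t
counter_add_builtin(uint32_t *cnt, uint32_t inc)
{
	return __atomic_fetch_add(cnt, inc, __ATOMIC_RELAXED);
}

/* ISO C11 <stdatomic.h>: wants an _Atomic-qualified object. */
static inline uint32_t
counter_add_c11(_Atomic uint32_t *cnt, uint32_t inc)
{
	return atomic_fetch_add_explicit(cnt, inc, memory_order_relaxed);
}
```

One visible difference is that the standard API wants the object
declared _Atomic, while the built-ins accept plain integer types. That
may or may not be part of the answer.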


With this in mind, wouldn't it be better to extend <rte_atomic.h> with
functions that take a memory ordering parameter? And to properly document
the memory ordering of the functions already in this API, and maybe
deprecate some functions in favor of other, more C11-like functions?
If not, and assuming <stdatomic.h> can't be used, wouldn't it be better if
we added an <rte_stdatomic.h> that mimics the standard API, maybe with
some DPDK tweaks, and potentially with DPDK-specific extensions as well?
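
A rough sketch of what the first option could look like. Everything
prefixed sketch_ is hypothetical, including the stand-in type, and the
relaxed default assumes that is what the legacy add is taken to mean.

```c
#include <stdint.h>

/* Stand-in for rte_atomic32_t, so that the sketch compiles on its own. */
typedef struct {
	volatile int32_t cnt;
} sketch_atomic32_t;

/* Hypothetical: an add that lets the caller pick the memory order. */
static inline void
sketch_atomic32_add_explicit(sketch_atomic32_t *v, int32_t inc, int memorder)
{
	__atomic_fetch_add(&v->cnt, inc, memorder);
}

/*
 * Legacy-style add, expressed through the explicit variant.
 * Relaxed ordering here is an assumption about the intended semantics.
 */
static inline void
sketch_atomic32_add(sketch_atomic32_t *v, int32_t inc)
{
	sketch_atomic32_add_explicit(v, inc, __ATOMIC_RELAXED);
}
```

Existing callers would keep the short form, while new code that needs,
say, release semantics could pass __ATOMIC_RELEASE explicitly.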


Directly accessing intrinsics will lead to things like
__atomic_add_ifless() (already in the DPDK code base) when people need to
extend the API. This looks very much like a GCC built-in function, but is not.
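
For comparison, an extension of this kind could live under a name that
is clearly not the compiler's. The sketch below is only my guess at
"add if less" semantics based on the name, not the existing DPDK
function; the name, signature and relaxed ordering are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: add inc to *v only if *v is below limit.
 * Returns true if the add happened. The sketch_ prefix keeps it out of
 * the double-underscore namespace reserved for the implementation.
 */
static inline bool
sketch_atomic32_add_if_less(int32_t *v, int32_t limit, int32_t inc)
{
	int32_t old = __atomic_load_n(v, __ATOMIC_RELAXED);

	do {
		if (old >= limit)
			return false;
		/* On failure, old is refreshed with the current value. */
	} while (!__atomic_compare_exchange_n(v, &old, old + inc, true,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED));

	return true;
}
```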


Sorry for hijacking the ARM roadmap thread.



