[dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions

Wang, Yipeng1 yipeng1.wang at intel.com
Tue Oct 2 03:52:05 CEST 2018


>-----Original Message-----
>From: Ola Liljedahl [mailto:Ola.Liljedahl at arm.com]
>On 28/09/2018, 02:43, "Wang, Yipeng1" <yipeng1.wang at intel.com> wrote:
>
>    Some general comments on the various __atomic_store/load calls added:
>
>    1. Although it passes the compiler check, I just want to confirm whether we should use the GCC/clang builtins, or whether
>    there are higher-level APIs in DPDK for atomic operations?
>[Ola] Adding "higher level" APIs on top of the basic language/compiler support is not a good idea.
>There is an unbounded number of base types for the atomic operations; multiply that by all the different kinds of atomic operations (e.g.
>load, store, fetch_add, add, CAS, etc.) and the different memory orderings, and you get a very large API (of which likely only a small but
>irregular subset will ever be used). So it is a lot of work for little gain, and it is difficult to test every single item in the API.
>
>For a compiler that does not support the __atomic builtins, one could write an __atomic emulation layer. But I think the GCC __atomic
>builtins are already the ideal source-code abstraction.
[Wang, Yipeng] Thanks for the explanation. I think OVS does something like this, using macros to abstract the various atomic
functions across different compilers/architectures. But anyway,
since rte_ring uses the builtins as well and the compiler check passed, I am OK with the implementation.
As I mentioned in an earlier reply, rte_ring seems to have a separate C11 header for using them. Should we
do something similar here?
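
For what it's worth, a rough sketch of using the builtins directly (hypothetical variable names, not the patch itself), just to
illustrate what you mean by the builtins already being a generic abstraction:

    #include <stdint.h>

    static uint32_t cnt;
    static void *ptr;

    /* The builtins work on any integer/pointer type and take the memory
     * order as an explicit argument, so no per-type wrapper API is needed. */
    void example(void *new_ptr)
    {
        uint32_t c = __atomic_load_n(&cnt, __ATOMIC_RELAXED);

        __atomic_store_n(&cnt, c + 1, __ATOMIC_RELAXED);
        __atomic_add_fetch(&cnt, 1, __ATOMIC_RELEASE);
        __atomic_store_n(&ptr, new_ptr, __ATOMIC_RELEASE);
    }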

>
>
>    2. We believe the compiler will translate the atomic store/load to regular MOV instructions on
>    Total Store Order architectures (e.g. x86_64). But we ran the perf test on x86 and measured a relative slowdown on
>    lookup compared to master head. I am not sure if the performance drop comes from the atomic builtins.
>[Ola] Any performance difference is most likely not from the use of atomic builtins because, as you write, on x86 they should translate
>to normal loads and stores in most situations. But the code and data structures have changed, so there is some difference in e.g.
>memory accesses; couldn't this explain the performance difference?
[Wang, Yipeng] Yes, it might be.
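
For reference, a standalone example (not the hash code itself) of what we would expect the compiler to emit on x86-64: both functions
below should compile to a plain MOV, since the TSO hardware model already provides acquire/release ordering; only a seq_cst store would
need an mfence or xchg.

    #include <stdint.h>

    static uint64_t v;

    /* On x86-64 an acquire load compiles to a plain mov. */
    uint64_t load_acquire(void)
    {
        return __atomic_load_n(&v, __ATOMIC_ACQUIRE);
    }

    /* A release store is also a plain mov on x86-64. */
    void store_release(uint64_t x)
    {
        __atomic_store_n(&v, x, __ATOMIC_RELEASE);
    }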


>    [Wang, Yipeng] I did not quite understand why we need synchronization for the hash data update.
>    Since the pdata write is already atomic, the lookup will read either the stale data or the new data,
>    which should be fine without synchronization.
>    Is it to ensure the order of multiple reads in the lookup threads?
>[Ola] If pdata is used as a reference to access other shared data, you need to ensure that the access to pdata and the accesses to that
>other data are ordered appropriately (e.g. with acquire/release). I think reading a new pdata but stale associated data is a bad thing.
>
[Wang, Yipeng] Thanks for the explanation. I got it now!
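
Just to restate my understanding with a minimal sketch (hypothetical names, not the actual patch): the writer fills in the object before
publishing pdata with a release store, and the lookup side loads pdata with acquire before dereferencing it, so a reader that observes
the new pdata is guaranteed to also observe the associated data written before it.

    /* Illustration only -- not the rte_hash key store layout. */
    struct obj {
        int payload;
    };

    static void *pdata;  /* shared data pointer published to lookup threads */

    /* Writer: initialize the object, then publish it. The release store
     * orders the payload write before the pointer becomes visible. */
    void publish(struct obj *o)
    {
        o->payload = 42;
        __atomic_store_n(&pdata, o, __ATOMIC_RELEASE);
    }

    /* Lookup: the acquire load guarantees that if we see the new pointer,
     * we also see everything written before the matching release store. */
    int lookup(void)
    {
        struct obj *o = __atomic_load_n(&pdata, __ATOMIC_ACQUIRE);

        return (o != NULL) ? o->payload : -1;
    }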



