[dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions

Wang, Yipeng1 yipeng1.wang at intel.com
Tue Oct 2 00:41:56 CEST 2018


>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
>Sent: Sunday, September 30, 2018 3:21 PM
>To: Wang, Yipeng1 <yipeng1.wang at intel.com>; Richardson, Bruce <bruce.richardson at intel.com>; De Lara Guarch, Pablo
><pablo.de.lara.guarch at intel.com>
>Cc: dev at dpdk.org; Gavin Hu (Arm Technology China) <Gavin.Hu at arm.com>; Steve Capper <Steve.Capper at arm.com>; Ola Liljedahl
><Ola.Liljedahl at arm.com>; nd <nd at arm.com>
>Subject: RE: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
>
>>
>> Some general comments for  the various __atomic_store/load added,
>>
>> 1. Although it passes the compiler check, but I just want to confirm that if we
>> should use GCC/clang builtins, or if There are higher level APIs in DPDK to do
>> atomic operations?
>>
>I have used gcc builtins (just like rte_ring does)
[Wang, Yipeng] I checked rte_ring, it also has a specific header for C11, since it is a C11
standard, do we need something similar here? 
>
>> 2. We believe compiler will translate the atomic_store/load to regular MOV
>> instruction on Total Store Order architecture (e.g. X86_64). But we run the
>> perf test on x86 and here is the relative slowdown on lookup comparing to
>> master head. I am not sure if the performance drop comes from the atomic
>> buitins.
>>
>C11 atomics also block compiler reordering. Other than this, the retry loop is an addition to lookup.
>The patch also has the alignment corrected. I am not sure how is that affecting the perf numbers.
>
>> Keysize | single lookup | bulk lookup
>> 4 | 0.93 | 0.95
>> 8 | 0.95 | 0.96
>> 16 | 0.97 | 0.96
>> 32 | 0.97 | 1.00
>> 48 | 1.03 | 0.99
>> 64 | 1.04 | 0.98
>> 9 | 0.91 | 0.96
>> 13 | 0.97 | 0.98
>> 37 | 1.04 | 1.03
>> 40 | 1.02 | 0.98
>>
>I assume this is the data from the test cases in test_hash_perf.c file. I tried to reproduce this data, but my data is worse. Can you
>specify the actual test from test_hash_perf.c you are using (With locks/Pre-computed hash/With data/Elements in primary)?
>IMO, the differences you have provided are not high.

[Wang, Yipeng] I remember the performance data I used is the no-lock, without hash, with 8-byte data, in both primary and secondary.
I compared the master head  to the one with your first two commits. 
>
>> [Wang, Yipeng] I think even for current code, we need to check empty_slot.
>> Could you export this as a bug fix commit?
>>
>In the existing code, there is check 'if (!!key_idx & !rte_hash....)'. Are you referring to '!!key_idx'? I think this should be changed to
>'(key_idx != EMPTY_SLOT)'.
[Wang, Yipeng] Yeah, I guess I did not see that part. Then I guess it is no need to export as a bug fix for now since it is not a functional issue.
Your change is good.



More information about the dev mailing list