[dpdk-dev] [PATCH v2 0/7] hash: add extendable bucket and partial key hashing

Wang, Yipeng1 yipeng1.wang at intel.com
Sat Sep 29 02:46:15 CEST 2018


>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
>> Second, the patch set changes the current hashing algorithm to be "partial-
>> key hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
>> "MemC3: Compact and Concurrent MemCache with Dumber Caching and
>> Smarter Hashing".
>I read this paper (but not the papers in references). My understanding is that the existing algorithm already uses 'partial-key hashing'.
>This patch set is not adding the 'partial-key hashing' feature. Instead it is reducing the size of the signature ('tag' as referred in the
>paper) from 32 to 16b.
>Please let me know if I have not understood this correct.
[Wang, Yipeng] Currently two signature values are stored in hash table: sig_current and sig_alt.
They are used to derived the index of two alternative buckets.
Partial key hashing avoids storing two values, but stores only one. The alternative bucket
Index is derived by XOR this single signature with the current bucket index.
So, this commit not only reduces the signature size from 32-bit to 16-bit, it also
reduces the number of signatures stored from two to one.
As a result, we can use one 64-byte cache line for the bucket instead of two.

>> Instead of storing both 32-bit signature and alternative
>> signature in the bucket, we only store a small 16-bit signature and calculate
>> the alternative bucket index by XORing the signature with the current bucket
>> index.
>According to the referenced paper, the signature ('tag') reduces the number of accesses to the keys, thus improving the performance.
>But, if we reduce the size of the signature from 32b to 16b, it will result in higher probability of false matches on the signature. This in
>turn will increase the number of accesses to keys. Have you run any performance benchmarks and compared the numbers with the
>existing code? Is it possible to share the numbers?
>
[Wang, Yipeng] 
>From our test it is very unlikely that two different keys would map to the same bucket and also have the same 16-bit signature.
Even we reduce the signature size from 32-bit to 16-bit, it should be very very rare assuming a good hash function.
For performance numbers, it could vary depending on the test case. Since the speedup comes from the 2X memory efficiency,
If your hash table is large (e.g. over last level cache), it will give much higher speedup. For the existing unit test, I generally seen 10% speedup. 




More information about the dev mailing list