[dpdk-dev] [PATCH v2 0/7] hash: add extendable bucket and partial key hashing

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Thu Sep 27 06:23:10 CEST 2018



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang at intel.com>
> Sent: Friday, September 21, 2018 12:17 PM
> To: bruce.richardson at intel.com
> Cc: dev at dpdk.org; yipeng1.wang at intel.com; michel at digirati.com.br;
> Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Subject: [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
> 
> The first four commits of the patch set try to fix small issues of previous code.
> 
> The other commits make two major optimizations over the current rte_hash
> library.
> 
> First, it adds Extendable Bucket Table feature: a new structure that can
> accommodate keys that failed to get inserted into the main hash table due to
> the unlikely event of excessive hash collisions. The hash table buckets will get
> extended using a linked list to host these keys. This new design will guarantee
> insertion of 100% of the keys for a given hash table size with minimal
> overhead. A new flag value is added for the user to indicate whether the
> extendable bucket feature should be enabled. The linked list buckets are a
> similar concept to the extendable bucket hash table in packet framework.
> In detail, for insertion, the linked buckets store the keys that fail to fit
> in the primary and the secondary bucket when the cuckoo path cannot find an
> empty location within the maximum path length (small probability).
> For lookup, the key is checked first in the primary, then the secondary, then if
> the secondary is extended the linked list is traversed for a possible match.
> 
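The lookup order described above (primary, then secondary, then the linked list hanging off the secondary) can be sketched as follows. This is only an illustration under assumed names: `struct bkt`, `search_one`, `lookup`, and the flat `key_store` array are hypothetical and are not the rte_hash internals, and keys are reduced to 32-bit integers to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 8 /* entries per bucket; an assumption for this sketch */

/* Hypothetical bucket: short signatures, key-store indexes, chain pointer. */
struct bkt {
    uint16_t sig[ENTRIES];
    uint32_t key_idx[ENTRIES]; /* 0 marks an empty slot */
    struct bkt *next;          /* next extended bucket, or NULL */
};

/* Scan one bucket; the full key is compared only when the signature matches. */
static int32_t search_one(const struct bkt *b, uint16_t sig, uint32_t key,
                          const uint32_t *key_store)
{
    for (int i = 0; i < ENTRIES; i++) {
        if (b->key_idx[i] != 0 && b->sig[i] == sig &&
            key_store[b->key_idx[i]] == key)
            return (int32_t)b->key_idx[i];
    }
    return -1;
}

/* Primary first, then the secondary and any buckets linked after it. */
static int32_t lookup(const struct bkt *prim, const struct bkt *sec,
                      uint16_t sig, uint32_t key, const uint32_t *key_store)
{
    assert(prim != NULL && sec != NULL);
    int32_t ret = search_one(prim, sig, key, key_store);
    if (ret >= 0)
        return ret;
    for (const struct bkt *b = sec; b != NULL; b = b->next) {
        ret = search_one(b, sig, key, key_store);
        if (ret >= 0)
            return ret;
    }
    return -1; /* not found */
}
```

The chain is walked only after both regular buckets miss, so the common case costs the same as a table without extension.
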
> Second, the patch set changes the current hashing algorithm to be "partial-
> key hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching and
> Smarter Hashing".
I read this paper (but not the papers it references). My understanding is that the existing algorithm already uses 'partial-key hashing'. This patch set does not add 'partial-key hashing' as a feature; instead, it reduces the size of the signature (the 'tag' in the paper's terminology) from 32 bits to 16 bits.
Please let me know if I have not understood this correctly.

> Instead of storing both 32-bit signature and alternative
> signature in the bucket, we only store a small 16-bit signature and calculate
> the alternative bucket index by XORing the signature with the current bucket
> index.
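
To make the quoted XOR scheme concrete, here is a minimal sketch. The helper names (`short_sig`, `prim_bucket`, `alt_bucket`) are hypothetical, and taking the signature from the upper 16 bits of the hash and requiring a power-of-two bucket count are assumptions of this sketch, not details confirmed by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Short signature: upper 16 bits of the 32-bit hash (an assumption). */
static inline uint16_t short_sig(uint32_t hash)
{
    return (uint16_t)(hash >> 16);
}

/* Primary bucket index: low bits of the hash. */
static inline uint32_t prim_bucket(uint32_t hash, uint32_t num_buckets)
{
    /* num_buckets must be a non-zero power of two so masking works. */
    assert(num_buckets != 0 && (num_buckets & (num_buckets - 1)) == 0);
    return hash & (num_buckets - 1);
}

/* The other candidate bucket: XOR the current index with the signature. */
static inline uint32_t alt_bucket(uint32_t cur_idx, uint16_t sig,
                                  uint32_t num_buckets)
{
    assert(num_buckets != 0 && (num_buckets & (num_buckets - 1)) == 0);
    return (cur_idx ^ sig) & (num_buckets - 1);
}
```

Because XOR is its own inverse, applying `alt_bucket` to the alternative index with the same signature returns the original index, so an entry can be moved between its two buckets without storing a second signature or rehashing the key. In this sketch, a signature whose masked bits are all zero maps both candidates to the same bucket.
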
According to the referenced paper, the signature ('tag') reduces the number of accesses to the keys, thus improving the performance.
However, reducing the signature from 32 bits to 16 bits raises the probability of false matches on the signature, which in turn increases the number of key accesses. Have you run any performance benchmarks comparing this against the existing code? If so, could you share the numbers?
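
As a rough back-of-the-envelope estimate (not a measurement), the chance of at least one false signature match per lookup can be computed as below. The sketch assumes uniformly distributed signatures and a lookup that compares against 16 occupied slots (two full buckets of 8 entries each); both are assumptions, and `false_match_prob` is a hypothetical helper.

```c
#include <assert.h>

/* Probability that at least one of `slots` stored signatures equals the
 * lookup signature purely by chance, assuming uniform signatures. */
static double false_match_prob(unsigned sig_bits, unsigned slots)
{
    assert(sig_bits >= 1 && sig_bits < 64);
    /* Chance that a single stored signature does NOT collide. */
    double miss_one = 1.0 - 1.0 / (double)(1ULL << sig_bits);
    double miss_all = 1.0;
    for (unsigned i = 0; i < slots; i++)
        miss_all *= miss_one;
    return 1.0 - miss_all;
}
```

Under these assumptions, `false_match_prob(16, 16)` is about 2.4e-4 and `false_match_prob(32, 16)` is about 3.7e-9. The relative increase is large, but the absolute rate with 16 bits is still roughly one extra key comparison per four thousand lookups, which is why benchmark numbers would settle the question.
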

> This doubles the hash table memory efficiency since now one bucket only
> occupies one cache line instead of two in the original design.
Agreed, a reduced memory footprint should help improve performance.
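
The cache-line arithmetic behind the quoted claim can be checked with a small sketch. Both structs are hypothetical layouts written for illustration (8 entries per bucket, 4-byte key indexes, one flag byte per entry, a chain pointer, 64-byte cache lines); they are assumptions and not copied from rte_cuckoo_hash.h.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES 8     /* entries per bucket; an assumption */
#define CACHE_LINE 64 /* bytes; an assumption */

/* Hypothetical "before" layout: two 32-bit signatures per entry. */
struct old_bucket {
    struct { uint32_t current; uint32_t alt; } sigs[ENTRIES]; /* 64 bytes */
    uint32_t key_idx[ENTRIES];                                /* 32 bytes */
    uint8_t flag[ENTRIES];                                    /*  8 bytes */
    void *next;
};

/* Hypothetical "after" layout: one 16-bit signature per entry. */
struct new_bucket {
    uint16_t sig[ENTRIES];     /* 16 bytes */
    uint32_t key_idx[ENTRIES]; /* 32 bytes */
    uint8_t flag[ENTRIES];     /*  8 bytes */
    void *next;
};

/* Compile-time checks: the old layout spills into a second cache line,
 * the new one fits in a single line. */
static_assert(sizeof(struct old_bucket) > CACHE_LINE, "old fits in one line");
static_assert(sizeof(struct old_bucket) <= 2 * CACHE_LINE, "old exceeds two lines");
static_assert(sizeof(struct new_bucket) <= CACHE_LINE, "new exceeds one line");
```

In this sketch the signatures alone fill a whole cache line in the old layout, so dropping the alternative signature and halving the remaining one is what brings the bucket down to a single line.
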

> 
> V1->V2:
> 1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
> 2. hash: Reorder the rte_hash struct to align cache line better.
> 3. test: Minor changes in auto test to add key insertion failure check during
> iteration test.
> 4. test: Add new commit to fix read-write test non-consecutive core issue.
> 5. hash: Add a new commit to remove unnecessary code introduced by
> previous patches.
> 6. hash: Comment and coding style improvements in
> multiple places.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang at intel.com>
> 
> Yipeng Wang (7):
>   test/hash: fix bucket size in hash perf test
>   test/hash: more accurate hash perf test output
>   test/hash: fix rw test with non-consecutive cores
>   hash: fix unnecessary code
>   hash: add extendable bucket feature
>   test/hash: implement extendable bucket hash test
>   hash: use partial-key hashing
> 
>  lib/librte_hash/rte_cuckoo_hash.c | 516 +++++++++++++++++++++++++++-----------
>  lib/librte_hash/rte_cuckoo_hash.h |  13 +-
>  lib/librte_hash/rte_hash.h        |   8 +-
>  test/test/test_hash.c             | 151 ++++++++++-
>  test/test/test_hash_perf.c        | 126 +++++++---
>  test/test/test_hash_readwrite.c   |  78 +++---
>  6 files changed, 672 insertions(+), 220 deletions(-)
> 
> --
> 2.7.4


