[dpdk-dev] [PATCH v6 0/7] Cuckoo hash - part 3 of Cuckoo hash

Pablo de Lara pablo.de.lara.guarch at intel.com
Sat Jul 11 01:30:13 CEST 2015


This patchset is to replace the existing hash library with
a more efficient and functional approach, using the Cuckoo hash
method to deal with collisions. This method is based on using
two different hash functions to have two possible locations
in the hash table where an entry can be.
So, if a bucket is full, a new entry can push one of the items
in that bucket to its alternative location, making space for itself.

Advantages
~~~
- Offers the option to store more entries when the target bucket is full
  (unlike the previous implementation)
- Memory efficient: for storing those entries, it is not necessary to
  request new memory, as the entries will be stored in the same table
- Constant worst lookup time: in worst case scenario, it always takes
  the same time to look up an entry, as there are only two possible locations
  where an entry can be.
- Storing data: user can store data in the hash table, unlike the
  previous implementation, but he can still use the old API

This implementation typically offers over 90% utilization.
Notice that API has been extended, but old API remains.
Check documentation included to know more about this new implementation
(including how entries are distributed as table utilization increases).

Changes in v6:
- Replace datatype for functions from uintptr_t to void *

Changes in v5:
- Fix documentation
- Fix 32-bit compilation issues

Changes in v4:
- Unit tests enhancements are not part of this patchset anymore.
- rte_hash structure has been made internal in another patch,
  so it is not part of this patchset anymore.
- Add function to iterate through the hash table, as rte_hash
  structure has been made private.
- Added extra_flag parameter in rte_hash_parameter to be able
  to add new parameters in the future without breaking the ABI
- Remove proposed lookup_bulk_with_hash function, as it is
  not of much use with the existing hash functions
  (there are no vector hash functions).
- User can store 8-byte integer or pointer as data, instead
  of variable size data, as discussed in the mailing list.

Changes in v3:

- Now user can store variable size data, instead of 32 or 64-bit size data,
  using the new parameter "data_len" in rte_hash_parameters
- Add lookup_bulk_with_hash function in performance  unit tests
- Add new functions that handle data in performance unit tests
- Remove duplicates in performance unit tests
- Fix rte_hash_reset, which was not resetting the last entry

Changes in v2:

- Fixed issue where table could not store maximum number of entries
- Fixed issue where lookup burst could not be more than 32 (instead of 64)
- Remove unnecessary macros and add other useful ones
- Added missing library dependencies
- Used directly rte_hash_secondary instead of rte_hash_alt
- Renamed rte_hash.c to rte_cuckoo_hash.c to ease the view of the new library
- Renamed test_hash_perf.c temporarily to ease the view of the improved unit test
- Moved rte_hash, rte_bucket and rte_hash_key structures to rte_cuckoo_hash.c to
  make them private
- Corrected copyright dates
- Added an optimized function to compare keys that are multiple of 16 bytes
- Improved the way to use primary/secondary signatures. Now both are stored in
  the bucket, so there is no need to calculate them if required.
  Also, there is no need to use the MSB of a signature to differenciate between
  an empty entry and signature 0, since we are storing both signatures,
  which cannot be both 0.
- Removed rte_hash_rehash, as it was a very expensive operation.
  Therefore, the add function returns now -ENOSPC if key cannot be added
  because of a loop.
- Prefetched new slot for new key in add function to improve performance.
- Made doxygen comments more clear.
- Removed unnecessary rte_hash_del_key_data and rte_hash_del_key_with_data,
  as we can use the lookup functions if we want to get the data before deleting.
- Removed some unnecessary includes in rte_hash.h
- Removed some unnecessary variables in rte_cuckoo_hash.c
- Removed some unnecessary checks before creating a new hash table
- Added documentation (in release notes and programmers guide)
- Added new unit tests and replaced the performance one for hash tables

Series Acked-by: Bruce Richardson <bruce.richardson at intel.com>

Pablo de Lara (7):
  hash: replace existing hash library with cuckoo hash implementation
  hash: add new function rte_hash_reset
  hash: add new functionality to store data in hash table
  hash: add iterate function
  MAINTAINERS: claim responsability for hash library
  doc: announce ABI change of librte_hash
  doc: update hash documentation

 MAINTAINERS                          |    1 +
 app/test/test_hash.c                 |  189 +++---
 app/test/test_hash_perf.c            |  303 ++++++---
 doc/guides/prog_guide/hash_lib.rst   |  138 +++-
 doc/guides/prog_guide/index.rst      |    4 +
 doc/guides/rel_notes/abi.rst         |    1 +
 lib/librte_hash/Makefile             |    8 +-
 lib/librte_hash/rte_cuckoo_hash.c    | 1194 ++++++++++++++++++++++++++++++++++
 lib/librte_hash/rte_hash.c           |  499 --------------
 lib/librte_hash/rte_hash.h           |  198 +++++-
 lib/librte_hash/rte_hash_version.map |   15 +
 11 files changed, 1810 insertions(+), 740 deletions(-)
 create mode 100644 lib/librte_hash/rte_cuckoo_hash.c
 delete mode 100644 lib/librte_hash/rte_hash.c

-- 
2.4.3



More information about the dev mailing list