[dpdk-dev] [RFC] hash: introduce resizable hash list

Bing Zhao bingz at mellanox.com
Wed Aug 28 08:51:48 CEST 2019


In the current hash library, there are already two hash tables and
several hash calculation functions.

FBK hash table is very lightweight, but the key length is limited to
4 bytes and the size of the table is fixed during startup.

Cuckoo hash table is a quite great idea and nice implementation in
the library, it is efficient and flexible. After some study of the
code and information from internet, I find that the table extension
is some pain point (correct me if anything wrong). It means that we
need to allocate the tables in advance by defining a maximum size.
So there is a chance that we waste some unused memory. Right now
there is an extendable buckets table, but it seems that the number
is also fixed and the same as the primary table.
Take the flows information for example, we may support as many as
several millions of flows. In some case, only several hundreds of
flows will be created but in the other case, millions of flows are
needed. So this means that we need to create the hash table to
support millions of elements in advance during the startup time. If
one key of flow information will consume 64B and 1M flows, so it
will occupy more than one hundred MB memory (the table fields
included). What is worse if there is only 2M + several elements, it
needs to create a table of 4M (or more: depends on the hash
collision rate) and it is some wasting of the memory.

In order to handle this, an resizable hash list is introduced.
The table could start with a small number of the elements and be
allocated dynamically during the runtime. In the meanwhile, an
on-demand list splitting mechanism is used in order not to make a
significant performance degradation. Then there is no need to
re-hash and relocate all the existing elements in the table when the
table is extended.

The brief design is as below:
1. The table is consisted of LIST header array and the LIST entries.
   In each entry, a pre-calculated hash signature is stored and is
   used to decide which header should it be linked to, by using
   "AND" with the mask to select the LSBs of the signature.
2. The header array size could start with a very small number, and a
   specified max number of each list is used to check if a table
   extension is needed.
3. When the max number on any of list is reached, the header array
   size will be doubled. Then each entries linked to this list will
   be split into two lists with the new mask (one more bit 1 in
   the mask, e.g. 0xfff -> 0x1fff). And a global shift number and
   local shift number of each list is used for the further checking.
4. When some other list is being accessed, a comparison for the shift
   numbers is used to check if the splitting of this list is needed.
   If so, then there will be two conditions:
   a. If the local shift number is only 1 less than global or
      non-zero, then this list is the one that needs to be split.
   b. If more than 1, then it means that the table is extended more
      than once. And If the local shift is zero, a mechanism is used
      to find the last unsplit list.
   And then the list will be split into 2/4/8... lists depends on
   the gap. All the entries then will linked to the proper header.
So, each time when the hlist APIs are called, only one list will be
traversed but not all the lists. And since there is parameter to set
a max number of entries in a list. The traversal time is predictable
and these will not cause a significant performance degradation.

BR. Bing


Bing Zhao (1):
  rte_hash: introduce hash list into hash lib

 lib/librte_hash/Makefile             |   2 +
 lib/librte_hash/rte_hash_version.map |  15 +
 lib/librte_hash/rte_hlist.c          | 687 +++++++++++++++++++++++++++++++++++
 lib/librte_hash/rte_hlist.h          | 563 ++++++++++++++++++++++++++++
 4 files changed, 1267 insertions(+)
 create mode 100644 lib/librte_hash/rte_hlist.c
 create mode 100644 lib/librte_hash/rte_hlist.h

-- 
1.8.3.1



More information about the dev mailing list