[dpdk-dev] [RFC] hash: introduce resizable hash list
bingz at mellanox.com
Thu Sep 12 07:41:14 CEST 2019
> -----Original Message-----
> From: Wang, Yipeng1 <yipeng1.wang at intel.com>
> Sent: Friday, September 6, 2019 3:26 AM
> To: Bing Zhao <bingz at mellanox.com>; Gobriel, Sameh
> <sameh.gobriel at intel.com>; Richardson, Bruce
> <bruce.richardson at intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com>
> Cc: dev at dpdk.org
> Subject: RE: [RFC] hash: introduce resizable hash list
> >-----Original Message-----
> >From: Bing Zhao [mailto:bingz at mellanox.com]
> >Sent: Tuesday, August 27, 2019 11:52 PM
> >To: Wang, Yipeng1 <yipeng1.wang at intel.com>; Gobriel, Sameh
> ><sameh.gobriel at intel.com>; Richardson, Bruce
> ><bruce.richardson at intel.com>; De Lara Guarch, Pablo
> ><pablo.de.lara.guarch at intel.com>
> >Cc: dev at dpdk.org; bingz at mellanox.com
> >Subject: [RFC] hash: introduce resizable hash list
> >In the current hash library, there are already two hash tables and
> >several hash calculation functions.
> >FBK hash table is very lightweight, but the key length is limited to
> >4 bytes and the size of the table is fixed during startup.
> >Cuckoo hash table is a quite great idea and nice implementation in
> >library, it is efficient and flexible. After some study of the code and
> >information from internet, I find that the table extension is some pain
> >point (correct me if anything wrong). It means that we need to
> >the tables in advance by defining a maximum size.
> >So there is a chance that we waste some unused memory. Right now
> >is an extendable buckets table, but it seems that the number is also
> >fixed and the same as the primary table.
> >Take the flows information for example, we may support as many as
> >several millions of flows. In some case, only several hundreds of
> >will be created but in the other case, millions of flows are needed. So
> >this means that we need to create the hash table to support millions
> >elements in advance during the startup time. If one key of flow
> >information will consume 64B and 1M flows, so it will occupy more
> >one hundred MB memory (the table fields included). What is worse if
> >there is only 2M + several elements, it needs to create a table of 4M
> >(or more: depends on the hash collision rate) and it is some wasting
> >the memory.
> >In order to handle this, an resizable hash list is introduced.
> >The table could start with a small number of the elements and be
> >allocated dynamically during the runtime. In the meanwhile, an
> >on-demand list splitting mechanism is used in order not to make a
> >significant performance degradation. Then there is no need to re-
> >and relocate all the existing elements in the table when the table is
> >The brief design is as below:
> >1. The table is consisted of LIST header array and the LIST entries.
> > In each entry, a pre-calculated hash signature is stored and is
> > used to decide which header should it be linked to, by using
> > "AND" with the mask to select the LSBs of the signature.
> >2. The header array size could start with a very small number, and a
> > specified max number of each list is used to check if a table
> > extension is needed.
> >3. When the max number on any of list is reached, the header array
> > size will be doubled. Then each entries linked to this list will
> > be split into two lists with the new mask (one more bit 1 in
> > the mask, e.g. 0xfff -> 0x1fff). And a global shift number and
> > local shift number of each list is used for the further checking.
> >4. When some other list is being accessed, a comparison for the shift
> > numbers is used to check if the splitting of this list is needed.
> > If so, then there will be two conditions:
> > a. If the local shift number is only 1 less than global or
> > non-zero, then this list is the one that needs to be split.
> > b. If more than 1, then it means that the table is extended more
> > than once. And If the local shift is zero, a mechanism is used
> > to find the last unsplit list.
> > And then the list will be split into 2/4/8... lists depends on
> > the gap. All the entries then will linked to the proper header.
> >So, each time when the hlist APIs are called, only one list will be
> >traversed but not all the lists. And since there is parameter to set a
> >max number of entries in a list. The traversal time is predictable and
> >these will not cause a significant performance degradation.
> >BR. Bing
> >Bing Zhao (1):
> > rte_hash: introduce hash list into hash lib
> > lib/librte_hash/Makefile | 2 +
> > lib/librte_hash/rte_hash_version.map | 15 +
> > lib/librte_hash/rte_hlist.c | 687
> > lib/librte_hash/rte_hlist.h | 563
> > 4 files changed, 1267 insertions(+)
> > create mode 100644 lib/librte_hash/rte_hlist.c create mode 100644
> > lib/librte_hash/rte_hlist.h
> [Wang, Yipeng]
> Hi, Bing,
> Table resizing is very important and I think it is a critical feature to have
> in DPDK's hash library. Thanks a lot for putting effort on this and I hope
> we could work out a solution based on your patch.
Thanks. To my understanding, yes it is very important. Especially, there is
some case that the maximum capacity is quite large but elements could
be just a few during one run time life cycle.
> My major concern of the current implementation is as following:
> Although we still have the fbk hash table implementation, I think the
> goal here is to have the users to move to the main hash table
> implementation (the cuckoo algorithm), since it is the better one in
> most use cases. There have been many optimizations done on the
> current code base since the cuckoo hash was introduced, we do not
> want to reinvent the wheel such as the multi-threading support, non-
> TSO machine support, SIMD optimizations, etc. Also, comparing to
> linked list, the current bucketized table should provide better lookup
> performance. Last but not least, we probably don't want fragmented
> hash table APIs.
Yes, I see and I agree. I have some basic study of the cuckoo and it is
quite a nice implementation from the paper to the engineering. And a
lot of optimizations both on X86_64 and aarch64 were done. And unique
solution will be much easier and less problematic for the users and
> I would suggest to see if we could add resizing feature to the current
> hash table implementation first.
> For example, we can use the current bucketized design to do the
> resizing rather than the new linked list-based data structure.
> We do need some modifications such as having a local bucket_shift
> variable each bucket, but I think it is feasible. We could also turn off
> the cuckoo path/extendable linked list of the current implementation
> when resizing is needed, so that it would be simpler for the
> In such way, we keep the current API and also reuse many of the
> existing code.
That will be quite great if the resize with cuckoo hash could be realized.
I am not quite familiar with the current status of the researching. Is there
any paper or ideas these years for the cuckoo hash algorithm already?
> Also, I wonder if you have a specific use case that benefit from the
> gradually break-down approach? Since we could always stop the world
> and resize the table at once.
Do you mean the on-demand (when being accessed) split? Theoretically,
no for the total time, or even more because of the calculation and extra
checking. And this is only to smoothen the rate when there is already a
lot of elements in the table, e.g. 1M. Traversing 1M elements and relocate
them into the right position will be very time consuming. Especially the
memory is dynamically allocated but not continuous, the cache may be
also a problem with the linked list. I tested with a simple case and the
rate seemed to be smooth (no big jitter) - there is also some other actions
in the case and the hash action occupy about 15% of the CPU.
And this is only for the list case. I think when using cuckoo hash, maybe
there is no need of such trick 😊
Thanks a lot
More information about the dev