[dpdk-dev] [PATCH 0/3] add new Double Word Key hash table

Medvedkin, Vladimir vladimir.medvedkin at intel.com
Wed Apr 1 20:28:00 CEST 2020


Hi Thomas,

-----Original Message-----
From: Thomas Monjalon <thomas at monjalon.net> 
Sent: Tuesday, March 31, 2020 8:56 PM
To: Medvedkin, Vladimir <vladimir.medvedkin at intel.com>
Cc: Wang, Yipeng1 <yipeng1.wang at intel.com>; Stephen Hemminger <stephen at networkplumber.org>; dev at dpdk.org; Morten Brørup <mb at smartsharesystems.com>; dev at dpdk.org; Ananyev, Konstantin <konstantin.ananyev at intel.com>; Gobriel, Sameh <sameh.gobriel at intel.com>; Richardson, Bruce <bruce.richardson at intel.com>; Suanming Mou <suanmingm at mellanox.com>; Olivier Matz <olivier.matz at 6wind.com>; Xueming(Steven) Li <xuemingl at mellanox.com>; Andrew Rybchenko <arybchenko at solarflare.com>; Asaf Penso <asafp at mellanox.com>; Ori Kam <orika at mellanox.com>
Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table

26/03/2020 18:28, Medvedkin, Vladimir:
> Hi Yipeng, Stephen, all,
> 
> On 17/03/2020 19:52, Wang, Yipeng1 wrote:
> > From: Stephen Hemminger <stephen at networkplumber.org>
> >> On Mon, 16 Mar 2020 18:27:40 +0000
> >> "Medvedkin, Vladimir" <vladimir.medvedkin at intel.com> wrote:
> >>
> >>> Hi Morten,
> >>>
> >>>
> >>> On 16/03/2020 14:39, Morten Brørup wrote:
> >>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir 
> >>>>> Medvedkin
> >>>>> Sent: Monday, March 16, 2020 2:38 PM
> >>>>>
> >>>>> Currently DPDK has a special implementation of a hash table for
> >>>>> 4 byte keys which is called FBK hash. Unfortunately its main 
> >>>>> drawback is that it only supports 2 byte values.
> >>>>> The new implementation called DWK (double word key) hash 
> >>>>> supports 8 byte values, which is enough to store a pointer.
> >>>>>
> >>>>> It would also be nice to get feedback on whether to leave the 
> >>>>> old FBK and new DWK implementations, or whether to deprecate the 
> >>>>> old
> >> one?
> >>>> <rant on>
> >>>>
> >>>> Who comes up with these names?!?
> >>>>
> >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean
> >> the same. Could you use 32 somewhere in the name instead, like in 
> >> int32_t, instead of using a growing list of creative synonyms for the same thing?
> >> Pretty please, with a cherry on top!
> >>>
> >>> That's true, at first I named it as fbk2, but then it was decided 
> >>> to rename it "dwk", so that there was no confusion with the 
> >>> existing FBK library. Naming suggestions are welcome!
> >>>
> >>>> And if the value size is fixed too, perhaps the name should also 
> >>>> indicate
> >> the value size.
> >>>> <rant off>
> >>>>
> >>>> It's a shame we don't have C++ class templates available in DPDK...
> >>>>
> >>>> In other news, Mellanox has sent an RFC for an "indexed memory pool"
> >> library [1] to conserve memory by using uintXX_t instead of 
> >> pointers, so perhaps a variant of a 32 bit key hash library with 32 
> >> bit values (in addition to
> >> 16 bit values in FBK and 64 bit in DWK) would be nice combination 
> >> with that library.
> >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html

Yes some work is in progress to propose a new memory allocator for small objects of fixed size with small memory overhead.


> >> Why is this different (or better) than existing rte_hash.
> >> Having more flavors is not necessarily a good thing (except in 
> >> Gelato)
> >   [Wang, Yipeng]
> > Hi, Vladimir,
> > As Stephen mentioned, I think it is good idea to explain the benefit 
> > of this new type of hash table more explicitly such as Specific use cases, differences with current rte_hash, and performance numbers, etc.
> 
> The main reason for this new hash library is performance. As I 
> mentioned earlier, the current rte_fbk implementation is pretty fast 
> but it has a number of drawbacks such as 2 byte values and limited 
> collision resolving capabilities. On the other hand, rte_hash (cuckoo 
> hash) doesn't have this drawbacks but at the cost of lower performance 
> comparing to rte_fbk.
> 
> If I understand correctly, performance penalty are due to :
> 
> 1. Load two buckets
> 
> 2. First compare signatures
> 
> 3. If signature comparison hits get a key index and find memory 
> location with a key itself and get the key
> 
> 4. Using indirect call to memcmp() to compare two uint32_t.
> 
> The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks 
> while offers the same performance as rte_fbk.
> 
> Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 
> byte key size. Replacing it with a new implementation gives about 30% 
> in performance.
> 
> The main disadvantage comparing to rte_hash is some performance 
> degradation with high average table utilization due to chain resolving 
> for 5th and subsequent collision.

Thanks for explaining.
Please, such information should added in the documentation:
	doc/guides/prog_guide/hash_lib.rst


I'm going to submit v2 this week, will add documentation update.



More information about the dev mailing list