[dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature

Bruce Richardson bruce.richardson at intel.com
Thu Sep 27 14:27:56 CEST 2018


On Thu, Sep 27, 2018 at 12:27:21PM +0100, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, September 27, 2018 12:15 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> > Cc: Wang, Yipeng1 <yipeng1.wang at intel.com>; dev at dpdk.org; michel at digirati.com.br
> > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
> > 
> > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yipeng Wang <yipeng1.wang at intel.com>
> > > > Sent: Friday, September 21, 2018 12:18 PM
> > > > To: bruce.richardson at intel.com
> > > > Cc: dev at dpdk.org; yipeng1.wang at intel.com; michel at digirati.com.br;
> > > > Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> > > >
> > > > In use cases that hash table capacity needs to be guaranteed, the extendable
> > > > bucket feature can be used to contain extra keys in linked lists when conflict
> > > > happens. This is similar concept to the extendable bucket hash table in packet
> > > > framework.
> > > >
> > > > This commit adds the extendable bucket feature. User can turn it on or off
> > > > through the extra flag field during table creation time.
> > > >
> > > > Extendable bucket table composes of buckets that can be linked list to current
> > > > main table. When extendable bucket is enabled, the table utilization can
> > > > always acheive 100%.
> > > IMO, referring to this as 'table utilization' indicates an efficiency about memory utilization. Please consider changing this to indicate that
> > all of the configured number of entries will be accommodated?
> > >
> > > > Although keys ending up in the ext buckets may have longer look up time, they
> > > > should be rare due to the cuckoo algorithm.
> > > >
> > > > Signed-off-by: Yipeng Wang <yipeng1.wang at intel.com>
> > > > ---
> > > >  lib/librte_hash/rte_cuckoo_hash.c | 326
> > > > +++++++++++++++++++++++++++++++++-----
> > > >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> > > >  lib/librte_hash/rte_hash.h        |   3 +
> > > >  3 files changed, 292 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > > index f7b86c8..616900b 100644
> > > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > > @@ -31,6 +31,10 @@
> > > >  #include "rte_hash.h"
> > > >  #include "rte_cuckoo_hash.h"
> > > >
> > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> > > > \
> > > > +for (CURRENT_BKT = START_BUCKET;                                      \
> > > > +CURRENT_BKT != NULL;                                          \
> > > > +CURRENT_BKT = CURRENT_BKT->next)
> > > >
> > > >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > > >
> > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> > > >  return h;
> > > >  }
> > > >
> > > > +static inline struct rte_hash_bucket *
> > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> > > > +while (lst_bkt->next != NULL)
> > > > +lst_bkt = lst_bkt->next;
> > > > +return lst_bkt;
> > > > +}
> > > > +
> > > >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
> > > >  h->cmp_jump_table_idx = KEY_CUSTOM;
> > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  struct rte_tailq_entry *te = NULL;
> > > >  struct rte_hash_list *hash_list;
> > > >  struct rte_ring *r = NULL;
> > > > +struct rte_ring *r_ext = NULL;
> > > >  char hash_name[RTE_HASH_NAMESIZE];
> > > >  void *k = NULL;
> > > >  void *buckets = NULL;
> > > > +void *buckets_ext = NULL;
> > > >  char ring_name[RTE_RING_NAMESIZE];
> > > > +char ext_ring_name[RTE_RING_NAMESIZE];
> > > >  unsigned num_key_slots;
> > > >  unsigned i;
> > > >  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > > > +unsigned int ext_table_support = 0;
> > > >  unsigned int readwrite_concur_support = 0;
> > > >
> > > >  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  multi_writer_support = 1;
> > > >  }
> > > >
> > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > > > +ext_table_support = 1;
> > > > +
> > > >  /* Store all keys and leave the first entry as a dummy entry for
> > > > lookup_bulk */
> > > >  if (multi_writer_support)
> > > >  /*
> > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  goto err;
> > > >  }
> > > >
> > > > +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > > > +RTE_HASH_BUCKET_ENTRIES;
> > > > +
> > > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > > > +params-
> > > > >name);
> > > Can be inside the if statement below.
> > >
> > > > +/* Create ring for extendable buckets. */
> > > > +if (ext_table_support) {
> > > > +r_ext = rte_ring_create(ext_ring_name,
> > > > +rte_align32pow2(num_buckets + 1),
> > > > +params->socket_id, 0);
> > > > +
> > > > +if (r_ext == NULL) {
> > > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> > > > "
> > > > +"failed\n");
> > > > +goto err;
> > > > +}
> > > > +}
> > > > +
> > > >  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > > >
> > > >  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  goto err_unlock;
> > > >  }
> > > >
> > > > -const uint32_t num_buckets = rte_align32pow2(params->entries)
> > > > -/ RTE_HASH_BUCKET_ENTRIES;
> > > > -
> > > >  buckets = rte_zmalloc_socket(NULL,
> > > >  num_buckets * sizeof(struct rte_hash_bucket),
> > > >  RTE_CACHE_LINE_SIZE, params->socket_id);
> > > >
> > > >  if (buckets == NULL) {
> > > > -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> > > >  goto err_unlock;
> > > >  }
> > > >
> > > > +/* Allocate same number of extendable buckets */
> > > IMO, we are allocating too much memory to support this feature. Especially, when we claim that keys ending up in the extendable table is
> > a rare occurrence. By doubling the memory we are effectively saying that the main table might have 50% utilization. It will also significantly
> > increase the cycles required to iterate the complete hash table (in rte_hash_iterate API) even when we expect that the extendable table
> > contains very few entries.
> > >
> > > I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation
> > dynamic (or on demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than
> > allocating all the resources during initialization.
> > >
> > 
> > Given that adding new entries should not normally be a fast-path function,
> 
> Umm, I don't think I agree with that.
> There are plenty of cases where add/delete speed is important.
> Konstantin

True, I suppose.
Perhaps then the best approach is to give a couple of options to the
developer. Allow specifying an initial amount of memory, and then an option
to allow the memory to grow or not. If the cost of memory allocation is a
problem for a specific app, then they can provide a large default and not
allow growing, while other apps, for whom speed is not that critical, can
provide a small default and allow growing.

Does that seem reasonable?

/Bruce


More information about the dev mailing list