[dpdk-dev] [PATCH v3 7/7] doc: add membership documentation

De Lara Guarch, Pablo pablo.de.lara.guarch at intel.com
Mon Sep 25 14:30:00 CEST 2017



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yipeng Wang
> Sent: Wednesday, September 6, 2017 1:00 AM
> To: dev at dpdk.org
> Cc: Tai, Charlie <charlie.tai at intel.com>; Gobriel, Sameh
> <sameh.gobriel at intel.com>; Wang, Ren <ren.wang at intel.com>; Wang,
> Yipeng1 <yipeng1.wang at intel.com>; Mcnamara, John
> <john.mcnamara at intel.com>
> Subject: [dpdk-dev] [PATCH v3 7/7] doc: add membership documentation
> 
> This patch adds the documentation for membership library.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang at intel.com>
> Reviewed-by: John McNamara <john.mcnamara at intel.com>

...

> diff --git a/doc/guides/prog_guide/member_lib.rst
> b/doc/guides/prog_guide/member_lib.rst
> new file mode 100644
> index 0000000..2b32535
> --- /dev/null
> +++ b/doc/guides/prog_guide/member_lib.rst


> +Membership Library
> +==================
> +
> +Introduction
> +------------
> +
> +The DPDK Membership Library provides an API for DPDK applications to
> insert a
> +new member, delete an existing member, or query the existence of a
> member in a
> +given set, or a group of sets. For the case of a group of sets the library

"group of sets, the library".

...

> +We are including a configurable Membership Library in DPDK to cover set

This looks like more like a cover letter than part of programmer's guide.
Reword this loke "This Membership Library covers set membership...",
and avoid "include... in DPDK", as it is implicit.

> +membership functionality for both a single set and multi-set scenarios.
> The
> +library is optimized to support the customer network applications which
> require
> +membership checking functionality. In this guide, we will cover two set-
> summary
> +schemes including a vector of Bloom Filters and Hash-Table based
> +set-summary schemes with and without false negative probability,
> followed by
> +a brief discussion of the Membership Library API.
> +
> +Vector of Bloom Filters
> +-----------------------
> +
> +Bloom Filter (BF) [Member-bloom] is a well-known space-efficient
> +probabilistic data structure that answers set membership queries (test
> whether
> +an element is a member of a set) with some probability of false positives
> and
> +zero false negatives; a query for an element returns either it is "possibly in
> +a set" (with very high probability) or "definitely not in a set".
> +
> +The BF is a method for representing a set of ``n`` elements (for example
> flow keys
> +in network applications domain) to support membership queries. The idea
> of BF is
> +to allocate a bit-vector ``v`` with ``m`` bits, which are initially all set to 0.
> Then
> +it chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` with
> hash values range from
> +``1`` to ``m`` to perform hashing calculations on each element. 

From "0" to "m-1"?

...

> +
> +  Vector Bloom Filter (vBF) Overview
> +
> +To support membership test for both multiple sets and a single set,
> +the library implements a Vector Bloom Filter (vBF) scheme.
> +vBF basically composes multiple bloom filters into a vector of bloom filers.
> +The membership test is conducted on all of the
> +bloom filters concurrently to determine which set(s) it belongs to or none
> of
> +them. The basic idea of vBF is shown in the above figure where an element
> is
> +used to address multiple bloom filters concurrently and the bloom filter
> +index(es) with a hit is returned.
> +
> +.. _figure_membership5:
> +.. figure:: img/member_i5.*
> +
> +  vBF for Flow Scheduling to Worker Thread
> +
> +As previously mentioned, there are many usages of such structures. vBF is
> used
> +for applications that needs to check membership against multiple sets
> +simultaneously. 

"needs" -> "need".

...

> +The major usage of HTSS with false negative is to use it as a cache for
> +distributing elements to different target sets. By allowing HTSS to evict old
> +elements, the set-summary can keep track of the most recent elements
> +(i.e. active) as a cache typically does. Old inactive elements (infrequently
> +used elements) will automatically and eventually get evicted from the
> +set-summary. It worth noting that the set-summary still has false positive

"It is worth".

...

> +We also pass two seeds: ``prim_hash_seed`` and
> +``sec_hash_seed`` for the primary and secondary hash functions to
> calculate two independent hash values.
> +``socket_id`` parameter is the NUMA socket ID for the memory used to
> create the
> +set-summary. For HTSS, another parameter ``iscache`` is used to indicate

In order to keep consistency, I would use "is_cache".

> +if this set-summary is a cache (i.e. with false negative probability) or not.
> +For vBF, extra parameters are needed. For example, ``num_set`` is the
> number of
> +sets needed to initialize the vector bloom filters. This number is equal to
> the
> +number of bloom filters will be created.
> +``false_pos_rate`` is the false positive rate. num_keys and false_pos_rate
> will be used to determine
> +the number of hash functions and the bloom filter size.
> +
> +
> +Set-summary Element Insertion
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``rte_member_add()`` function is use to insert an element/key into a
> set-summary structure. If it fails an
> +error is returned. For success the returned value is dpendednt on the
> +set-summary mode to provide extra information for the users. For vBF
> +mode, a return value of 0 means a successful insert. For HTSS mode
> without false negative, the insert
> +could fail with ``-ENOSPC`` if the table is full. With false negative (i.e. cache
> mode),
> +for insert that does not cause any eviction (i.e. no overwriting happens to
> an
> +existing entry) the return value is 0. For insertion that causes eviction, the
> return
> +value is 1 to indicate such situation, but it is not an error.
> +
> +The input arguments for the function should include the ``key`` which is a
> pointer to the element/key that needs to
> +be added to the set-summary, and ``set_id`` which is the set id associated
> +with the key that needs to be added.
> +
> +
> +Set-summary Element Lookup
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove unneeded last "~" (I see 4 extra at the end).

> +
> +The ``rte_member_lookup()`` function looks up a single key/element in
> the set-summary structure. It
> +returns as soon as the first match is found. The return value is 1 if a
> +match is found and 0 otherwise. The arguments for the function include
> ``key`` which is a pointer to the
> +element/key that needs to be looked up, and ``set_id`` which is used to
> return the
> +first target set id where the key has matched, if any.
> +
> +The ``rte_member_lookup_bulk()`` function is used to look up a bulk of
> keys/elements in the
> +set-summary structure for their first match. Each key lookup returns as
> soon as the first match is found. The
> +return value is the number of keys that find a match. The arguments of the
> +function include ``keys`` which is a pointer to a bulk of keys that are to be
> looked up,
> +``num_keys`` is the number
> +of keys that will be looked up, and ``set_ids`` are the return target set
> +ids for the first match found for each of the input keys. ``set_ids`` is an
> array
> +needs to be sized according to the ``num_keys``. If there is no match, the
> set id
> +for that key will be set to RTE_MEMBER_NO_MATCH.
> +
> +The ``rte_member_lookup_multi()`` function looks up a single
> key/element in the
> +set-summary structure for multiple matches. It
> +returns ALL the matches (possibly more than one) found for this key when
> it
> +is matched against all target sets (it worth noting that for cache mode
> HTSS,

"it is worth"

> +the current implementation matches at most one target set). The return
> value is
> +the number of matches
> +that was found for this key (for cache mode HTSS the return value
> +should be at most 1). The arguments for the function include ``key`` which
> is a pointer to the
> +element/key that needs to be looked up, ``match_per_key`` which is to

Better to use "max_match_per_key". Will comment more in other patches.

> indicate maximum number of matches
> +the user expect to find for each key, and ``set_id`` which is used to return
> all

"user expects to".

> +target set ids where the key has matched, if any. The ``set_id`` array
> should be sized
> +according to ``match_per_key``. For vBF, maximum number of matches
> per key is equal

"For vBF, the maximum number".

> +to the number of sets. For HTSS, maximum number of matches per key is
> equal to two times
> +entry count per bucket. ``match_per_key`` should be equal or smaller than

"two time the entry count".

> maximum number of
> +possible matches.
> +
> +The ``rte_membership_lookup_multi_bulk()`` function looks up a bulk of
> keys/elements elements in the
> +set-summary structure for multiple matches, each key lookup returns ALL
> the matches (possibly more
> +than one) found for this key when it is matched against all target sets
> (cache mode HTSS
> +matches at most one target set). The
> +return value is the number of keys that find one or more matches in the
> +set-summary structure. The arguments of the
> +function include ``keys`` which is
> +a pointer to a bulk of keys that are to be looked up, ``num_keys`` is the
> number
> +of keys that will be looked up, ``match_per_key`` is the possible
> +max number of matches for each key, ``match_count`` which is the

Use "maximum number", no need to abbreviate here.

> returned number
> +of matches for each key, and ``set_ids`` are the returned target set
> +ids for all matches found for each keys. ``set_ids`` is 2-D array
> +that for each key, a 1-D array should be sized according to
> ``match_per_key``.

I would reword it a bit, to improve readability. For instance: 
"set_ids" is a 2-D array, containing a 1-D array per key,
which should be sized according to "match_per_key".

> +``match_per_key`` should be equal or smaller than maximum number of
> +possible matches, similar to ``rte_member_lookup_multi``.
> +
> +
> +Set-summary Element Delete
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``rte_membership_delete()`` function deletes an element/key from a
> set-summary structure, if it fails
> +an error is returned. The input arguments should include ``key`` which is a
> pointer to the
> +element/key that needs to be deleted from the set-summary, and
> ``set_id``
> +which is the set id associated with the key to delete. It worth noting that
> current

"It is worth".

> +implementation of vBF does not support deletion [1]_. An error code ``-
> EINVAL`` will be returned.



More information about the dev mailing list