[dpdk-dev] [PATCH v3 1/7] member: implement main API

De Lara Guarch, Pablo pablo.de.lara.guarch at intel.com
Mon Sep 25 16:15:03 CEST 2017



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yipeng Wang
> Sent: Wednesday, September 6, 2017 1:00 AM
> To: dev at dpdk.org
> Cc: Tai, Charlie <charlie.tai at intel.com>; Gobriel, Sameh
> <sameh.gobriel at intel.com>; Wang, Ren <ren.wang at intel.com>; Wang,
> Yipeng1 <yipeng1.wang at intel.com>; Mcnamara, John
> <john.mcnamara at intel.com>
> Subject: [dpdk-dev] [PATCH v3 1/7] member: implement main API
> 
> Membership library is an extension and generalization of a traditional filter
> (for example Bloom Filter) structure. In general, the Membership library is a
> data structure that provides a "set-summary" and responds to set-
> membership queries of whether a certain element belongs to a set(s). A
> membership test for an element will return the set this element belongs to
> or not-found if the element is never inserted into the set-summary.
> 
> The results of the membership test is not 100% accurate. Certain false

Is -> are.
> positive or false negative probability could exist. However, comparing to a
> "full-blown" complete list of elements, a "set-summary"
> is memory efficient and fast on lookup.
> 
> This patch add the main API definition.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang at intel.com>
> ---
>  lib/Makefile                             |   2 +
>  lib/librte_eal/common/eal_common_log.c   |   1 +
>  lib/librte_eal/common/include/rte_log.h  |   1 +
>  lib/librte_member/Makefile               |  49 +++
>  lib/librte_member/rte_member.c           | 342 ++++++++++++++++++++
>  lib/librte_member/rte_member.h           | 518
> +++++++++++++++++++++++++++++++
>  lib/librte_member/rte_member_version.map |  16 +
>  7 files changed, 929 insertions(+)
>  create mode 100644 lib/librte_member/Makefile  create mode 100644
> lib/librte_member/rte_member.c  create mode 100644
> lib/librte_member/rte_member.h  create mode 100644
> lib/librte_member/rte_member_version.map
> 

...

> diff --git a/lib/librte_member/rte_member.c
> b/lib/librte_member/rte_member.c new file mode 100644 index
> 0000000..71b066d
> --- /dev/null
> +++ b/lib/librte_member/rte_member.c

...

> +
> +#include <string.h>
> +
> +#include <rte_eal.h>
> +#include <rte_eal_memconfig.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_errno.h>
> +
> +#include "rte_member.h"
> +#include "rte_member_ht.h"
> +#include "rte_member_vbf.h"
> +
> +TAILQ_HEAD(rte_member_list, rte_tailq_entry); static struct
> +rte_tailq_elem rte_member_tailq = {
> +	.name = "RTE_MEMBER",
> +};
> +EAL_REGISTER_TAILQ(rte_member_tailq)
> +
> +
> +void *
> +rte_member_find_existing(const char *name) {
> +	struct rte_member_setsum *setsum = NULL;
> +	struct rte_tailq_entry *te;
> +	struct rte_member_list *member_list;
> +
> +	member_list = RTE_TAILQ_CAST(rte_member_tailq.head,
> rte_member_list);
> +
> +	rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
> +	TAILQ_FOREACH(te, member_list, next) {
> +		setsum = (struct rte_member_setsum *) te->data;
> +		if (strncmp(name, setsum->name,
> RTE_MEMBER_NAMESIZE) == 0)
> +			break;
> +	}
> +	rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	if (te == NULL) {
> +		rte_errno = ENOENT;
> +		return NULL;
> +	}
> +	return setsum;
> +}
> +
> +void
> +rte_member_free(void *ss)

Why not using directly "struct rte_member_setsum", so you can use it directly?
This applies to other functions that are using "void *".
I see that the content of this structure is internal, but you can still use a pointer to that structure.

...

> +void *
> +rte_member_create(const struct rte_member_parameters *params) {
> +	struct rte_tailq_entry *te;
> +	struct rte_member_list *member_list = NULL;
> +	struct rte_member_setsum *setsum = NULL;
> +	int ret;
> +
> +	if (params == NULL) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	if ((params->key_len == 0)) {
> +		rte_errno = EINVAL;
> +		RTE_LOG(ERR, MEMBER,
> +			"Memship create with invalid parameters\n");

rte_member_create has invalid parameters?


> diff --git a/lib/librte_member/rte_member.h
> b/lib/librte_member/rte_member.h new file mode 100644 index
> 0000000..de44b1b
> --- /dev/null
> +++ b/lib/librte_member/rte_member.h
> @@ -0,0 +1,518 @@

...

> +enum rte_member_setsum_type {
> +	RTE_MEMBER_TYPE_HT = 0,  /**< Hash table based set summary.
> */
> +	RTE_MEMBER_TYPE_VBF,     /**< vector of bloom filters. */

"vector" -> "Vector".

> +	RTE_MEMBER_NUM_TYPE
> +};
> +
> +/** @internal compare function for different arch. */ enum
> +rte_member_sig_compare_function {
> +	RTE_MEMBER_COMPARE_SCALAR = 0,
> +	RTE_MEMBER_COMPARE_AVX2,
> +	RTE_MEMBER_COMPARE_NUM
> +};
> +
> +/** @internal setsummary structure. */
> +struct rte_member_setsum {
> +	enum rte_member_setsum_type type;
> +	const char *name;
> +	uint32_t key_len;
> +	uint32_t socket_id;          /* NUMA Socket ID for memory. */

Either comment all the fields or do not do it (for this case, because the
structure is internal, otherwise, all fields would require a comment).
Also, remember that they should all start with capital letters.

> +	uint32_t prim_hash_seed;
> +	uint32_t sec_hash_seed;
> +
> +
> +	/* hash table based */
> +	uint32_t bucket_cnt;
> +	uint32_t bucket_mask;
> +	uint8_t cache;
> +	enum rte_member_sig_compare_function sig_cmp_fn;

...

> +struct rte_member_parameters {

...

> +	uint8_t iscache;

As said in the documentation patch, change to "is_cache".
 
> +
> +	/**
> +	 * For HT setsummary, num_keys equals to the number of entries
> of the
> +	 * table. When the number of keys that inserted to the HT
> setsummary

"number of keys inserted in the HT summary".

> +	 * approaches this number, eviction could happen. For cache mode,
> +	 * keys could be evicted out of the table. For non-cache mode, keys
> will
> +	 * be evicted to other buckets like cuckoo hash. The table will also
> +	 * likely to become full before the number of inserted keys equal to
> the
> +	 * total number of entries.
> +	 *
> +	 * For vBF, num_keys equal to the expected number of keys that
> will
> +	 * be inserted into the vBF. The implementation assumes the keys
> are
> +	 * evenly distributed to each BF in vBF. This is used to calculate the
> +	 * number of bits we need for each BF. User does not specify the
> size of
> +	 * each BF directly because the optimal size depends on the
> num_keys
> +	 * and false positive rate.
> +	 */
> +	uint32_t num_keys;
> +
> +

Leave just one blank line between fields.

> +	/**
> +	 * The length of key is used for hash calculation. Since key is not
> +	 * stored in set-summary, large key does not require more memory
> space.
> +	 */
> +	uint32_t key_len;
> +
> +
> +	/**
> +	 * num_set is only relevant to vBF based setsummary.
> +	 * num_set is equal to the number of BFs in vBF. For current
> +	 * implementation, it only supports 1,2,4,8,16,32 BFs in one vBF set
> +	 * summary. If other number of sets are needed, for example 5, the
> user
> +	 * should allocate the minimum available value that larger than 5,
> +	 * which is 8.
> +	 */
> +	uint32_t num_set;
> +
> +	/**
> +	 * false_postive_rate is only relevant to vBF based setsummary.
> +	 * false_postivie_rate is the user-defined false positive rate
> +	 * given expected number of inserted keys (num_keys). It is used to
> +	 * calculate the total number of bits for each BF, and the number of
> +	 * hash values used during lookup and insertion. For details please
> +	 * refer to vBF implementation and membership library
> documentation.
> +	 *
> +	 * HT setsummary's false positive rate is in the order of:
> +	 * false_pos = (1/bucket_count)*(1/2^16), since we use 16-bit
> signature.
> +	 * This is because two keys needs to map to same bucket and same
> +	 * signature to have a collision (false positive). bucket_count is
> equal
> +	 * to number of entries (num_keys) divided by entry count per
> bucket
> +	 * (RTE_MEMBER_BUCKET_ENTRIES). Thus, the false_postivie_rate
> is not
> +	 * directly set by users.
> +	 */

Typo in "positive" in several places in the comment above.
Also, clarify that "false_positive_rate" is not directly set by users for HT mode.

> +	float false_positive_rate;

...

> + * Lookup key in set-summary (SS).
> + * Single key lookup and return as soon as the first match found

Leave a blank line before parameters.

> + * @param setsum
> + *   pointer of a setsummary
> + * @param key
> + *   pointer of the key that needs to lookup

Better something like this "Pointer of the key to be looked up"

> + * @param set_id
> + *   output the set id matches the key
> + * @return
> + *   return 1 for found a match and 0 for not found a match
> + */
> +

Definitions of the fields should start with capital letter.

> +int
> +rte_member_lookup(const void *setsum,
> +		const void *key, MEMBER_SET_TYPE *set_id);
> +
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * lookup bulk of keys in set-summary (SS).

"Lookup bulk"

> + * Each key lookup returns as soon as the first match found
> + * @param setsum
> + *   Pointer of a setsummary
> + * @param keys
> + *   Pointer of bulk of keys that to be lookup

"Pointer of bulk of keys to be looked up". 
Check for this in other parts of the code.

> + * @param num_keys
> + *   Number of keys that will be lookup
> + * @param set_ids
> + *   Output set ids for all the keys to this array.
> + *   User should preallocate array that can contain all results, which size is
> + *   the num_keys.
> + * @return
> + *   The number of keys that found a match
> + */
> +
> +
> +int
> +rte_member_lookup_bulk(const void *setsum,
> +		const void **keys, uint32_t num_keys,
> +		MEMBER_SET_TYPE *set_ids);
> +
> +
> +
> +

Too many blank spaces.  Generally leave one between functions.
Check for this in the rest of the files.

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Lookup a bulk of keys in set-summary (SS) for multiple matches each
> key.
> + * Each key lookup will find all matched entries (multiple match).
> + * Note that for cache mode HT, each key can have at most one match. So
> + * multi-match function is mainly used for vBF and non-cache mode HT.
> + * @param setsum
> + *   pointer of a setsummary
> + * @param keys
> + *   The keys that to be lookup
> + * @param num_keys
> + *   The number of keys that will be lookup
> + * @param max_match_per_key
> + *   The possible maximum number of matches for each key
> + * @param match_count
> + *   Output the number of matches for each key in an array
> + * @param set_ids
> + *   Return set ids for all the matches of all keys. User pass in a
> preallocated

"User passes in" 

> + *   2D array with first dimension as key index and second dimension as
> match
> + *   index. For example set_ids[bulk_size][max_match_per_key]
> + * @return
> + *   The number of keys that found one or more matches in the set-
> summary
> + */
> +
> +int
> +rte_member_lookup_multi_bulk(const void *setsum,
> +		const void **keys, uint32_t num_keys,
> +		uint32_t max_match_per_key,
> +		uint32_t *match_count,
> +		MEMBER_SET_TYPE *set_ids);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Insert key into set-summary (SS).
> + *
> + * @param setsum
> + *   pointer of a set-summary

Start with capital letter.

> + * @param key
> + *   the key that needs to be added
> + * @param set_id
> + *   The set id associated with the key that needs to be added. Different
> mode
> + *   supports different set_id ranges. 0 cannot be used as set_id since
> + *   RTE_MEMBER_NO_MATCH by default is set as 0.
> + *   For HT mode, the set_id has range as [1, 0x7FFF], MSB is reserved.
> + *   For vBF mode the set id is limited by the num_set parameter when
> create
> + *   the set-summary.
> + * @return
> + *   HT (cache mode) and vBF should never fail unless the set_id is not in
> the
> + *   valid range. In such case -EINVAL is returned.
> + *   For HT (non-cache mode) it could fail with -ENOSPC error code when
> table is
> + *   full.
> + *   For success it returns different values for different modes to provide
> + *   extra information for users.
> + *   Return 0 for HT (cache mode) if the add does not cause
> + *   eviction, return 1 otherwise. Return 0 for HT mode if success, -ENOSPC

For HT non-cache mode?

> for
> + *   full, and 1 if cuckoo eviction happens. Always return 0 for vBF mode.
> + */
> +
> +int
> +rte_member_add(const void *setsum, const void *key,
> +		MEMBER_SET_TYPE set_id);
> +
> +

...

> diff --git a/lib/librte_member/rte_member_version.map
> b/lib/librte_member/rte_member_version.map
> new file mode 100644
> index 0000000..a5877c9
> --- /dev/null
> +++ b/lib/librte_member/rte_member_version.map
> @@ -0,0 +1,16 @@
> +DPDK_17.11 {
> +	global:
> +
> +	rte_member_create;
> +	rte_member_find_existing;
> +	rte_member_lookup;
> +	rte_member_lookup_bulk;
> +	rte_member_lookup_multi;
> +	rte_member_lookup_multi_bulk;
> +	rte_member_add;
> +	rte_member_free;
> +	rte_member_reset;
> +	rte_member_delete;

This list must be in alphabetical order.

> +
> +	local: *;
> +};
> --
> 2.7.4



More information about the dev mailing list