[dpdk-dev] [PATCH v4 1/7] member: implement main API

De Lara Guarch, Pablo pablo.de.lara.guarch at intel.com
Mon Oct 2 12:04:18 CEST 2017


Hi Yipeng,

> -----Original Message-----
> From: Wang, Yipeng1
> Sent: Wednesday, September 27, 2017 6:40 PM
> To: dev at dpdk.org
> Cc: thomas at monjalon.net; Tai, Charlie <charlie.tai at intel.com>; Gobriel,
> Sameh <sameh.gobriel at intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com>; Mcnamara, John
> <john.mcnamara at intel.com>; Wang, Yipeng1 <yipeng1.wang at intel.com>
> Subject: [PATCH v4 1/7] member: implement main API
> 
> Membership library is an extension and generalization of a traditional filter
> (for example Bloom Filter) structure. In general, the Membership library is a
> data structure that provides a "set-summary" and responds to set-
> membership queries of whether a certain element belongs to a set(s). A
> membership test for an element will return the set this element belongs to
> or not-found if the element is never inserted into the set-summary.
> 
> The results of the membership test are not 100% accurate. Certain false
> positive or false negative probability could exist. However, comparing to a
> "full-blown" complete list of elements, a "set-summary"
> is memory efficient and fast on lookup.
> 
> This patch adds the main API definition.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang at intel.com>

A few comments on changes that you didn't make in the v4.

Thanks,
Pablo

...

> +
> +struct rte_member_setsum *
> +rte_member_create(const struct rte_member_parameters *params) {
> +	struct rte_tailq_entry *te;
> +	struct rte_member_list *member_list;
> +	struct rte_member_setsum *setsum;
> +	int ret;
> +
> +	if (params == NULL) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	if (params->key_len == 0 ||
> +			params->prim_hash_seed == params-
> >sec_hash_seed) {
> +		rte_errno = EINVAL;
> +		RTE_MEMBER_LOG(ERR, "Memship create with invalid
> parameters\n");

Do not use "Memship". Change to " rte_member_create has invalid parameters"?
Or something else that you want, but not Memship.

> +		return NULL;
> +	}
> +

...

> +struct rte_member_parameters {
> +	const char *name;			/**< Name of the hash. */
> +
> +	/**
> +	 * User to specify the type of the setsummary from one of
> +	 * rte_member_setsum_type.
> +	 *
> +	 * HT based setsummary is implemented like a hash table. User
> should use
> +	 * this type when there are many sets.
> +	 *
> +	 * vBF setsummary is a vector of bloom filters. It is used when
> number
> +	 * of sets is not big (less than 32 for current implementation).
> +	 */
> +	enum rte_member_setsum_type type;
> +
> +	/**
> +	 * If it is HT based setsummary, user to specify the subtype or mode
> +	 * of the setsummary. It could be cache, or non-cache mode.
> +	 * Set iscache to be 1 if to use as cache mode.

Change to "is_cache".

> +	 *
> +	 * For cache mode, keys can be evicted out of the HT setsummary.
> Keys
> +	 * with the same signature and map to the same bucket
> +	 * will overwrite each other in the setsummary table.
> +	 * This mode is useful for the case that the set-summary only
> +	 * needs to keep record of the recently inserted keys. Both
> +	 * false-negative and false-positive could happen.
> +	 *
> +	 * For non-cache mode, keys cannot be evicted out of the cache. So
> for
> +	 * this mode the setsummary will become full eventually. Keys with
> the
> +	 * same signature but map to the same bucket will still occupy
> multiple
> +	 * entries. This mode does not give false-negative result.
> +	 */
> +	uint8_t is_cache;
> +
> +	/**
> +	 * For HT setsummary, num_keys equals to the number of entries
> of the
> +	 * table. When the number of keys that inserted in the HT
> setsummary

"number of keys inserted in the HT summary". Or "were inserted".

> +	 * approaches this number, eviction could happen. For cache mode,
> +	 * keys could be evicted out of the table. For non-cache mode, keys
> will

...

> +	/**
> +	 * false_positive_rate is only relevant to vBF based setsummary.
> +	 * false_positive_rate is the user-defined false positive rate
> +	 * given expected number of inserted keys (num_keys). It is used to
> +	 * calculate the total number of bits for each BF, and the number of
> +	 * hash values used during lookup and insertion. For details please
> +	 * refer to vBF implementation and membership library
> documentation.
> +	 * Note that this parameter is not directly set by users for HT mode.
> +	 *
> +	 * HT setsummary's false positive rate is in the order of:
> +	 * false_pos = (1/bucket_count)*(1/2^16), since we use 16-bit
> signature.
> +	 * This is because two keys needs to map to same bucket and same
> +	 * signature to have a collision (false positive). bucket_count is
> equal
> +	 * to number of entries (num_keys) divided by entry count per
> bucket
> +	 * (RTE_MEMBER_BUCKET_ENTRIES). Thus, the false_positive_rate
> is not
> +	 * directly set by users.

Unless I understood it incorrectly, I think you should clarify that
this field is not set for HT mode, but it is for vBF, right?

> +	 */
> +	float false_positive_rate;
> +
> +	/**
> +	 * We use two seeds to calculate two independent hashes for each
> key.
> +	 *
> +	 * For HT type, one hash is used as signature, and the other is used
> +	 * for bucket location.
> +	 * For vBF type, these two hashes and their combinations are used
> as
> +	 * hash locations to index the bit array.
> +	 */
> +	uint32_t prim_hash_seed;
> +
> +	/**
> +	 * The secondary seed should be a different value from the primary
> seed.
> +	 */
> +	uint32_t sec_hash_seed;
> +
> +	int socket_id;			/**< NUMA Socket ID for memory.
> */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Find an existing set-summary and return a pointer to it.
> + *
> + * @param name
> + *   Name of the set-summary.
> + * @return
> + *   Pointer to the set-summary or NULL if object not found
> + *   with rte_errno set appropriately. Possible rte_errno values include:
> + *    - ENOENT - value not available for return
> + */
> +struct rte_member_setsum *
> +rte_member_find_existing(const char *name);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create set-summary (SS).
> + *
> + * @param params
> + *   Parameters to initialize the setsummary.
> + * @return
> + *   Return the pointer to the setsummary.
> + *   Return value is NULL if the creation failed.
> + */
> +struct rte_member_setsum *
> +rte_member_create(const struct rte_member_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Lookup key in set-summary (SS).
> + * Single key lookup and return as soon as the first match found
> + *
> + * @param setsum
> + *   Pointer of a setsummary.
> + * @param key
> + *   Pointer of the key to be looked up.
> + * @param set_id
> + *   Output the set id matches the key.
> + * @return
> + *   Return 1 for found a match and 0 for not found a match.
> + */
> +int
> +rte_member_lookup(const struct rte_member_setsum *setsum, const
> void *key,
> +			member_set_t *set_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * lookup bulk of keys in set-summary (SS).

"Lookup bulk"

> + * Each key lookup returns as soon as the first match found
> + *
> + * @param setsum
> + *   Pointer of a setsummary.
> + * @param keys
> + *   Pointer of the bulk of keys to be looked up.
> + * @param num_keys
> + *   Number of keys that will be lookup.
> + * @param set_ids
> + *   Output set ids for all the keys to this array.
> + *   User should preallocate array that can contain all results, which size is
> + *   the num_keys.
> + * @return



More information about the dev mailing list