[dpdk-dev] [PATCH dpdk-dev v3 2/2] mempool: use shared memzone for rte_mempool_ops
Andrew Rybchenko
arybchenko at solarflare.com
Thu Apr 23 15:38:44 CEST 2020
On 4/13/20 5:21 PM, xiangxia.m.yue at gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
>
> The order of mempool initiation affects mempool index in the
> rte_mempool_ops_table. For example, when building APPs with:
>
> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
>
> The "bucket" mempool will be registered firstly, and its index
> in table is 0 while the index of "ring" mempool is 1. DPDK
> uses the mk/rte.app.mk to build APPs, and others, for example,
> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> The mempool lib linked in dpdk and Open vSwitch is different.
>
> The mempool can be used between primary and secondary process,
> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> ring which index in table is 0, but the index of "bucket" ring
> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> mempool ops and malloc memory from mempool. The crash will occur:
>
> bucket_dequeue (access null and crash)
> rte_mempool_get_ops (should get "ring_mp_mc",
> but get "bucket" mempool)
> rte_mempool_ops_dequeue_bulk
> ...
> rte_pktmbuf_alloc
> rte_pktmbuf_copy
> pdump_copy
> pdump_rx
> rte_eth_rx_burst
>
> To avoid the crash, there are some solution:
> * constructor priority: Different mempool uses different
> priority in RTE_INIT, but it's not easy to maintain.
>
> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> driver in future, we must make sure the order.
>
> * register mempool orderly: Sort the mempool when registering,
> so the lib linked will not affect the index in mempool table.
> but the number of mempool libraries may be different.
>
> * shared memzone: The primary process allocates a struct in
> shared memory named memzone, When we register a mempool ops,
> we first get a name and id from the shared struct: with the lock held,
> lookup for the registered name and return its index, else
> get the last id and copy the name in the struct.
>
> Previous discussion: https://mails.dpdk.org/archives/dev/2020-March/159354.html
>
> Suggested-by: Olivier Matz <olivier.matz at 6wind.com>
> Suggested-by: Jerin Jacob <jerinj at marvell.com>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> ---
> v2:
> * fix checkpatch warning
> ---
> lib/librte_mempool/rte_mempool.h | 28 +++++++++++-
> lib/librte_mempool/rte_mempool_ops.c | 89 ++++++++++++++++++++++++++++--------
> 2 files changed, 96 insertions(+), 21 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index c90cf31467b2..2709b9e1d51b 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -50,6 +50,7 @@
> #include <rte_ring.h>
> #include <rte_memcpy.h>
> #include <rte_common.h>
> +#include <rte_init.h>
>
> #ifdef __cplusplus
> extern "C" {
> @@ -678,7 +679,6 @@ struct rte_mempool_ops {
> */
> struct rte_mempool_ops_table {
> rte_spinlock_t sl; /**< Spinlock for add/delete. */
> - uint32_t num_ops; /**< Number of used ops structs in the table. */
> /**
> * Storage for all possible ops structs.
> */
> @@ -910,6 +910,30 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
> */
> int rte_mempool_register_ops(const struct rte_mempool_ops *ops);
>
> +struct rte_mempool_shared_ops {
> + size_t num_mempool_ops;
Is there any specific reason to change type from uint32_t used
above to size_t? I think that uint32_t is better here since
it is just a number, not a size of memory or related value.
> + struct {
> + char name[RTE_MEMPOOL_OPS_NAMESIZE];
> + } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> +
> + rte_spinlock_t mempool;
> +};
> +
> +static inline int
> +mempool_ops_register_cb(const void *arg)
> +{
> + const struct rte_mempool_ops *h = (const struct rte_mempool_ops *)arg;
> +
> + return rte_mempool_register_ops(h);
> +}
> +
> +static inline void
> +mempool_ops_register(const struct rte_mempool_ops *ops)
> +{
> + rte_init_register(mempool_ops_register_cb, (const void *)ops,
> + RTE_INIT_PRE);
> +}
> +
> /**
> * Macro to statically register the ops of a mempool handler.
> * Note that the rte_mempool_register_ops fails silently here when
> @@ -918,7 +942,7 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
> #define MEMPOOL_REGISTER_OPS(ops) \
> RTE_INIT(mp_hdlr_init_##ops) \
> { \
> - rte_mempool_register_ops(&ops); \
> + mempool_ops_register(&ops); \
> }
>
> /**
> diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
> index 22c5251eb068..b10fda662db6 100644
> --- a/lib/librte_mempool/rte_mempool_ops.c
> +++ b/lib/librte_mempool/rte_mempool_ops.c
> @@ -14,43 +14,95 @@
> /* indirect jump table to support external memory pools. */
> struct rte_mempool_ops_table rte_mempool_ops_table = {
> .sl = RTE_SPINLOCK_INITIALIZER,
> - .num_ops = 0
> };
>
> -/* add a new ops struct in rte_mempool_ops_table, return its index. */
> -int
> -rte_mempool_register_ops(const struct rte_mempool_ops *h)
> +static int
> +rte_mempool_register_shared_ops(const char *name)
> {
> - struct rte_mempool_ops *ops;
> - int16_t ops_index;
> + static bool mempool_shared_ops_inited;
> + struct rte_mempool_shared_ops *shared_ops;
> + const struct rte_memzone *mz;
> + uint32_t ops_index = 0;
> +
I think we should sanity check 'name' here to be not
empty string (see review notes below).
> + if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
> + !mempool_shared_ops_inited) {
> +
> + mz = rte_memzone_reserve("mempool_ops_shared",
> + sizeof(*shared_ops), 0, 0);
> + if (mz == NULL)
> + return -ENOMEM;
> +
> + shared_ops = mz->addr;
> + shared_ops->num_mempool_ops = 0;
> + rte_spinlock_init(&shared_ops->mempool);
> +
> + mempool_shared_ops_inited = true;
> + } else {
> + mz = rte_memzone_lookup("mempool_ops_shared");
> + if (mz == NULL)
> + return -ENOMEM;
> +
> + shared_ops = mz->addr;
> + }
>
> - rte_spinlock_lock(&rte_mempool_ops_table.sl);
> + rte_spinlock_lock(&shared_ops->mempool);
>
> - if (rte_mempool_ops_table.num_ops >=
> - RTE_MEMPOOL_MAX_OPS_IDX) {
> - rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> + if (shared_ops->num_mempool_ops >= RTE_MEMPOOL_MAX_OPS_IDX) {
> + rte_spinlock_unlock(&shared_ops->mempool);
> RTE_LOG(ERR, MEMPOOL,
> "Maximum number of mempool ops structs exceeded\n");
> return -ENOSPC;
> }
>
> + while (shared_ops->mempool_ops[ops_index].name[0]) {
Please, compare with '\0' as DPDK style guide says.
> + if (!strcmp(name, shared_ops->mempool_ops[ops_index].name)) {
> + rte_spinlock_unlock(&shared_ops->mempool);
> + return ops_index;
> + }
> +
> + ops_index++;
> + }
> +
> + strlcpy(shared_ops->mempool_ops[ops_index].name, name,
> + sizeof(shared_ops->mempool_ops[0].name));
> +
> + shared_ops->num_mempool_ops++;
> +
> + rte_spinlock_unlock(&shared_ops->mempool);
> + return ops_index;
> +}
> +
> +/* add a new ops struct in rte_mempool_ops_table, return its index. */
> +int
> +rte_mempool_register_ops(const struct rte_mempool_ops *h)
> +{
> + struct rte_mempool_ops *ops;
> + int16_t ops_index;
> +
> if (h->alloc == NULL || h->enqueue == NULL ||
> - h->dequeue == NULL || h->get_count == NULL) {
> - rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> + h->dequeue == NULL || h->get_count == NULL) {
Changing formatting just makes review a bit more harder.
> RTE_LOG(ERR, MEMPOOL,
> "Missing callback while registering mempool ops\n");
> + rte_errno = EINVAL;
Why is it done in the patch? For me it looks like logically
different change if it is really required (properly motivated).
> return -EINVAL;
> }
>
> if (strlen(h->name) >= sizeof(ops->name) - 1) {
> - rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> - RTE_LOG(DEBUG, EAL, "%s(): mempool_ops <%s>: name too long\n",
> - __func__, h->name);
> + RTE_LOG(ERR, MEMPOOL,
> + "The registering mempool_ops <%s>: name too long\n",
> + h->name);
Why do you change from DEBUG to ERR here? It it not
directly related to the purpose of the patch.
> rte_errno = EEXIST;
> return -EEXIST;
> }
>
> - ops_index = rte_mempool_ops_table.num_ops++;
> + ops_index = rte_mempool_register_shared_ops(h->name);
> + if (ops_index < 0) {
> + rte_errno = -ops_index;
> + return ops_index;
> + }
> +
> + rte_spinlock_lock(&rte_mempool_ops_table.sl);
> +
> ops = &rte_mempool_ops_table.ops[ops_index];
> strlcpy(ops->name, h->name, sizeof(ops->name));
> ops->alloc = h->alloc;
> @@ -165,9 +217,8 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
> if (mp->flags & MEMPOOL_F_POOL_CREATED)
> return -EEXIST;
>
> - for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
> - if (!strcmp(name,
> - rte_mempool_ops_table.ops[i].name)) {
> + for (i = 0; i < RTE_MEMPOOL_MAX_OPS_IDX; i++) {
> + if (!strcmp(name, rte_mempool_ops_table.ops[i].name)) {
Since rte_mempool_ops_table is filled in which zeros,
name string is empty by default. So, request with empty name
will match the first unused entry. I guess it is not what we
want here. I think we should handle empty string before the
loop and return -EINVAL.
> ops = &rte_mempool_ops_table.ops[i];
> break;
> }
>
More information about the dev
mailing list