[PATCH v7] net/bonding: another fix to LACP mempool size
Ferruh Yigit
ferruh.yigit at xilinx.com
Fri Apr 29 16:20:03 CEST 2022
On 3/28/2022 4:16 PM, Gaoxiang Liu wrote:
> The following log message may appear after a slave is idle(or nearly
> idle)
> for a few minutes:"PMD: Failed to allocate LACP packet from pool".
> And bond mode 4 negotiation may fail.
>
> Problem:When bond mode 4 has been chosed and delicated queue has
> not been enable, all mbufs from a slave' private pool(used
> exclusively for transmitting LACPDUs) have been allocated in
> interrupt thread, and are still sitting in the device's tx
> descriptor ring and other cores' mempool caches in fwd thread.
> Thus the interrupt thread can not alloc LACP packet from pool.
>
> Solution: Ensure that each slave'tx (LACPDU) mempool owns more than
> n-tx-queues * n-tx-descriptor + fwd_core_num *
> per-core-mmempool-flush-threshold mbufs.
>
> Note that the LACP tx machine fuction is the only code that allocates
> from a slave's private pool. It runs in the context of the interrupt
> thread, and thus it has no mempool cache of its own.
>
> Signed-off-by: Gaoxiang Liu <liugaoxiang at huawei.com>
>
> ---
> v2:
> * Fixed compile issues.
>
> v3:
> * delete duplicate code.
>
> v4;
> * Fixed some issues.
> 1. total_tx_desc should use +=
> 2. add detailed logs
>
> v5:
> * Fixed some issues.
> 1. move CACHE_FLUSHTHRESH_MULTIPLIER to rte_eth_bond-8023ad.c
> 2. use RTE_MIN
>
> v6:
> * add a comment of CACHE_FLUSHTHRESH_MULTIPLIER macro
>
> v7:
> * Fixed some issues.
> 1. move CACHE_FLUSHTHRESH_MULTIPLIER to rte_mempool.h
> ---
> drivers/net/bonding/rte_eth_bond_8023ad.c | 7 ++++---
> lib/mempool/rte_mempool.h | 2 ++
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
> index ca50583d62..f7f6828126 100644
> --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
> +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
> @@ -1050,6 +1050,7 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
> uint32_t total_tx_desc;
> struct bond_tx_queue *bd_tx_q;
> uint16_t q_id;
> + uint32_t cache_size;
>
> /* Given slave mus not be in active list */
> RTE_ASSERT(find_slave_by_id(internals->active_slaves,
> @@ -1100,11 +1101,11 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
> total_tx_desc += bd_tx_q->nb_tx_desc;
> }
>
> + cache_size = RTE_MIN(RTE_MEMPOOL_CACHE_MAX_SIZE, 32);
> + total_tx_desc += rte_lcore_count() * cache_size * RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER;
> snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
> port->mbuf_pool = rte_pktmbuf_pool_create(mem_name, total_tx_desc,
> - RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
> - 32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
> - 0, element_size, socket_id);
> + cache_size, 0, element_size, socket_id);
>
> /* Any memory allocation failure in initialization is critical because
> * resources can't be free, so reinitialization is impossible. */
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c1527..fa15ed710f 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -56,6 +56,8 @@
> extern "C" {
> #endif
>
> +#define RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER 1.5
> +
This change seems already get some comments and changes in previous
versions.
I also thought why we are adding a new macro to the mempool for a
bonding driver update, but that is not the whole picture.
There is an existing 'CACHE_FLUSHTHRESH_MULTIPLIER' macro in mempool,
this patch wants to use it but that macro is not exposed.
And I can see there is other user of that macros (mlx5_rxq.c [1])
suffering from same problem.
So, what do you think having two patches,
- first one is only for mempool update, which removes the
'CACHE_FLUSHTHRESH_MULTIPLIER' from 'rte_mempool.c', adds
'RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER' to 'rte_mempool.h' as this
patch does, and update existing usage.
- second patch just updates the bonding driver and use the new
'RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER' macro
[1] @Matan, @Slava,
'mlx5_rxq.c', comment mentions that it intends to use
'CACHE_FLUSHTHRESH_MULTIPLIER' but can't access it and use a hardcoded
value (2). But the hard coded value and macro values doesn't match, is
it intentional.
When 'RTE_MEMPOOL_CACHE_FLUSHTHRESH_MULTIPLIER' is exposed in header,
can it be replaced with hard coded value in the driver?
> #define RTE_MEMPOOL_HEADER_COOKIE1 0xbadbadbadadd2e55ULL /**< Header cookie. */
> #define RTE_MEMPOOL_HEADER_COOKIE2 0xf2eef2eedadd2e55ULL /**< Header cookie. */
> #define RTE_MEMPOOL_TRAILER_COOKIE 0xadd2e55badbadbadULL /**< Trailer cookie.*/
More information about the dev
mailing list