[PATCH 2/4] net/mlx5: fix default memzone requirements in HWS

Dariusz Sosnowski dsosnowski at nvidia.com
Thu Feb 12 10:02:12 CET 2026


On Wed, Feb 11, 2026 at 02:38:48PM +0000, Bing Zhao wrote:
> Hi,
> 
> The code LGTM.
> I am thinking that if the user wants to use HWS for up to 1K SFs, we will still hit such limitations, or if the application tries to create some rings per device.
> Maybe in the log we can hint to the user to try some API to re-configure the maximum number of memzones. WDYT?

If such a situation occurs, I think the existing error log from EAL
sufficiently describes the root cause and possible workaround.
I attach the error log below:

    EAL: memzone_reserve_aligned_thread_unsafe():
        Number of requested memzone segments exceeds maximum 2560
    RING: Cannot reserve memory
    mlx5_net: Failed to start port 998 mlx5_core.sf.998:
        fail to configure port

> 
> > -----Original Message-----
> > From: Maayan Kashani <mkashani at nvidia.com>
> > Sent: Monday, January 12, 2026 5:25 PM
> > To: dev at dpdk.org
> > Cc: Maayan Kashani <mkashani at nvidia.com>; Raslan Darawsheh
> > <rasland at nvidia.com>; Dariusz Sosnowski <dsosnowski at nvidia.com>;
> > stable at dpdk.org; Slava Ovsiienko <viacheslavo at nvidia.com>; Bing Zhao
> > <bingz at nvidia.com>; Ori Kam <orika at nvidia.com>; Suanming Mou
> > <suanmingm at nvidia.com>; Matan Azrad <matan at nvidia.com>
> > Subject: [PATCH 2/4] net/mlx5: fix default memzone requirements in HWS
> > 
> > From: Dariusz Sosnowski <dsosnowski at nvidia.com>
> > 
> > Commit [1] changed the default flow engine selection in the mlx5 PMD
> > to accommodate new NIC generations.
> > Whenever the underlying device does not support SWS (e.g., ConnectX-9 or
> > untrusted VFs/SFs) but does support HWS, the default flow engine is
> > HWS (dv_flow_en=2), which also supports the sync flow API.
> > 
> > This behavior change affected memory usage whenever SFs are probed by
> > DPDK. In the default HWS configuration supporting the sync flow API
> > (i.e., without calling rte_flow_configure()),
> > the mlx5 PMD allocated 4 rte_ring objects per port:
> > 
> > - indir_iq and indir_cq - For handling indirect action completions.
> > - flow_transfer_pending and flow_transfer_completed - For handling
> >   template table resizing.
> > 
> > This has not happened previously with SWS as default flow engine.
> > 
> > Since a dedicated memzone is allocated for each rte_ring object, this led
> > to exhaustion of the default memzone limit on setups with ~1K SFs to probe.
> > It resulted in the following error on port start:
> > 
> >     EAL: memzone_reserve_aligned_thread_unsafe():
> >         Number of requested memzone segments exceeds maximum 2560
> >     RING: Cannot reserve memory
> >     mlx5_net: Failed to start port 998 mlx5_core.sf.998:
> >         fail to configure port
> > 
> > Since template table resizing is allowed if and only if the async flow API
> > was configured, 2 of the aforementioned rings are never used in the default
> > sync flow API configuration.
> > 
> > This patch removes allocation of flow_transfer_pending and
> > flow_transfer_completed rings in default sync flow API configuration of
> > mlx5 PMD to reduce memzone usage and allow DPDK probing to succeed on
> > setups with ~1K SFs to probe.
> > 
> > [1] commit d1ac7b6c64d9
> >     ("net/mlx5: update flow devargs handling for future HW")
> > 
> > Fixes: 27d171b88031 ("net/mlx5: abstract flow action and enable reconfigure")
> > Cc: stable at dpdk.org
> > 
> > Signed-off-by: Dariusz Sosnowski <dsosnowski at nvidia.com>
> > ---
> >  drivers/net/mlx5/mlx5_flow_hw.c | 86 ++++++++++++++++++++++++++-------
> >  1 file changed, 68 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_flow_hw.c
> > b/drivers/net/mlx5/mlx5_flow_hw.c index 98483abc7fc..1dada2e7cef 100644
> > --- a/drivers/net/mlx5/mlx5_flow_hw.c
> > +++ b/drivers/net/mlx5/mlx5_flow_hw.c
> > @@ -4483,6 +4483,9 @@ mlx5_hw_pull_flow_transfer_comp(struct rte_eth_dev *dev,
> >  	struct mlx5_priv *priv = dev->data->dev_private;
> >  	struct rte_ring *ring = priv->hw_q[queue].flow_transfer_completed;
> > 
> > +	if (ring == NULL)
> > +		return 0;
> > +
> >  	size = RTE_MIN(rte_ring_count(ring), n_res);
> >  	for (i = 0; i < size; i++) {
> >  		res[i].status = RTE_FLOW_OP_SUCCESS;
> > @@ -4714,8 +4717,9 @@ __flow_hw_push_action(struct rte_eth_dev *dev,
> >  	struct mlx5_hw_q *hw_q = &priv->hw_q[queue];
> > 
> >  	mlx5_hw_push_queue(hw_q->indir_iq, hw_q->indir_cq);
> > -	mlx5_hw_push_queue(hw_q->flow_transfer_pending,
> > -			   hw_q->flow_transfer_completed);
> > +	if (hw_q->flow_transfer_pending != NULL && hw_q->flow_transfer_completed != NULL)
> > +		mlx5_hw_push_queue(hw_q->flow_transfer_pending,
> > +				   hw_q->flow_transfer_completed);
> >  	if (!priv->shared_host) {
> >  		if (priv->hws_ctpool)
> >  			mlx5_aso_push_wqe(priv->sh,
> > @@ -11889,6 +11893,60 @@ mlx5_hwq_ring_create(uint16_t port_id, uint32_t queue, uint32_t size, const char
> >  			       RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
> >  }
> > 
> > +static int
> > +flow_hw_queue_setup_rings(struct rte_eth_dev *dev,
> > +			  uint16_t queue,
> > +			  uint32_t queue_size,
> > +			  bool nt_mode)
> > +{
> > +	struct mlx5_priv *priv = dev->data->dev_private;
> > +
> > +	/* HWS queue info container must be already allocated. */
> > +	MLX5_ASSERT(priv->hw_q != NULL);
> > +
> > +	/* Notice ring name length is limited. */
> > +	priv->hw_q[queue].indir_cq = mlx5_hwq_ring_create
> > +		(dev->data->port_id, queue, queue_size, "indir_act_cq");
> > +	if (!priv->hw_q[queue].indir_cq) {
> > +		DRV_LOG(ERR, "port %u failed to allocate indir_act_cq ring for
> > HWS",
> > +			dev->data->port_id);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	priv->hw_q[queue].indir_iq = mlx5_hwq_ring_create
> > +		(dev->data->port_id, queue, queue_size, "indir_act_iq");
> > +	if (!priv->hw_q[queue].indir_iq) {
> > +		DRV_LOG(ERR, "port %u failed to allocate indir_act_iq ring for
> > HWS",
> > +			dev->data->port_id);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	/*
> > +	 * Sync flow API does not require rings used for table resize handling,
> > +	 * because these rings are only used through async flow APIs.
> > +	 */
> > +	if (nt_mode)
> > +		return 0;
> > +
> > +	priv->hw_q[queue].flow_transfer_pending = mlx5_hwq_ring_create
> > +		(dev->data->port_id, queue, queue_size, "tx_pending");
> > +	if (!priv->hw_q[queue].flow_transfer_pending) {
> > +		DRV_LOG(ERR, "port %u failed to allocate tx_pending ring for
> > HWS",
> > +			dev->data->port_id);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	priv->hw_q[queue].flow_transfer_completed = mlx5_hwq_ring_create
> > +		(dev->data->port_id, queue, queue_size, "tx_done");
> > +	if (!priv->hw_q[queue].flow_transfer_completed) {
> > +		DRV_LOG(ERR, "port %u failed to allocate tx_done ring for
> > HWS",
> > +			dev->data->port_id);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int
> >  flow_hw_validate_attributes(const struct rte_flow_port_attr *port_attr,
> >  			    uint16_t nb_queue,
> > @@ -12057,22 +12115,8 @@ __flow_hw_configure(struct rte_eth_dev *dev,
> >  		      &priv->hw_q[i].job[_queue_attr[i]->size];
> >  		for (j = 0; j < _queue_attr[i]->size; j++)
> >  			priv->hw_q[i].job[j] = &job[j];
> > -		/* Notice ring name length is limited. */
> > -		priv->hw_q[i].indir_cq = mlx5_hwq_ring_create
> > -			(dev->data->port_id, i, _queue_attr[i]->size, "indir_act_cq");
> > -		if (!priv->hw_q[i].indir_cq)
> > -			goto err;
> > -		priv->hw_q[i].indir_iq = mlx5_hwq_ring_create
> > -			(dev->data->port_id, i, _queue_attr[i]->size, "indir_act_iq");
> > -		if (!priv->hw_q[i].indir_iq)
> > -			goto err;
> > -		priv->hw_q[i].flow_transfer_pending = mlx5_hwq_ring_create
> > -			(dev->data->port_id, i, _queue_attr[i]->size, "tx_pending");
> > -		if (!priv->hw_q[i].flow_transfer_pending)
> > -			goto err;
> > -		priv->hw_q[i].flow_transfer_completed = mlx5_hwq_ring_create
> > -			(dev->data->port_id, i, _queue_attr[i]->size, "tx_done");
> > -		if (!priv->hw_q[i].flow_transfer_completed)
> > +
> > +		if (flow_hw_queue_setup_rings(dev, i, _queue_attr[i]->size, nt_mode) < 0)
> >  			goto err;
> >  	}
> >  	dr_ctx_attr.pd = priv->sh->cdev->pd;
> > @@ -15440,6 +15484,12 @@ flow_hw_update_resized(struct rte_eth_dev *dev, uint32_t queue,
> >  	};
> > 
> >  	MLX5_ASSERT(hw_flow->flags & MLX5_FLOW_HW_FLOW_FLAG_MATCHER_SELECTOR);
> > +	/*
> > +	 * Update resized can be called only through async flow API.
> > +	 * These rings are allocated if and only if async flow API was configured.
> > +	 */
> > +	MLX5_ASSERT(priv->hw_q[queue].flow_transfer_completed != NULL);
> > +	MLX5_ASSERT(priv->hw_q[queue].flow_transfer_pending != NULL);
> >  	/**
> >  	 * mlx5dr_matcher_resize_rule_move() accepts original table matcher -
> >  	 * the one that was used BEFORE table resize.
> > --
> > 2.21.0
> 
> Acked-by: Bing Zhao <bingz at nvidia.com>
> 
