[dpdk-dev] [RFC] mlx5: fix error unwind in device start

Shahaf Shuler shahafs at mellanox.com
Mon Aug 13 09:52:47 CEST 2018


Hi Stephen,

Thursday, August 2, 2018 1:00 AM, Stephen Hemminger:
> Subject: [RFC] mlx5: fix error unwind in device start
> 
> The error handling in the start routine of the mlx5 driver is buggy.
> For example, if setting up the flows fails, the device driver will then get stuck
> in mlx5_flow_rxq_flags_clear waiting for something that will never happen.

Looking at the code, I cannot understand why mlx5_flow_rxq_flags_clear would get stuck, nor what it would be waiting for.
The function contains only a few bounded loops, and they do not depend on anything that happened earlier during device start.

Moreover, I tried forcing either mlx5_traffic_enable or mlx5_flow_start to fail; the port failed to start, but nothing got stuck.

Can you provide more details about the issue you saw there?  
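
Forcing individual start stages to fail, as described above, can also be automated. The following is a simplified, hypothetical model, not mlx5 code: each stage is a setup/teardown pair, a test-only fail_at knob injects a failure, and a live counter checks that every failed start leaves nothing set up. Note that it uses a table-style loop to unwind instead of the per-label goto style of the patch; fail_at, live, NB_STAGES and the helpers are all illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical stages, loosely following mlx5_dev_start(): txq start,
 * rxq start, Rx interrupt vector, default traffic flows, user flows. */
#define NB_STAGES 5

/* Test-only knob (hypothetical): index of the stage forced to fail, -1 = none. */
static int fail_at = -1;
/* Number of stages currently set up; must drop back to 0 after a failed start. */
static int live;

static int stage_setup(int i)
{
	if (i == fail_at)
		return -1;	/* injected failure */
	live++;
	return 0;
}

static void stage_teardown(int i)
{
	(void)i;
	live--;
}

/* Run the stages in order; on failure undo only the stages that succeeded. */
static int dev_start(void)
{
	int i;

	for (i = 0; i < NB_STAGES; i++) {
		if (stage_setup(i) != 0) {
			while (--i >= 0)
				stage_teardown(i);
			return -1;
		}
	}
	return 0;
}

/* Force each stage to fail in turn and check that nothing leaks.
 * Returns the number of failure paths exercised. */
static int exercise_error_paths(void)
{
	int n = 0;

	for (fail_at = 0; fail_at < NB_STAGES; fail_at++, n++) {
		int ret;

		live = 0;
		ret = dev_start();	/* kept outside assert() so it runs under NDEBUG */
		assert(ret != 0);
		assert(live == 0);
	}
	fail_at = -1;
	return n;
}
```

Calling exercise_error_paths() from a unit test walks every failure path once, which is the kind of coverage the RFC leaves open.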

> 
> The problem is that the code jumps to a common error label and
> unwinds portions of the driver that have not been set up.
> 
> This suggested patch breaks it into different labels with each failure path only
> unwinding what was done.
> 
> Also, the ethdev driver should not be manipulating the dev_started flag
> directly. That is handled by the common ethdev layer.
> 
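
The two points in the quoted text (each failure path undoing only what was done, and the driver leaving the started flag to the common layer) can be illustrated with a small model. All names here (struct fake_dev, stage(), undo(), driver_start(), generic_dev_start()) are hypothetical stand-ins, not the mlx5 or ethdev API. The shape mirrors the patch: each failure jumps to the label that undoes the completed stages in reverse order, and only the generic wrapper sets started on success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: one flag per start stage plus the started flag. */
struct fake_dev {
	bool txq, rxq, intr, traffic, flows;
	bool started;		/* owned by the generic layer only */
	int fail_stage;		/* test knob: stage id that fails, 0 = none */
};

/* Hypothetical stage helper: marks its resource as set up unless selected to fail. */
static int stage(struct fake_dev *d, int id, bool *res)
{
	if (d->fail_stage == id)
		return -1;
	*res = true;
	return 0;
}

static void undo(bool *res)
{
	*res = false;
}

/* Driver start: each failure path unwinds only what was already done. */
static int driver_start(struct fake_dev *d)
{
	assert(!d->started);	/* the generic layer must not restart a started port */
	if (stage(d, 1, &d->txq))
		return -1;
	if (stage(d, 2, &d->rxq))
		goto error_txq_stop;
	if (stage(d, 3, &d->intr))
		goto error_rxq_stop;
	if (stage(d, 4, &d->traffic))
		goto error_intr_disable;
	if (stage(d, 5, &d->flows))
		goto error_traffic_disable;
	return 0;		/* d->started is deliberately left untouched */

error_traffic_disable:
	undo(&d->traffic);
error_intr_disable:
	undo(&d->intr);
error_rxq_stop:
	undo(&d->rxq);
error_txq_stop:
	undo(&d->txq);
	return -1;
}

/* Generic layer: the only place that sets the started flag. */
static int generic_dev_start(struct fake_dev *d)
{
	int ret = driver_start(d);

	if (ret == 0)
		d->started = true;
	return ret;
}
```

The labels fall through in reverse setup order, so a failure at stage N runs exactly the undo steps for stages N-1 down to 1.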

I agree this part of the code could be written better, but my question remains: is there an actual bug that this change solves?

> The patch works for the success case, but further testing is needed to
> actually exercise all the error paths.
> This is left as an exercise for the maintainers.
> 
> Signed-off-by: Stephen Hemminger <sthemmin at microsoft.com>
> ---
>  drivers/net/mlx5/mlx5_trigger.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index e2a9bb703261..79a7b233986a 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -171,42 +171,42 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	if (ret) {
>  		DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
>  			dev->data->port_id, strerror(rte_errno));
> -		mlx5_txq_stop(dev);
> -		return -rte_errno;
> +		goto error_txq_stop;
>  	}
> -	dev->data->dev_started = 1;
> +
>  	ret = mlx5_rx_intr_vec_enable(dev);
>  	if (ret) {
>  		DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_rxq_stop;
>  	}
>  	mlx5_xstats_init(dev);
>  	ret = mlx5_traffic_enable(dev);
>  	if (ret) {
>  		DRV_LOG(DEBUG, "port %u failed to set defaults flows",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_intr_vec_disable;
>  	}
>  	ret = mlx5_flow_start(dev, &priv->flows);
>  	if (ret) {
>  		DRV_LOG(DEBUG, "port %u failed to set flows",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_traffic_disable;
>  	}
> +
>  	dev->tx_pkt_burst = mlx5_select_tx_function(dev);
>  	dev->rx_pkt_burst = mlx5_select_rx_function(dev);
>  	mlx5_dev_interrupt_handler_install(dev);
>  	return 0;
> -error:
> -	ret = rte_errno; /* Save rte_errno before cleanup. */
> -	/* Rollback. */
> -	dev->data->dev_started = 0;
> -	mlx5_flow_stop(dev, &priv->flows);
> +
> +error_traffic_disable:
>  	mlx5_traffic_disable(dev);
> -	mlx5_txq_stop(dev);
> +error_intr_vec_disable:
> +	mlx5_rx_intr_vec_disable(dev);
> +error_rxq_stop:
>  	mlx5_rxq_stop(dev);
> -	rte_errno = ret; /* Restore rte_errno. */
> +error_txq_stop:
> +	mlx5_txq_stop(dev);
>  	return -rte_errno;
>  }
> 
> --
> 2.18.0
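
The lines removed from the old error: block saved rte_errno before the rollback and restored it afterwards, per their own comments. If any cleanup helper on the new unwind path can itself set rte_errno, the final return -rte_errno could report the cleanup's error, or even 0, instead of the original failure. A minimal sketch of keeping that save/restore inside a label-per-stage ladder follows; mock_errno, fail_stage, stage() and undo() are hypothetical stand-ins, not DPDK APIs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an error variable such as rte_errno. */
static int mock_errno;
/* Hypothetical test knob: which stage fails (0 = none). */
static int fail_stage;

/* Hypothetical stage: fails with ENOMEM when selected. */
static int stage(int id)
{
	if (id == fail_stage) {
		mock_errno = ENOMEM;
		return -1;
	}
	return 0;
}

/* Hypothetical cleanup helper that clobbers the error variable. */
static void undo(int id)
{
	(void)id;
	mock_errno = EINVAL;
}

static int start(void)
{
	int ret;

	if (stage(1))
		return -mock_errno;
	if (stage(2)) {
		ret = -mock_errno;	/* capture the error before any cleanup */
		goto error_undo_1;
	}
	if (stage(3)) {
		ret = -mock_errno;
		goto error_undo_2;
	}
	return 0;

error_undo_2:
	undo(2);
error_undo_1:
	undo(1);
	mock_errno = -ret;	/* restore the original error for the caller */
	assert(ret < 0);	/* an error path must never report success */
	return ret;
}
```

Capturing the code at each goto site, rather than at a shared label, keeps the saved value correct no matter which label the failure enters the ladder at.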


