[PATCH] common/mlx5: skip ROCE disable for auxiliary SF devices

Dariusz Sosnowski dsosnowski at nvidia.com
Thu Feb 5 15:59:31 CET 2026


Thank you for reporting the issue and the patch.
Please see comments below.

On Sat, Jan 10, 2026 at 11:15:10PM +0000, Max Tottenham wrote:
> When probing an SF as a vDPA device, mlx5_roce_disable() targets the
> parent PF address (via mlx5_dev_to_pci_str). This incorrectly attempts
> to disable ROCE on the parent PF rather than the SF itself.
> 
> This causes vDPA probe failures when the parent PF already has an open
> IB context (e.g., probed for uplink ports or SF representors).
> 
> For SubFunctions, ROCE is configured via devlink parameters
> (enable_roce) before device creation. Skip the runtime ROCE disable
> for auxiliary devices since the devlink configuration is already in
> effect and targeting the parent PF is incorrect.
> 
> Signed-off-by: Max Tottenham <mtottenh at akamai.com>
> ---
>  drivers/common/mlx5/linux/mlx5_common_os.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
> index 2867e21618..6fa12e06eb 100644
> --- a/drivers/common/mlx5/linux/mlx5_common_os.c
> +++ b/drivers/common/mlx5/linux/mlx5_common_os.c
> @@ -690,6 +690,19 @@ mlx5_roce_disable(const struct rte_device *dev)
>  {
>  	char pci_addr[PCI_PRI_STR_SIZE] = { 0 };
>  
> +	/*
> +	 * For auxiliary devices (SFs), ROCE is configured via devlink
> +	 * parameters (enable_roce) before device creation. Skip runtime
> +	 * ROCE disable since mlx5_dev_to_pci_str() returns the parent PF
> +	 * address, not the SF - disabling ROCE on the parent PF is both
> +	 * incorrect and may fail if the PF already has an active IB context.
> +	 */
> +	if (!mlx5_dev_is_pci(dev)) {
> +		DRV_LOG(INFO, "Skipping ROCE disable for auxiliary device \"%s\"",
> +			dev->name);
> +		return 0;
> +	}

The current logic does have the bug you describe, but I don't think
we should stop disabling ROCE automatically for SFs,
especially since, IIUC, the enable_roce value is not inherited
from the PF when a new SF is created and probed.
Skipping the disable would shift more of the port configuration
responsibility onto the user.

In my opinion mlx5_roce_disable() should have split logic like so:

    if mlx5_dev_is_pci(dev)
        // for PCI devices continue as usual:
        // try disabling ROCE through netlink or sysfs
    else
        // for SFs: try disabling ROCE through netlink
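
To make the split concrete, here is a rough C sketch. It is not the
driver code: struct sketch_device, the sketch_roce_op callback type and
the two stub backends are hypothetical stand-ins for rte_device, the
Netlink path and the sysfs path. The real function would still resolve
the PCI address for the PCI branch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rte_device fields this sketch needs. */
struct sketch_device {
	bool is_pci;      /* stands in for mlx5_dev_is_pci(dev) */
	const char *name; /* stands in for rte_device->name */
};

/* Hypothetical backend: returns 0 on success, negative errno on failure. */
typedef int (*sketch_roce_op)(const struct sketch_device *dev);

/* Stub backends, only here so the sketch can be exercised. */
static int
sketch_op_ok(const struct sketch_device *dev)
{
	(void)dev;
	return 0;
}

static int
sketch_op_unsupported(const struct sketch_device *dev)
{
	(void)dev;
	return -ENOTSUP;
}

static int
sketch_roce_disable(const struct sketch_device *dev,
		    sketch_roce_op nl_disable, sketch_roce_op sysfs_disable)
{
	if (dev->is_pci) {
		/* PCI devices: try Netlink first, then fall back to sysfs. */
		if (nl_disable(dev) == 0)
			return 0;
		return sysfs_disable(dev);
	}
	/* SFs: Netlink only, there is no sysfs fallback in this sketch. */
	return nl_disable(dev);
}
```

With this shape, an SF whose Netlink request fails reports the error
instead of falling through to the sysfs path.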

This would require some adjustments in mlx5_nl_roce_disable()
and related code.
Specifically, devlink attributes should be adjusted when disabling ROCE
for auxiliary devices:

- DEVLINK_ATTR_BUS_NAME = "auxiliary"
- DEVLINK_ATTR_DEV_NAME = device name from rte_device->name
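
As an illustration of the attribute selection only, something along
these lines could work. The struct and function names are hypothetical
and not part of mlx5_nl; the "pci" bus name for the existing path and
the example device names in the comments are assumptions on my side.

```c
#include <stdbool.h>

/* Hypothetical pair of values to put into the devlink request. */
struct sketch_devlink_handle {
	const char *bus_name; /* value for DEVLINK_ATTR_BUS_NAME */
	const char *dev_name; /* value for DEVLINK_ATTR_DEV_NAME */
};

/*
 * Pick the devlink handle for the device being probed.
 * pci_addr: PCI address string, e.g. "0000:08:00.0" (illustrative).
 * rte_dev_name: rte_device->name, e.g. "mlx5_core.sf.4" (illustrative).
 */
static struct sketch_devlink_handle
sketch_devlink_handle_for(bool is_pci, const char *pci_addr,
			  const char *rte_dev_name)
{
	struct sketch_devlink_handle h;

	if (is_pci) {
		/* PCI device: address the PCI function itself. */
		h.bus_name = "pci";
		h.dev_name = pci_addr;
	} else {
		/* SF: address the auxiliary device, not the parent PF. */
		h.bus_name = "auxiliary";
		h.dev_name = rte_dev_name;
	}
	return h;
}
```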

Would you be able to make the necessary changes?

> +
>  	if (mlx5_dev_to_pci_str(dev, pci_addr, sizeof(pci_addr)) < 0)
>  		return -rte_errno;
>  	/* Firstly try to disable ROCE by Netlink and fallback to sysfs. */
> -- 
> 2.51.2
> 

Best regards,
Dariusz Sosnowski

