|SUCCESS| [v3, 7/7] net/netvsc: handle VF recovery events for service reset
dpdklab at iol.unh.edu
dpdklab at iol.unh.edu
Fri May 15 22:44:55 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164016
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-15
Reviewing series.patch for DPDK coding standards compliance.
commit 552762-2-longli at microsoft.com
Author: Long Li <longli at microsoft.com>
Subject: net/netvsc: retry VF hotplug indefinitely until PCI device disappears
This patch replaces the fixed retry limit with indefinite retry during
VF hot-add, terminating only when the PCI device itself disappears.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index b8880edb4c..85e500c178 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
> @@ -89,8 +89,8 @@ struct netvsc_mp_param {
[ ... ]
> @@ -622,19 +622,39 @@ static void netvsc_hotplug_retry(void *args)
> PMD_DRV_LOG(DEBUG, "%s: retry count %d",
> __func__, hot_ctx->eal_hot_plug_retry);
>
> - if (hot_ctx->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY) {
> - PMD_DRV_LOG(NOTICE, "Failed to parse PCI device retry=%d",
> - hot_ctx->eal_hot_plug_retry);
> + hot_ctx->eal_hot_plug_retry++;
> +
> + /* Check if PCI device still exists -- if it disappeared, give up.
> + * Otherwise keep retrying indefinitely until the net directory
> + * appears. This is safe because:
> + * - MANA driver probe can take >100s after PCI rescan
> + * - The retry uses rte_eal_alarm callbacks serialized on the
> + * EAL interrupt thread, preventing races with device close
> + * - Device close cancels pending alarms and frees the context
> + * - If the PCI device is removed, the access() check below
> + * detects the missing sysfs path and stops immediately
> + */
> + snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s", d->name);
Does this code handle snprintf truncation correctly? If d->name is very
long, could the access() call below check the wrong path? Per DPDK style,
should the return value be checked for truncation?
> + if (access(buf, F_OK) != 0) {
> + PMD_DRV_LOG(NOTICE,
> + "PCI device %s no longer exists, giving up after %d retries",
> + d->name, hot_ctx->eal_hot_plug_retry);
> goto free_hotadd_ctx;
> }
>
> snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name);
> di = opendir(buf);
> if (!di) {
Can the error handling here distinguish between ENOENT (directory doesn't
exist yet) and other errors like EACCES or ENOMEM? Does silent retry on
EACCES make sense, or should errno be checked?
> + if (hot_ctx->eal_hot_plug_retry % 30 == 0)
> + PMD_DRV_LOG(NOTICE,
> + "%s: waiting for %s (retry %d, %ds elapsed)",
> + __func__, buf, hot_ctx->eal_hot_plug_retry,
> + hot_ctx->eal_hot_plug_retry);
The "%ds elapsed" assumes 1-second intervals. Does the division by 30
still produce useful output if NETVSC_HOTADD_RETRY_INTERVAL changes?
[ ... ]
commit 552762-3-longli at microsoft.com
Author: Long Li <longli at microsoft.com>
Subject: net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug
This patch adds retry logic when SIOCGIFHWADDR fails due to interface
rename during the udev rename window.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 85e500c178..096489d66d 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
[ ... ]
> @@ -674,10 +674,18 @@ static void netvsc_hotplug_retry(void *args)
> ret = ioctl(s, SIOCGIFHWADDR, &req);
> close(s);
> if (ret == -1) {
> - PMD_DRV_LOG(ERR,
> - "Failed to send SIOCGIFHWADDR for device %s",
> + /* Interface may be renamed by udev (e.g. eth1 - ens1).
> + * Retry from the top -- the PCI device check above
> + * ensures we stop if the device disappears.
> + */
> + PMD_DRV_LOG(DEBUG,
> + "Failed to send SIOCGIFHWADDR for device %s, "
> + "interface may be renaming, retrying",
> dir->d_name);
> - break;
> + closedir(di);
> + rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
> + netvsc_hotplug_retry, hot_ctx);
> + return;
Does this code leak the socket s when returning early? The ioctl failed
and close(s) was already called, but can errno be EINTR or EAGAIN,
warranting a retry on the same socket instead of closing and reopening?
Also, is di guaranteed to be valid here (not NULL)?
> }
> if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
> continue;
[ ... ]
commit 552762-4-longli at microsoft.com
Author: Long Li <longli at microsoft.com>
Subject: net/netvsc: retry full probe when IB device not ready during hotplug
This patch restarts the full retry loop when rte_eal_hotplug_add returns
-ENODEV, indicating the IB device is not yet ready.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 096489d66d..9e4fc33949 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
[ ... ]
> @@ -717,17 +717,36 @@ static void netvsc_hotplug_retry(void *args)
> * parent device, restore its args.
> */
> ret = rte_eal_hotplug_add(d->bus->name, d->name, drv_str ? drv_str : "");
> - if (ret) {
> - PMD_DRV_LOG(ERR,
> - "Failed to add PCI device %s",
> + free(drv_str);
> +
> + if (ret == -ENODEV) {
> + /* IB device not ready yet (mana_ib not probed).
> + * Restart the full retry from the PCI device
> + * check so we re-verify the device exists and
> + * get fresh interface names after any renames.
> + * This retries indefinitely -- the PCI sysfs
> + * check at the top of this function ensures
> + * we stop if the device disappears.
> + */
> + PMD_DRV_LOG(NOTICE,
> + "IB device not ready for %s, "
> + "restarting probe in 1 second",
> d->name);
> + closedir(di);
> + rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
> + netvsc_hotplug_retry,
> + hot_ctx);
> + return;
Does this code path need to check whether closedir(di) succeeds? The
original code did not close di before freeing the context -- does this
introduce a change in ownership assumptions about di?
> }
>
> - free(drv_str);
> + if (ret && ret != -EEXIST)
> + PMD_DRV_LOG(ERR,
> + "Failed to add PCI device %s (ret=%d)",
> + d->name, ret);
Does this suppress error logging when ret == -EEXIST and ret != 0? The
condition (ret && ret != -EEXIST) is true for all non-zero values except
-EEXIST, but the original code logged all errors. Is this intentional?
[ ... ]
commit 552762-5-longli at microsoft.com
Author: Long Li <longli at microsoft.com>
Subject: net/netvsc: add debug logging for VF hotplug retry
This patch adds DEBUG-level logging to diagnose VF re-attach failures.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 9e4fc33949..16fb2b344d 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
[ ... ]
> @@ -687,8 +692,12 @@ static void netvsc_hotplug_retry(void *args)
> netvsc_hotplug_retry, hot_ctx);
> return;
> }
> - if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER)
> + if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) {
> + PMD_DRV_LOG(DEBUG,
> + "%s: device %s sa_family=%d not ARPHRD_ETHER, skipping",
> + __func__, dir->d_name, req.ifr_hwaddr.sa_family);
> continue;
> + }
Style: does the 100-character line limit allow wrapping the
PMD_DRV_LOG call differently, or is the current wrapping correct?
[ ... ]
commit 552762-6-longli at microsoft.com
Author: Long Li <longli at microsoft.com>
Subject: net/netvsc: retry when no matching MAC found in net directory
This patch adds retry logic when the readdir loop ends without finding
a matching MAC address, which can happen on multi-NIC PCI devices.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 16fb2b344d..72743872bb 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
[ ... ]
> @@ -768,8 +774,36 @@ static void netvsc_hotplug_retry(void *args)
> RTE_ETHER_ADDR_BYTES(dev->data->mac_addrs));
> }
>
> + /* If we opened the net directory but didn't find a matching MAC,
> + * the VF interface may not have appeared yet (e.g. on a multi-NIC
> + * PCI device, the second VF registers later). Retry.
> + */
> + if (di != NULL) {
> + closedir(di);
> + di = NULL;
> + if (dir == NULL) {
> + /* readdir returned NULL -- loop ended without match */
> + hot_ctx->mac_retry++;
> + if (hot_ctx->mac_retry < NETVSC_MAX_MAC_RETRY) {
> + PMD_DRV_LOG(DEBUG,
> + "%s: no matching MAC found in %s, "
> + "retrying in 1 second (mac_retry %d/%d)",
> + __func__, buf,
> + hot_ctx->mac_retry,
> + NETVSC_MAX_MAC_RETRY);
> + rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
> + netvsc_hotplug_retry,
> + hot_ctx);
> + return;
Does this code path leave buf uninitialized? The snprintf calls to
populate buf occur before this check, but does the control flow guarantee
buf is valid if di != NULL?
> + }
> + PMD_DRV_LOG(NOTICE,
> + "%s: no matching MAC found after %d retries, giving up",
> + __func__, hot_ctx->mac_retry);
> + }
> + }
> +
> free_hotadd_ctx:
> - if (di)
> + if (di != NULL)
> closedir(di);
Style: the explicit NULL comparison is correct per DPDK style, but was
the original implicit comparison (if (di)) also acceptable? The patch
changes this without explaining why.
[ ... ]
> diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
> index ef55dee28e..574b909c82 100644
> --- a/drivers/net/netvsc/hn_var.h
> +++ b/drivers/net/netvsc/hn_var.h
> @@ -127,6 +127,7 @@ struct hv_hotadd_context {
> struct hn_data *hv;
> struct rte_devargs da;
> int eal_hot_plug_retry;
> + int mac_retry;
> };
Does the new mac_retry field need to be initialized to zero somewhere,
or is the context already zeroed on allocation?
[ ... ]
commit 552762-7-longli at microsoft.com
Author: Long Li <longli at microsoft.com>
Subject: net/netvsc: forward per-queue stats from VF device
This patch forwards per-queue stats from the VF device instead of
ignoring the qstats parameter.
> diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
> index 1fcc65a712..c83cc973fd 100644
> --- a/drivers/net/netvsc/hn_vf.c
> +++ b/drivers/net/netvsc/hn_vf.c
[ ... ]
> @@ -757,8 +757,19 @@ int hn_vf_stats_get(struct rte_eth_dev *dev,
>
> rte_rwlock_read_lock(&hv->vf_lock);
> vf_dev = hn_get_vf_dev(hv);
> - if (vf_dev)
> - ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
> + if (vf_dev) {
> + /* Call dev_ops->stats_get directly instead of the public
> +
More information about the test-report
mailing list