|SUCCESS| [7/7] net/netvsc: handle VF recovery events for service reset
dpdklab at iol.unh.edu
dpdklab at iol.unh.edu
Wed May 6 03:53:22 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/163609
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-06
commit db3c8d4e2f1a ("net/netvsc: retry VF hotplug indefinitely")
Author: Long Li <longli at microsoft.com>
net/netvsc: retry VF hotplug indefinitely until PCI device disappears
The commit replaces a fixed retry limit with indefinite retries for VF
hotplug. Below is the analysis.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index b8880edb4c..61e5aa464d 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
> @@ -89,8 +89,8 @@ struct netvsc_mp_param {
> #define NETVSC_ARG_TXBREAK "tx_copybreak"
> #define NETVSC_ARG_RX_EXTMBUF_ENABLE "rx_extmbuf_enable"
>
> -/* The max number of retry when hot adding a VF device */
> -#define NETVSC_MAX_HOTADD_RETRY 10
> +/* Retry interval for hot-add VF device (microseconds) */
> +#define NETVSC_HOTADD_RETRY_INTERVAL 1000000
[ ... ]
> @@ -622,19 +622,32 @@ static void netvsc_hotplug_retry(void *args)
> PMD_DRV_LOG(DEBUG, "%s: retry count %d",
> __func__, hot_ctx->eal_hot_plug_retry);
>
> - if (hot_ctx->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY) {
> - PMD_DRV_LOG(NOTICE, "Failed to parse PCI device retry=%d",
> - hot_ctx->eal_hot_plug_retry);
> + hot_ctx->eal_hot_plug_retry++;
> +
> + /* Check if PCI device still exists -- if it disappeared, give up.
> + * Otherwise keep retrying until the net directory appears
> + * (MANA driver probe can take >100s after PCI rescan).
> + */
> + snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s", d->name);
> + if (access(buf, F_OK) != 0) {
> + PMD_DRV_LOG(NOTICE,
> + "PCI device %s no longer exists, giving up after %d retries",
> + d->name, hot_ctx->eal_hot_plug_retry);
> goto free_hotadd_ctx;
> }
Does this code handle the case where access() returns -1 with errno
other than ENOENT? For instance, on permission failures (EACCES),
does the retry logic behave correctly, or does it stop prematurely?
> snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name);
> di = opendir(buf);
> if (!di) {
> - PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
> - "retrying in 1 second", __func__, buf);
> - /* The device is still being initialized, retry after 1 second */
> - rte_eal_alarm_set(1000000, netvsc_hotplug_retry, hot_ctx);
> + if (hot_ctx->eal_hot_plug_retry % 30 == 0)
> + PMD_DRV_LOG(NOTICE,
> + "%s: waiting for %s (retry %d, %ds elapsed)",
> + __func__, buf, hot_ctx->eal_hot_plug_retry,
> + hot_ctx->eal_hot_plug_retry);
Does hot_ctx->eal_hot_plug_retry overflow if the net directory never
appears and the device never disappears? In such a scenario, does the
indefinite retry keep incrementing the counter until it wraps around?
[ ... ]
commit a8c3f9b1e2c7 ("net/netvsc: retry on SIOCGIFHWADDR failure")
Author: Long Li <longli at microsoft.com>
net/netvsc: retry on SIOCGIFHWADDR failure during VF hotplug
The commit reschedules retry when SIOCGIFHWADDR fails instead of giving
up. Below is the analysis.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 61e5aa464d..8bb2df3c19 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
> @@ -667,10 +667,17 @@ static void netvsc_hotplug_retry(void *args)
> ret = ioctl(s, SIOCGIFHWADDR, &req);
> close(s);
> if (ret == -1) {
> - PMD_DRV_LOG(ERR,
> - "Failed to send SIOCGIFHWADDR for device %s",
> + /* Interface may be renamed by udev (e.g. eth1 - ens1),
> + * retry instead of giving up.
> + */
> + PMD_DRV_LOG(NOTICE,
> + "Failed to send SIOCGIFHWADDR for device %s, "
> + "interface may be renaming, retrying",
> dir->d_name);
> - break;
> + closedir(di);
> + rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
> + netvsc_hotplug_retry, hot_ctx);
> + return;
> }
Does closing di and scheduling a retry leave any resources or states
inconsistent if another callback or event arrives before the retry?
[ ... ]
commit b1d0f4e3c9a8 ("net/netvsc: retry full probe when IB not ready")
Author: Long Li <longli at microsoft.com>
net/netvsc: retry full probe when IB device not ready during hotplug
The commit restarts the full retry loop on -ENODEV from hotplug_add.
Below is the analysis.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 8bb2df3c19..5d1ef10eff 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
> @@ -709,17 +709,33 @@ static void netvsc_hotplug_retry(void *args)
> * parent device, restore its args.
> */
> ret = rte_eal_hotplug_add(d->bus->name, d->name, drv_str ? drv_str : "");
> - if (ret) {
> - PMD_DRV_LOG(ERR,
> - "Failed to add PCI device %s",
> + free(drv_str);
> +
> + if (ret == -ENODEV) {
> + /* IB device not ready yet (mana_ib not probed).
> + * Restart the full retry from PCI device check
> + * so we re-verify the device and get fresh
> + * interface names after any renames.
> + */
> + PMD_DRV_LOG(NOTICE,
> + "IB device not ready for %s, "
> + "restarting probe in 1 second",
> d->name);
> + closedir(di);
> + rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
> + netvsc_hotplug_retry,
> + hot_ctx);
> + return;
> }
After closing di and scheduling a retry, does this code return without
removing hot_ctx from the hotadd_list? Does this lead to a situation
where the same hot_ctx is retried multiple times with stale directory
handles or other resources?
[ ... ]
commit c9e2a8f1d4b2 ("net/netvsc: add NOTICE-level debug logging")
Author: Long Li <longli at microsoft.com>
net/netvsc: add NOTICE-level debug logging for VF hotplug retry
The commit adds NOTICE-level logs to diagnose VF re-attach failures.
Below is the analysis.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 5d1ef10eff..124af4f1a1 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
> @@ -656,6 +656,11 @@ static void netvsc_hotplug_retry(void *args)
> if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
> continue;
>
> + PMD_DRV_LOG(NOTICE,
> + "%s: checking interface %s in %s (retry %d)",
> + __func__, dir->d_name, buf,
> + hot_ctx->eal_hot_plug_retry);
Does emitting a NOTICE-level log for every directory entry on every
retry create log flooding in production environments? Would DEBUG
level be more appropriate for per-entry logging?
[ ... ]
commit f8a7c3d2e5b4 ("net/netvsc: retry when no matching MAC found")
Author: Long Li <longli at microsoft.com>
net/netvsc: retry when no matching MAC found in net directory
The commit adds retry logic when no matching MAC is found after scanning
the net directory. Below is the analysis.
> diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
> index 124af4f1a1..8d9d6bbe8b 100644
> --- a/drivers/net/netvsc/hn_ethdev.c
> +++ b/drivers/net/netvsc/hn_ethdev.c
> @@ -92,6 +92,12 @@ struct netvsc_mp_param {
> /* Retry interval for hot-add VF device (microseconds) */
> #define NETVSC_HOTADD_RETRY_INTERVAL 1000000
>
> +/* Max retries when net/ directory exists but no matching MAC found.
> + * On multi-NIC PCI devices, a second VF may register later.
> + * 120 retries = ~2 minutes.
> + */
> +#define NETVSC_MAX_MAC_RETRY 120
[ ... ]
> @@ -763,6 +769,34 @@ static void netvsc_hotplug_retry(void *args)
> }
> }
>
> + /* If we opened the net directory but didn't find a matching MAC,
> + * the VF interface may not have appeared yet (e.g. on a multi-NIC
> + * PCI device, the second VF registers later). Retry.
> + */
> + if (di) {
> + closedir(di);
> + di = NULL;
> + if (!dir) {
> + /* readdir returned NULL -- loop ended without match */
> + hot_ctx->mac_retry++;
> + if (hot_ctx->mac_retry < NETVSC_MAX_MAC_RETRY) {
> + PMD_DRV_LOG(NOTICE,
> + "%s: no matching MAC found in %s, "
> + "retrying in 1 second (mac_retry %d/%d)",
> + __func__, buf,
> + hot_ctx->mac_retry,
> + NETVSC_MAX_MAC_RETRY);
> + rte_eal_alarm_set(NETVSC_HOTADD_RETRY_INTERVAL,
> + netvsc_hotplug_retry,
> + hot_ctx);
> + return;
> + }
Does this code handle the case where dir is NULL not because the loop
ended without a match, but because readdir() failed (e.g. due to an
error)? In other words, does readdir() set errno when it returns NULL
on error vs. end of directory, and is that distinction checked?
[ ... ]
> diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
> index ef55dee28e..574b909c82 100644
> --- a/drivers/net/netvsc/hn_var.h
> +++ b/drivers/net/netvsc/hn_var.h
> @@ -127,6 +127,7 @@ struct hv_hotadd_context {
> struct hn_data *hv;
> struct rte_devargs da;
> int eal_hot_plug_retry;
> + int mac_retry;
> };
Does the mac_retry counter get reset to zero when the retry logic
restarts from the PCI device check after an -ENODEV from hotplug_add?
If not, does the accumulated mac_retry count from previous iterations
cause the NETVSC_MAX_MAC_RETRY limit to be exceeded prematurely?
[ ... ]
commit a9f3c8e2d5b6 ("net/netvsc: forward per-queue stats from VF")
Author: Long Li <longli at microsoft.com>
net/netvsc: forward per-queue stats from VF device
The commit directly calls the VF device's stats_get op with qstats.
Below is the analysis.
> diff --git a/drivers/net/netvsc/hn_vf.c b/drivers/net/netvsc/hn_vf.c
> index 1fcc65a712..497f747aab 100644
> --- a/drivers/net/netvsc/hn_vf.c
> +++ b/drivers/net/netvsc/hn_vf.c
> @@ -749,7 +749,7 @@ void hn_vf_rx_queue_release(struct hn_data *hv, uint16_t queue_id)
>
> int hn_vf_stats_get(struct rte_eth_dev *dev,
> struct rte_eth_stats *stats,
> - struct eth_queue_stats *qstats __rte_unused)
> + struct eth_queue_stats *qstats)
> {
> struct hn_data *hv = dev->data->dev_private;
> struct rte_eth_dev *vf_dev;
> @@ -757,8 +757,12 @@ int hn_vf_stats_get(struct rte_eth_dev *dev,
>
> rte_rwlock_read_lock(&hv->vf_lock);
> vf_dev = hn_get_vf_dev(hv);
> - if (vf_dev)
> - ret = rte_eth_stats
More information about the test-report
mailing list