[PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events

Stephen Hemminger stephen at networkplumber.org
Mon Feb 23 18:43:58 CET 2026


On Fri, 20 Feb 2026 18:45:21 -0800
longli at linux.microsoft.com wrote:

> From: Long Li <longli at microsoft.com>
> 
> Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> timing may not match the DPDK sequence of device events triggered from
> uevents from kernel.
> 
> Remove the retry logic from the code when attach to VF and rely on DPDK
> event to attach to VF. With this change, both the notifications from VSP
> and the DPDK will attempt a VF attach.
> 
> Also implement locking when checking on all VF related fields.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: stable at dpdk.org
> 
> Signed-off-by: Long Li <longli at microsoft.com>

AI review spotted a related issue.

**Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the most complex patch in the series.

**What it fixes correctly:**

The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without the lock, acquired the lock, then re-checked. A VF remove could complete between the first check and the lock acquisition. The new code takes the read lock *before* any VF state checks — correct fix. The lock is properly released on both paths.

The upgrade of `hn_vf_close()` from read lock to write lock is also a real bug fix, since it modifies `vf_attached` and calls `rte_eth_dev_close()`.

Moving callback registration into `hn_vf_attach()` with proper rollback (via the new `hn_vf_detach()` helper) is a good structural improvement that ties callback lifetime to VF attach/detach lifecycle.

The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is correct — if the VF is being removed, the switched flag must be cleared regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.

**One potential concern (~60% confidence):**

If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` fails at the `switch_data_path:` label, the function returns an error but leaves the VF started and attached. The callers don't clean this up. This is a pre-existing design issue that the patch doesn't worsen, and the hypervisor may retry, but the leftover started-and-attached state could confuse subsequent add/remove cycles.
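
For illustration only, one way such a failure path could unwind is the usual goto-based rollback. Everything in this sketch is hypothetical: `struct vf_ctx`, `stub_set_datapath_vf()` and `vf_add_with_rollback()` are made-up names, the `fail_switch` flag just simulates the datapath switch failing, and whether rolling back is the right policy for the driver (given that the hypervisor may retry) is an open question.

```c
#include <stdbool.h>

/* Hypothetical model of the VF lifecycle flags. */
struct vf_ctx {
	bool attached;
	bool started;
	bool switched;
	bool fail_switch;   /* test knob: simulate a failed datapath switch */
};

/* Stub standing in for the datapath switch request. */
static int stub_set_datapath_vf(const struct vf_ctx *c)
{
	return c->fail_switch ? -1 : 0;
}

/*
 * Attach and start the VF, then switch the datapath.
 * If the switch fails, undo start and attach in reverse order
 * so the caller sees either a fully switched VF or none at all.
 */
static int vf_add_with_rollback(struct vf_ctx *c)
{
	int ret;

	c->attached = true;
	c->started = true;

	ret = stub_set_datapath_vf(c);
	if (ret != 0)
		goto rollback;

	c->switched = true;
	return 0;

rollback:
	c->started = false;
	c->attached = false;
	return ret;
}
```

With this shape, a later add event starts from the same state as a first-time add, at the cost of repeating the configure and start work on retry.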

