[EXTERNAL] Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF add/remove events
Long Li
longli at microsoft.com
Mon Feb 23 23:31:44 CET 2026
> Subject: [EXTERNAL] Re: [PATCH v2 2/8] net/netvsc: fix race conditions on VF
> add/remove events
>
> On Fri, 20 Feb 2026 18:45:21 -0800
> longli at linux.microsoft.com wrote:
>
> > From: Long Li <longli at microsoft.com>
> >
> > Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> > timing may not match the DPDK sequence of device events triggered from
> > uevents from kernel.
> >
> > Remove the retry logic from the code when attach to VF and rely on
> > DPDK event to attach to VF. With this change, both the notifications
> > from VSP and the DPDK will attempt a VF attach.
> >
> > Also implement locking when checking on all VF related fields.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Long Li <longli at microsoft.com>
>
> AI review spotted related issue.
>
> **Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the
> most complex patch in the series.
>
> **What it fixes correctly:**
>
> The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without
> the lock, acquired the lock, then re-checked. A VF remove could complete
> between the first check and the lock acquisition. The new code takes the read
> lock *before* any VF state checks — correct fix. The lock is properly released on
> both paths.
>
> The upgrade of `hn_vf_close()` from read lock to write lock is also a real bug fix,
> since it modifies `vf_attached` and calls `rte_eth_dev_close()`.
>
> Moving callback registration into `hn_vf_attach()` with proper rollback (via the
> new `hn_vf_detach()` helper) is a good structural improvement that ties callback
> lifetime to VF attach/detach lifecycle.
>
> The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is
> correct — if the VF is being removed, the switched flag must be cleared
> regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.
>
> **One potential concern (~60% confidence):**
>
> If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)`
> fails at the `switch_data_path:` label, the function returns an error but leaves the
> VF started and attached. The callers don't clean this up. This is a pre-existing
> design issue the patch doesn't worsen, and the hypervisor may retry, but it could
> confuse subsequent add/remove cycles.
I think it's safe to leave this code as is. Although VF is started and attached, netvsc will never use the VF since vf_vsc_switched is not set. This is similar to what netvsc does in the kernel.
More information about the stable
mailing list