[dpdk-dev] [PATCH] net/vdev_netvsc: handle removal of associated pci device

Matan Azrad matan at nvidia.com
Mon Sep 7 10:09:27 CEST 2020


Hi Stephen

From: Stephen Hemminger:
> On Sun, 6 Sep 2020 12:38:18 +0000
> Matan Azrad <matan at nvidia.com> wrote:
> 
> > Hi Stephen
> >
> > From: Stephen Hemminger:
> > > The vdev_netvsc was not detecting when the associated PCI device
> > > (SRIOV) was removed. Because of that it would keep feeding the same
> > > (removed) device to failsafe PMD which would then unsuccessfully try
> > > and probe for it.
> > >
> > > Change to use a mark/sweep method to detect that PCI device was
> > > removed, and also only tell failsafe about new PCI devices.
> > > Vdev_netvsc does not have to keep stuffing the pipe with the same
> > > already existing PCI device.
> >
> > As I know, the vdev_netvsc driver doesn't call to failsafe if the PCI device is
> not detected by the readlink command(considered as removed)...
> > Am I missing something?
> 
> The original code is broken because ctx_yield is not cleared, it keeps sending
> the same value.

Looking on the code again, It looks like ctx->yield has no effect on the next pipe write,
It is just used for log.

After the PCI interface matching to the netvsc interface, the pipe write is triggered only if the readlink commands success to see the plugged-in PCI device:
readlink /sys/class/net/[iface]/device/subsystem shows "pci"
readlink /sys/class/net/[iface]/device shows the pci device ID.

So, the assumption is when the above readlink failed on the interface the device is removed(plugged-out) and the fd write will not happen.

The code will continue to retry probe again and again until success only for plugged-in pci device matched the netvsc device.

> It looks like device removal and add was never tested.

This is basic test we have to test plug-in plug-out and it passed every day in the last years.

Maybe something new and special in your setup?

> If you test removal you will see that vdev_netvsc:
>  1. Sends same PCI device repeatedly to failsafe (every alarm call)
>     This is harmless, but useless.
>  2. When device is removed, keeps doing #1


More information about the dev mailing list