[dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response

Min Tang tommytang at gmail.com
Thu Feb 27 19:24:49 CET 2020


That quick fix was just to verify my guess. I agree that it needs a more
comprehensive fix.

Yes, the race condition is another issue here. In addition to that, I think
the function that sends the NVS_TYPE_RNDIS message needs to drain the
response message.
I looked at the netvsc driver in the Linux kernel; it receives all the VMBus
messages asynchronously in another thread. That's probably something we can
think about in the DPDK driver.
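
To make the draining idea concrete, here is a minimal sketch of a receive
loop that discards stale responses until the expected type arrives. It is
not the driver code: struct sim_chan, sim_recv() and recv_matching() are
hypothetical names that simulate the channel with a fixed array, and the
bound on discarded messages is an assumption added so the loop cannot spin
forever. Only the two type values come from the report quoted below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Message type values quoted in the report. */
#define NVS_TYPE_RNDIS     107 /* 0x6b */
#define NVS_TYPE_SUBCH_REQ 133 /* 0x85 */

/* Hypothetical stand-in for the VMBus channel: a fixed list of pending
 * response types. This is not a DPDK structure. */
struct sim_chan {
	const uint32_t *pending;
	size_t count;
	size_t next;
};

/* Hypothetical receive helper: stores the next pending response type and
 * returns 0, or returns -EAGAIN when nothing is pending. */
static int sim_recv(struct sim_chan *ch, uint32_t *type)
{
	if (ch->next >= ch->count)
		return -EAGAIN;
	*type = ch->pending[ch->next++];
	return 0;
}

/* Receive until a response of type 'want' shows up, discarding at most
 * 'max_stale' responses of any other type. '*dropped' reports how many
 * responses were discarded. */
static int recv_matching(struct sim_chan *ch, uint32_t want,
			 unsigned int max_stale, unsigned int *dropped)
{
	uint32_t type;
	int ret;

	assert(dropped != NULL);
	*dropped = 0;

	for (;;) {
		ret = sim_recv(ch, &type);
		if (ret < 0)
			return ret;	/* nothing left to read */
		if (type == want)
			return 0;	/* matching response found */
		if (*dropped >= max_stale)
			return -EINVAL;	/* too many stale responses */
		(*dropped)++;		/* discard and keep reading */
	}
}
```

With a pending list of { NVS_TYPE_RNDIS, NVS_TYPE_SUBCH_REQ } and a request
for NVS_TYPE_SUBCH_REQ, the loop discards one stale response and returns 0.
This only covers the draining part; it does not address the race between
two outstanding requests.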


On Thu, Feb 27, 2020 at 12:47 PM Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Thu, 27 Feb 2020 11:16:01 -0500
> Min Tang <tommytang at gmail.com> wrote:
>
> > Hi Stephen:
> >
> > I saw the following error messages when using DPDK 18.11.2 in Azure:
> >
> > hn_nvs_execute(): unexpected NVS resp 0x6b, expect 0x85
> > hn_dev_configure(): subchannel configuration failed
> >
> > It was not easy to reproduce it and it only occurred with multiple queues
> > enabled. In hn_nvs_execute it expects the response to match the request.
> > In the failed case, it was expecting NVS_TYPE_SUBCH_REQ (133 or 0x85) but
> > got NVS_TYPE_RNDIS (107 or 0x6b). Obviously somewhere the NVS_TYPE_RNDIS
> > message had been sent before the NVS_TYPE_SUBCH_REQ message was sent. I
> > looked at the code and found that the NVS_TYPE_RNDIS message needs
> > completion response but it does not receive the response message anywhere.
> > The fix would be receiving and discarding the wrong response message(s).
> >
> > I applied the following patch and it fixed the problem.
> >
> > --- a/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:08:29.755530969 -0500
> > +++ b/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:07:21.567371798 -0500
> > @@ -92,7 +92,7 @@
> >   if (hdr->type != type) {
> >   PMD_DRV_LOG(ERR, "unexpected NVS resp %#x, expect %#x",
> >      hdr->type, type);
> > - goto retry;
> > + return -EINVAL;
> >   }
> >
> >   if (len < resplen) {
>
> Thanks for the analysis. Not sure if this is the right fix.
> Looks like the control channel needs additional locking.
> Having two outstanding requests at once is not going to work well.
>

