[dpdk-dev] [PATCH 9/9] vhost: only use vDPA config workaround if needed

Matan Azrad matan at mellanox.com
Wed Jun 17 13:04:09 CEST 2020


Hi Maxime

From: Maxime Coquelin:
> Hi Matan,
> 
> On 6/14/20 8:08 AM, Matan Azrad wrote:
> > Hi Maxime
> >
> > From: Maxime Coquelin:
> >> On 6/9/20 1:09 PM, Matan Azrad wrote:
> >>> Hi Maxime
> >>>
> >>> From: Maxime Coquelin
> >>>> Hi Matan,
> >>>>
> >>>> On 6/8/20 11:19 AM, Matan Azrad wrote:
> >>>>> Hi Maxime
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> Hi Matan,
> >>>>>>
> >>>>>> On 6/7/20 12:38 PM, Matan Azrad wrote:
> >>>>>>> Hi Maxime
> >>>>>>>
> >>>>>>> Thanks for the huge work.
> >>>>>>> Please see a suggestion inline.
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> Sent: Thursday, May 14, 2020 11:02 AM
> >>>>>>>> To: xiaolong.ye at intel.com; Shahaf Shuler <shahafs at mellanox.com>;
> >>>>>>>> Matan Azrad <matan at mellanox.com>; amorenoz at redhat.com;
> >>>>>>>> xiao.w.wang at intel.com; Slava Ovsiienko <viacheslavo at mellanox.com>;
> >>>>>>>> dev at dpdk.org
> >>>>>>>> Cc: jasowang at redhat.com; lulu at redhat.com; Maxime Coquelin
> >>>>>>>> <maxime.coquelin at redhat.com>
> >>>>>>>> Subject: [PATCH 9/9] vhost: only use vDPA config workaround if
> >>>>>>>> needed
> >>>>>>>>
> >>>>>>>> Now that we have Virtio device status support, let's only use
> >>>>>>>> the vDPA workaround if it is not supported.
> >>>>>>>>
> >>>>>>>> This patch also documents why Virtio device status protocol
> >>>>>>>> feature support is strongly advised.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> >>>>>>>> ---
> >>>>>>>>  lib/librte_vhost/vhost_user.c | 16 ++++++++++++++--
> >>>>>>>>  1 file changed, 14 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >>>>>>>> index e5a44be58d..67e96a872a 100644
> >>>>>>>> --- a/lib/librte_vhost/vhost_user.c
> >>>>>>>> +++ b/lib/librte_vhost/vhost_user.c
> >>>>>>>> @@ -2847,8 +2847,20 @@ vhost_user_msg_handler(int vid, int fd)
> >>>>>>>>  	if (!vdpa_dev)
> >>>>>>>>  		goto out;
> >>>>>>>>
> >>>>>>>> -	if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
> >>>>>>>> -			request == VHOST_USER_SET_VRING_CALL) {
> >>>>>>>> +	if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> >>>>>>>> +		/*
> >>>>>>>> +		 * Workaround when Virtio device status protocol
> >>>>>>>> +		 * feature is not supported, wait for SET_VRING_CALL
> >>>>>>>> +		 * request. This is not ideal as some frontends like
> >>>>>>>> +		 * Virtio-user may not send this request, so vDPA device
> >>>>>>>> +		 * may never be configured. Virtio device status support
> >>>>>>>> +		 * on frontend side is strongly advised.
> >>>>>>>> +		 */
> >>>>>>>> +		if (!(dev->protocol_features &
> >>>>>>>> +				(1ULL << VHOST_USER_PROTOCOL_F_STATUS)) &&
> >>>>>>>> +				(request != VHOST_USER_SET_VRING_CALL))
> >>>>>>>> +			goto out;
> >>>>>>>> +
> >>>>>>>
> >>>>>>> When the status protocol feature is not supported, in the current
> >>>>>>> code the vDPA configuration triggering depends on:
> >>>>>>> 1. Device is ready - all the queues are configured (datapath
> >>>>>>> addresses, callfd and kickfd).
> >>>>>>> 2. The last command is callfd.
> >>>>>>>
> >>>>>>>
> >>>>>>> The code doesn't take into account that some queues may stay
> >>>>>>> disabled.
> >>>>>>> Maybe the correct timing is:
> >>>>>>> 1. Device is ready - all the enabled queues are configured and the
> >>>>>>> MEM table is configured.
> >>>>>>
> >>>>>> I think current virtio_is_ready() already assumes the mem table
> >>>>>> is configured, otherwise we would not have vq->desc, vq->used and
> >>>>>> vq->avail set, as they need to be translated using the mem table.
> >>>>>>
> >>>>> Yes, but if you don't expect to check them for disabled queues, you
> >>>>> need to check the mem table to be sure it was set.
> >>>>
> >>>> Even disabled queues should be allocated/configured by the guest driver.
> >>> Is that required by the spec?
> >>
> >> Sorry, that was a misunderstanding from my side.
> >> The queues set by the driver using the MQ_VQ_PAIRS_SET control
> >> message have to be allocated and configured by the driver:
> >>
> >> http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1-1940001
> >>
> >
> > Do you mean the sentence:
> > "The driver MUST configure the virtqueues before enabling them with the
> > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command." ?
> >
> > Maybe I'm missing some English nuance here, but it looks like this sentence
> > doesn't say whether the driver should configure queues that will never be
> > enabled by the virtio driver (stay disabled forever).
> >
> >
> >>> We saw that the Windows virtio guest driver doesn't configure disabled
> >>> queues.
> >>> Is it a bug in the Windows guest?
> >>> You probably can take a look here:
> >>> https://github.com/virtio-win/kvm-guest-drivers-windows
> >>>
> >>
> >> Indeed it limits the number of queue pairs to the number of CPUs.
> >> This is done here:
> >> https://github.com/virtio-win/kvm-guest-drivers-windows/blob/edda3f50a17015aab1450ca09e3263c7409e4001/NetKVM/Common/ParaNdis_Common.cpp#L956
> >>
> >> Linux does the same by the way:
> >> https://elixir.bootlin.com/linux/latest/source/drivers/net/virtio_net.c#L3092
> >
> > Yes, I think it makes sense.
> >
> >> We rarely face this issue because by default, the management layers
> >> usually set the number of queue pairs to the number of vCPUs if
> >> multiqueue is enabled. But the problem is real.
> >>
> >> In my opinion, the problem is more on the Vhost-user spec side and/or
> >> the Vhost-user backend side.
> >>
> >> The DPDK backend allocates a queue pair every time it receives a
> >> Vhost-user message setting a new queue (callfd, kickfd, enable,...
> >> see vhost_user_check_and_alloc_queue_pair()). And then
> >> virtio_is_ready() waits for all the allocated queue pairs to be initialized.
> >>
> >> The problem is that QEMU sends some of these messages even for queues
> >> that aren't (or won't be) initialized, as you can see in the log below,
> >> where I reproduced the issue with Windows 10:
> >>
> >> https://pastebin.com/YYCfW9Y3
> >>
> >> I don't see how the backend could know the guest driver is done with the
> >> information currently received from QEMU, as from its point of view some
> >> queues are only partially initialized (callfd is set).
> >
> > Don’t you think that only enabled queues must be fully initialized when
> > their status is changed from disabled to enabled?
> > So, you can assume that disabled queues can stay "not fully initialized"...
> 
> That may work but might not follow the Virtio spec, as with 1.0 we
> shouldn't process the rings before DRIVER_OK is set (but we cannot be sure
> we follow that anyway without SET_STATUS support).
> 
> I propose to cook a patch doing the following:
> 1. virtio_is_ready() will only ensure the first queue pair is ready (i.e. enabled
> and configured). Meaning that the app's new_device callback and the vDPA driver's
> dev_conf callback will be called with only the first queue pair configured and
> enabled.
> 
> 2. Before handling a new vhost-user request, it saves the ready status for
> every queue pair.
> 
> 3. Same handling of the requests, except that we won't notify the vdpa
> driver and the application of vring state changes in the
> VHOST_USER_SET_VRING_ENABLE handler.
> 
> 4. Once the Vhost-user request is handled, it compares the new ready status
> for every queue with the old one and sends queue state change events
> accordingly.

Looks very nice to me.
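Just to be sure I follow steps 2-4, here is a very rough sketch of the idea (the helper name and the exact hook point are only for illustration; it relies on the internal virtio_net/vhost_virtqueue structures and the existing vq_is_ready() helper, it is not the actual implementation):

/* Called after a request is handled, with the ready status saved before. */
static void
vhost_user_update_vring_states(struct virtio_net *dev,
		struct rte_vdpa_device *vdpa_dev, const bool *old_ready)
{
	uint32_t i;

	for (i = 0; i < dev->nr_vring; i++) {
		bool ready = vq_is_ready(dev, dev->virtqueue[i]);

		/* Only report real transitions, not every request. */
		if (ready == old_ready[i])
			continue;

		if (vdpa_dev && vdpa_dev->ops->set_vring_state)
			vdpa_dev->ops->set_vring_state(dev->vid, i, ready);

		if (dev->notify_ops->vring_state_changed)
			dev->notify_ops->vring_state_changed(dev->vid, i, ready);
	}
}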

More points:
With this method, some queues may be configured by the set_vring_state operation, so the driver is expected to call the next functions for each queue from the set_vring_state callback:
1. rte_vhost_enable_guest_notification
	This one takes the datapath lock, so we need to be sure that the datapath lock is not held on this queue by the same caller thread (maybe do not take datapath locks at all when vDPA is configured).
2. rte_vhost_host_notifier_ctrl
	This function's API is per device and not per queue; maybe we need to change this function to be per queue (add a new one for now and deprecate the old one in 20.11) - see the rough prototype sketch after this list.

3. We need to be sure that if a ready queue's configuration is changed after dev_conf, we notify the driver about it (maybe by set_vring_state(disable) and then set_vring_state(enable)).
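For point 2, I mean something like the following hypothetical prototype (the name and signature below are only a suggestion to illustrate the idea, not an existing DPDK API):

/*
 * Hypothetical per-queue variant of rte_vhost_host_notifier_ctrl().
 * The existing API maps/unmaps the host notifier for the whole device;
 * this one would do it only for the given queue, so the driver could
 * call it from its set_vring_state callback for the queue that just
 * became ready.
 */
__rte_experimental
int
rte_vhost_host_notifier_queue_ctrl(int vid, uint16_t qid, bool enable);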


> It is likely to need changes in the .dev_conf and .set_vring_state
> implementations by the drivers.

Yes, for Mellanox it is a very easy change.
Intel?
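For the .set_vring_state side, I imagine something roughly like this (a simplified sketch, not the real mlx5 code; HW virtq setup, locking and error handling omitted):

static int
example_vdpa_set_vring_state(int vid, int vring, int state)
{
	if (!state) {
		/* Queue disabled: stop/release the HW virtq for this vring. */
		return 0;
	}

	/*
	 * Queue just became ready: set up the HW virtq for this vring and
	 * enable guest notification for this queue only (being careful with
	 * the datapath lock, see point 1 above).
	 */
	return rte_vhost_enable_guest_notification(vid, vring, 1);
}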

 
> >
> >> With VHOST_USER_SET_STATUS, we will be able to handle this properly,
> >> as the backend can be sure the guest won't initialize more queues as
> >> soon as the DRIVER_OK Virtio status bit is set. In my v2, I can add one
> >> patch to handle this case properly, by "destroying" the queues' metadata
> >> as soon as DRIVER_OK is received.
> >>
> >> Note that it was the exact reason why I first tried to add support
> >> for VHOST_USER_SET_STATUS more than two years ago...:
> >> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04560.html
> >>
> >> What do you think?
> >
> > Yes, I agree it may be solved by VHOST_USER_SET_STATUS (and probably a
> > lot of other issues), but I think we also need to support legacy QEMU versions
> > if we can...
> 
> I think the SET_STATUS support is important to be compliant with the Virtio
> specification.
> 
> > Don't you think so?

Yes, I agree.
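Just to illustrate what I understand by that (the exact message handling and field names in your series may differ, this is only a sketch; DRIVER_OK is bit 0x4 in the Virtio spec):

#define VIRTIO_DEVICE_STATUS_DRIVER_OK 0x4

/* On VHOST_USER_SET_STATUS: once the driver reports DRIVER_OK, no more
 * queues will be initialized, so the vDPA device can be configured and
 * the SET_VRING_CALL workaround is not needed anymore. */
if ((dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK) &&
		!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
	if (vdpa_dev->ops->dev_conf(dev->vid) == 0)
		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
}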

> 
> We can try that.
> I will try to cook something this week, but it will require validation with OVS
> to be sure we don't break multiqueue. I will send it as RFC, and count on you
> to try it with your mlx5 vDPA driver.
> 
> Does it work for you? (note I'll be on vacation from July 1st to 17th)

Sure.
Do you have the capacity to do it this week?
I can help...

Matan


> 
> Thanks,
> Maxime
> 
> >> Regards,
> >> Maxime
> >


