[dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition

Maxime Coquelin maxime.coquelin at redhat.com
Tue Jun 23 16:33:57 CEST 2020



On 6/23/20 3:55 PM, Maxime Coquelin wrote:
> Hi Matan,
> 
> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>
>>
>> From: Maxime Coquelin:
>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>
>>>>
>>>> From: Maxime Coquelin:
>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>
>>>>>>
>>>>>> From: Maxime Coquelin:
>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Maxime Coquelin:
>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>> To: Matan Azrad <matan at mellanox.com>; Xiao Wang
>>>>>>>>> <xiao.w.wang at intel.com>
>>>>>>>>> Cc: dev at dpdk.org
>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>> definition
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Maxime
>>>>>>>>>>
>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>> To: Matan Azrad <matan at mellanox.com>; Xiao Wang
>>>>>>>>>>> <xiao.w.wang at intel.com>
>>>>>>>>>>> Cc: dev at dpdk.org
>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>> definition
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>>>>> ready at both stages, while the rings have changed and state
>>>>>>>>>>>>> change callback should
>>>>>>>>>>> have been called.
>>>>>>>>>>>> But in this version I checked twice, before message handler
>>>>>>>>>>>> and after
>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>
>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>> sometimes it will be ready before and after the message handler
>>>>>>>>>>> but
>>>>> with different values.
>>>>>>>>>>>
>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>>> virtio_net
>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>> +
>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>> +
>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>> +
>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>
>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>> message according to v1.
>>>>>>>>>>
>>>>>>>>>> One of our assumption we agreed on in the design mail is that it
>>>>>>>>>> doesn't
>>>>>>>>> make sense that QEMU will change queue configuration without
>>>>>>>>> enabling the queue again.
>>>>>>>>>> Because of that we decided to force calling state callback again
>>>>>>>>>> when
>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if
>>> the
>>>>>>> queue is
>>>>>>>>> already ready.
>>>>>>>>>> So when driver/app see state enable->enable, it should take into
>>>>>>>>>> account
>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>>>>
>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>> But now looking into the details of the implementation, I find it
>>>>>>>>> is even cleaner & clearer not to do this assumption.
>>>>>>>>>
>>>>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>>>>> (queue state and
>>>>>>>>> device new\conf) to one function that will be called after the
>>>>>>>>> message
>>>>>>>>> handler:
>>>>>>>>>> Pseudo:
>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>> 	switch (msg):
>>>>>>>>>> 		case enable:
>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>> 				force queue state =1.
>>>>>>>>>> 		case callfd
>>>>>>>>>> 		case kickfd
>>>>>>>>>> 				.....
>>>>>>>>>> 		Check queue and device ready + call callbacks if
>>> needed..
>>>>>>>>>> 		Default
>>>>>>>>>> 			Return;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>>>>> handled (after vring_invalidate(), before setting new FD for call
>>>>>>>>> & kick, ...)
>>>>>>>>
>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>> pair is invalidated,
>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>> Also it will cause 2 notifications to the driver instead of one in
>>>>>>>> case of FD
>>>>>>> change.
>>>>>>>
>>>>>>> You'll always end-up with two notifications, either Qemu has sent
>>>>>>> the disable and so you'll have one notification for the disable and
>>>>>>> one for the enable, or it didn't sent the disable and it will
>>>>>>> happen at old value invalidation time and after new value is taken into
>>> account.
>>>>>>>
>>>>>>
>>>>>> I don't see it in current QEMU behavior.
>>>>>> When working MQ I see that some virtqs get configuration message
>>>>>> while
>>>>> they are in enabled state.
>>>>>> Then, enable message is sent again later.
>>>>>
>>>>> I guess you mean the first queue pair? And it would not be in ready
>>>>> state as it would be the initial configuration of the queue?
>>>>
>>>> Even after initialization when queue is ready.
>>>>
>>>>>>
>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>> only in one
>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>> handlers
>>>>> around?
>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>
>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>> sure this is a correct assumption:
>>>>>>>
>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>> client must support changing some configuration aspects on the fly."
>>>>>>
>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>
>>>>> I agree it lacks a bit of clarity.
>>>>>
>>>>>> As I mentioned, QEMU sends enable message always after configuration
>>>>> message.
>>>>>
>>>>> Yes, but we should not do assumptions on current Qemu version when
>>>>> possible. Better to be safe and follow the specification, it will be more
>>> robust.
>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>
>>>> I understand your point here but do you really want to be ready for any
>>> configuration update in run time?
>>>> What does it mean? How datatpath should handle configuration from
>>> control thread in run time while traffic is on?
>>>> For example, changing queue size \ addresses must stop traffic before...
>>>> Also changing FDs is very sensitive.
>>>>
>>>> It doesn't make sense to me.
>>>>
>>>> Also, according to "on the fly" direction we should not disable the queue
>>> unless enable message is coming to disable it.
>>
>> No response, so looks like you agree that it doesn't make sense.
> 
> No, my reply was general to all your comments.
> 
> With SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring because we protect it with a lock, so
> we are sure the ring won't be accessed by another thread while doing the
> change.
> 
> For vDPA case that's more problematic because we have no such locking
> mechanism.
> 
> For example memory hotplug, Qemu does not seem to disable the queues so
> we need to stop the vDPA device one way or another so that it does not
> process the rings while the Vhost lib remaps the memory areas.
> 
>>>> In addition:
>>>> Do you really want to toggle vDPA drivers\app for any configuration
>>> message? It may cause queue recreation for each one (at least for mlx5).
>>>
>>> I want to have something robust and maintainable.
>>
>> Me too.
>>
>>> These messages arriving after a queue have been configured once are rare
>>> events, but this is usually the kind of things that cause maintenance burden.
>>
>> In case of guest poll mode (testpmd virtio) we all the time get callfd twice.
> 
> Right.
> 
>>> If you look at my example patch, you will understand that with my proposal,
>>> there won't be any more state change notification than with your proposal
>>> when Qemu or any other Vhost-user master send a disable request before
>>> sending the request that impact the queue state.
>>
>> we didn't talk about disable time - this one is very simple.
>>
>> Yes, In case the queue is disabled your proposal doesn't send extra notification as my.
>> But in case the queue is ready, your proposal send extra not ready notification for kikfd,callfd,set_vring_base configurations.
> 
> I think this is necessary for synchronization with the Vhost-user
> master (in case the master asks for this synchronization, like
> set_mem_table for instance when reply-ack is enabled).
> 
>>> It just adds more robustness if this unlikely event happens, by invalidating
>>> the ring state to not ready before doing the actual ring configuration change.
>>> So that this config change is not missed by the vDPA driver or the application.
>>
>> One more issue here is that there is some time that device is ready (already configured) and the first vittq-pair is not ready (your invalidate proposal for set_vring_base).
> 

Sorry, I forgot to reply here.
I am not sure about what do you mean about my invalidate proposal for
set_vring_base?

> 
>> It doesn’t save the concept that device is ready only in case the first virtq-pair is ready.
> 
> I understand the spec as "the device is ready as soon as the first queue
> pair is ready", but I might be wrong.
> 
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the ring of the
> first queue pair receive a disable request or, with my patch, when one
> of the rings receives a request that changes the ring state?
> 
>>
>> I will not insist anymore on waiting for enable for notifying although I not fan with it.
>>
>> So, I suggest to create 1 notification function to be called after message handler and before reply.
>> This function is the only one which notify ready states in the next options:
>>
>> 1. virtq ready state is changed in the queue.
>> 2. virtq ready state stays on after configuration message handler.
>> 3. device state will be enabled when the first queue pair is ready.
> 
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with memory hotplug case I
> mentioned earlier.
> 
> Even for the callfd double change it can be problematic as Vhost-lib
> will close the first one while it will still be used by the driver (Btw,
> I see my example patch is also buggy in this regards, it should reset
> the call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).
> 
> Thanks,
> Maxime
>>
>> Matan
>>
>>
>>
>>> Maxime
>>
> 



More information about the dev mailing list