[PATCH] vhost: use try_lock in rte_vhost_vring_call

Maxime Coquelin maxime.coquelin at redhat.com
Wed Sep 21 11:41:11 CEST 2022



On 9/20/22 10:43, Liu, Changpeng wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Tuesday, September 20, 2022 4:13 PM
>> To: Liu, Changpeng <changpeng.liu at intel.com>; dev at dpdk.org
>> Cc: Xia, Chenbo <chenbo.xia at intel.com>
>> Subject: Re: [PATCH] vhost: use try_lock in rte_vhost_vring_call
>>
>>
>>
>> On 9/20/22 09:45, Liu, Changpeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>> Sent: Tuesday, September 20, 2022 3:35 PM
>>>> To: Liu, Changpeng <changpeng.liu at intel.com>; dev at dpdk.org
>>>> Cc: Xia, Chenbo <chenbo.xia at intel.com>
>>>> Subject: Re: [PATCH] vhost: use try_lock in rte_vhost_vring_call
>>>>
>>>>
>>>>
>>>> On 9/20/22 09:29, Liu, Changpeng wrote:
>>>>> Hi Maxime,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>> Sent: Tuesday, September 20, 2022 3:19 PM
>>>>>> To: Liu, Changpeng <changpeng.liu at intel.com>; dev at dpdk.org
>>>>>> Cc: Xia, Chenbo <chenbo.xia at intel.com>
>>>>>> Subject: Re: [PATCH] vhost: use try_lock in rte_vhost_vring_call
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/6/22 04:22, Changpeng Liu wrote:
>>>>>>> Note that this function is in data path, so the thread context
>>>>>>> may not same as socket messages processing context, by using
>>>>>>> try_lock here, users can have another try in case of VQ's access
>>>>>>> lock is held by `vhost-events` thread.
>>>>>>>
>>>>>>> Signed-off-by: Changpeng Liu <changpeng.liu at intel.com>
>>>>>>> ---
>>>>>>>      lib/vhost/vhost.c | 6 +++++-
>>>>>>>      1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
>>>>>>> index 60cb05a0ff..072d2acb7b 100644
>>>>>>> --- a/lib/vhost/vhost.c
>>>>>>> +++ b/lib/vhost/vhost.c
>>>>>>> @@ -1329,7 +1329,11 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx)
>>>>>>>      	if (!vq)
>>>>>>>      		return -1;
>>>>>>>
>>>>>>> -	rte_spinlock_lock(&vq->access_lock);
>>>>>>> +	if (!rte_spinlock_trylock(&vq->access_lock)) {
>>>>>>> +		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
>>>>>>> +			"failed to kick guest, virtqueue busy.\n");
>>>>>>> +		return -1;
>>>>>>> +	}
>>>>>>>
>>>>>>>      	if (vq_is_packed(dev))
>>>>>>>      		vhost_vring_call_packed(dev, vq);
>>>>>>
>>>>>> I think that's problematic, because it will break other applications
>>>>>> that currently rely on the API to block until the call is done.
>>>>>>
>>>>>> Just some internal DPDK usage of this API:
>>>>>> ./drivers/vdpa/ifc/ifcvf_vdpa.c:871:	rte_vhost_vring_call(internal->vid,
>>>>>> qid);
>>>>>> ./examples/vhost/virtio_net.c:236:	rte_vhost_vring_call(dev->vid,
>> queue_id);
>>>>>> ./examples/vhost/virtio_net.c:446:	rte_vhost_vring_call(dev->vid,
>> queue_id);
>>>>>> ./examples/vhost_blk/vhost_blk.c:99:
>>>>>> rte_vhost_vring_call(task->ctrlr->vid, vq->id);
>>>>>> ./examples/vhost_blk/vhost_blk.c:134:
>>>>>> rte_vhost_vring_call(task->ctrlr->vid, vq->id);
>>>>>>
>>>>>> This change will break all the above uses.
>>>>>>
>>>>>> And that's not counting external projects.
>>>>>>
>>>>>> You should rather introduce a new API that does not block.
>>>>> Could you add a new API to do this?
>>>>    >
>>>>> I think we can use the new API in SPDK as a workaround, note that SPDK
>> project
>>>> is blocked for
>>>>> a while which can't be used with DPDK 22.05 or newer.
>>>>
>>>> DPDK v22.05?
>>>> What is the commit introducing the regression?
>>> Here is the commit introducing this issue
>>> c5736998305d ("vhost: fix missing virtqueue lock protection")
>>> Bugzilla ID: 1015
>>
>> Ok, it cannot be reverted, as it prevents some undefined
>> behaviors/crashes.
>>
>>>>
>>>> Note that if we introduce a new API, it won't be backported to stable
>>>> branches.
>>> I understand, but do we have better idea in short time? we're planning
>>> to release SPDK 22.09 recently.
>>
>> You can have another thread that sends the call?
> We already use two threads to do this. Here is the example for existing code in SPDK:
> 
> DPDK vhost-events thread                        SPDK thread
> 
>      SET_VRING_KICK VQ1       ---->            Start polling VQ1
>      Reply to DPDK                    <----              Done
>      SET_VRING_KICK VQ2       ---->            thread is blocked on VQ's access lock, SPDK thread can't provide reply message
>   
> For example, we can just return for  SET_VRING_KICK VQ2 message without checking SPDK thread, but this leave
> uncertain replies to VM.

I'm sorry, but you will have to find a workaround until v22.11 is out and
you can consume it. We can neither backport a new API nor break all
the other applications that do not handle locking failure.

Regarding the new API for v22.11, it should be named something like
rte_vhost_vring_call_nonblock(), and ideally should return something like
-EAGAIN instead of -1, so that the applications can distinguish between a
real failure and a need for retry.
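
As a rough illustration only, here is a minimal, self-contained sketch of
the return-code convention I have in mind. It does not use DPDK at all: the
struct, the lock helpers and every demo_* name are hypothetical stand-ins,
and a C11 atomic_flag models vq->access_lock. The real implementation may
well look different.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; NOT the real struct vhost_virtqueue. */
struct demo_vq {
	atomic_flag access_lock; /* models vq->access_lock */
	unsigned int calls;      /* number of guest notifications "sent" */
};

/* Try to take the lock without spinning; true on success. */
static bool demo_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Non-blocking call: -1 is a real failure (invalid queue), -EAGAIN means
 * the lock is held elsewhere and the caller may retry, 0 means success.
 */
static int demo_vring_call_nonblock(struct demo_vq *vq)
{
	if (vq == NULL)
		return -1;

	if (!demo_trylock(&vq->access_lock))
		return -EAGAIN;

	vq->calls++; /* placeholder for the actual guest notification */
	demo_unlock(&vq->access_lock);
	return 0;
}

/* Caller side: retry a bounded number of times, only on -EAGAIN. */
static int demo_kick_with_retry(struct demo_vq *vq, int max_tries)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
		ret = demo_vring_call_nonblock(vq);

	return ret;
}
```

With this convention, an application that gets -EAGAIN can go back to its
poll loop and try again later, while -1 still means the request itself was
invalid.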

Regards,
Maxime

>>
>>>>
>>>>
>>>>> Vhost-blk and scsi devices are not same with vhost-net, we need to cover
>>>> SeaBIOS and VM
>>>>> cases, so we need to start processing vrings after 1 vring is ready.
>>>>>>
>>>>>> Regards,
>>>>>> Maxime
>>>>>
>>>
> 