[dpdk-dev] [PATCH v2 3/4] vhost: avoid deadlock on async register

Hu, Jiayu jiayu.hu at intel.com
Thu Apr 15 03:08:50 CEST 2021



> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> Sent: Wednesday, April 14, 2021 6:09 PM
> To: Hu, Jiayu <jiayu.hu at intel.com>; dev at dpdk.org
> Cc: Xia, Chenbo <chenbo.xia at intel.com>; Wang, Yinan
> <yinan.wang at intel.com>; Pai G, Sunil <sunil.pai.g at intel.com>; Jiang, Cheng1
> <cheng1.jiang at intel.com>
> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> 
> 
> 
> On 4/14/21 3:40 AM, Hu, Jiayu wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> >> Sent: Tuesday, April 13, 2021 7:33 PM
> >> To: Hu, Jiayu <jiayu.hu at intel.com>; dev at dpdk.org
> >> Cc: Xia, Chenbo <chenbo.xia at intel.com>; Wang, Yinan
> >> <yinan.wang at intel.com>; Pai G, Sunil <sunil.pai.g at intel.com>; Jiang,
> Cheng1
> >> <cheng1.jiang at intel.com>
> >> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> >>
> >>
> >>
> >> On 4/2/21 3:04 PM, Jiayu Hu wrote:
> >>> Users can register async copy device in vring_state_changed(),
> >>> when vhost queue is enabled. However, a deadlock occurs inside
> >>> rte_vhost_async_channel_register(), if
> >> VHOST_USER_F_PROTOCOL_FEATURES
> >>> is not supported, as vhost_user_msg_handler() takes vq->access_lock
> >>> before calling vhost_user_set_vring_kick().
> >>>
> >>> This patch avoids async register deadlock by removing calling
> >>> vring_state_changed() in vhost_user_set_vring_kick(). It's safe
> >>> as vhost_user_msg_handler() will call vring_state_changed() anyway.
> >>>
> >>> Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> >>> ---
> >>>  lib/librte_vhost/vhost_user.c | 3 ---
> >>>  1 file changed, 3 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >>> index 44c0452..8f0eba6 100644
> >>> --- a/lib/librte_vhost/vhost_user.c
> >>> +++ b/lib/librte_vhost/vhost_user.c
> >>> @@ -1918,9 +1918,6 @@ vhost_user_set_vring_kick(struct virtio_net
> >> **pdev, struct VhostUserMsg *msg,
> >>>  	 */
> >>>  	if (!(dev->features & (1ULL <<
> >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> >>>  		vq->enabled = true;
> >>> -		if (dev->notify_ops->vring_state_changed)
> >>> -			dev->notify_ops->vring_state_changed(
> >>> -				dev->vid, file.index, 1);
> >>>  	}
> >>>
> >>>  	if (vq->ready) {
> >>>
> >>
> >> As replied earlier on v1, I agree the call to vring_state_changed here
> >> is not needed. But it might not be enough, there are other cases where
> >> you could have issues.
> >
> > vhost_user_notify_queue_state() can be called in three cases:
> > 1. when vq ready status changes, vhost_user_msg_handler() calls it to
> notify
> > backend. But vhost_user_msg_handler() doesn't take lock before calling it.
> > So in this case, no deadlock occurs in async register.
> >
> > 2. if vq->ready is true, vhost_user_set_vring_call() calls it to notify backend
> > vq is not enabled. Although vhost_user_set_vring_call() is protected by lock,
> > async register is called only if vq is enabled, so async register will not be
> called
> > in this case.
> >
> > 3. If vq->ready is true, vhost_user_set_vring_kick() calls it to notify backend
> > vq is not enabled. Same as #2, async register is called only when vq is
> enabled.
> > Even if vhost_user_set_vring_kick() is protected by lock, there is no
> deadlock in
> > async register, as it will not be called in this case.
> >
> > In summary,  I think there is no deadlock issue in async register if we
> > can remove calling vring_state_change() in vhost_user_set_vring_kick().
> 
> 
> But unregister one could be called in theory no? Otherwise it would look
> unbalanced. At least on disabled notification, the app should make sure
> the DMA transfers to and from the vring are stopped before it returns
> from the callabck. Otherwise it could lead to undefined behavior.

Right, users need to call unregister, but we cannot remove calling
vhost_user_notify_queue_state() in case #2 and #3, IMHO. So to
avoid deadlock, we recommended users to call async unregister in
destroy_device(), instead of on vring disabled notification. Does it
make sense to you?

> 
> >>
> >> Please add stable and Fixes tag.
> >
> > Do you suggest to make the patch as a fix for 8639d54563a
> > ("vhost: introduce async enqueue registration API")? But the
> > thing is that code removed in this patch is not introduced
> > by this commit.
> 
> The commit you need to point to is the one introducing the
> .vring_state_changed() call.

So this patch is still a fix for deadlock on async register? Or it is
a fix for unnecessary .vring_state_changed() call?

Thanks,
Jiayu

> 
> > Thanks,
> > Jiayu
> >>
> >> Thanks,
> >> Maxime
> >



More information about the dev mailing list