[dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex
Yuanhan Liu
yliu at fridaylinux.org
Thu Jan 18 15:03:30 CET 2018
Hi,
Apologize for late review.
On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzhike at jd.com>
>
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.
Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.
> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
> ->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
> changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
> such as vhost_user_read_cb() would try to retrieve the conn_mutex.
>
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.
I then would change the title to "vhost: fix deadlock".
I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.
> Signed-off-by: zhike wang <wangzhike at jd.com>
Again, you need fix your git config file about your name.
> ---
> lib/librte_vhost/socket.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
> struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>
> if (!strcmp(vsocket->path, path)) {
> + int del_fds[MAX_FDS];
> + int num_of_fds = 0, fd_index;
> +
I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".
> if (vsocket->is_server) {
> fdset_del(&vhost_user.fdset, vsocket->socket_fd);
> close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
> vhost_user_remove_reconnect(vsocket);
> }
>
> + /* fdset_del() must be called without conn_mutex. */
> + pthread_mutex_lock(&vsocket->conn_mutex);
> + for (conn = TAILQ_FIRST(&vsocket->conn_list);
> + conn != NULL;
> + conn = next) {
> + next = TAILQ_NEXT(conn, next);
> +
> + del_fds[num_of_fds++] = conn->connfd;
> + }
> + pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> + for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> + fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
> pthread_mutex_lock(&vsocket->conn_mutex);
> for (conn = TAILQ_FIRST(&vsocket->conn_list);
> conn != NULL;
> conn = next) {
> next = TAILQ_NEXT(conn, next);
>
> - fdset_del(&vhost_user.fdset, conn->connfd);
If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.
--yliu
> RTE_LOG(INFO, VHOST_CONFIG,
> "free connfd = %d for device '%s'\n",
> conn->connfd, path);
> --
> 1.8.3.1
More information about the dev
mailing list