[dpdk-dev] [PATCH V2] kni: fix rtnl deadlocks and race conditions v2

Ferruh Yigit ferruh.yigit at intel.com
Tue Feb 23 13:53:23 CET 2021

On 2/23/2021 12:05 PM, Elad Nachman wrote:
> This version 2 of the patch leverages on Stephen Hemminger's 64106
> patch from Dec 2019,
> and fixes the issues reported by Ferruh and Igor:
> A. KNI sync lock is being locked while rtnl is held.
> If two threads are calling kni_net_process_request() ,
> then the first one will take the sync lock, release rtnl lock then sleep.
> The second thread will try to lock sync lock while holding rtnl.
> The first thread will wake, and try to lock rtnl, resulting in a deadlock.
> The remedy is to release rtnl before locking the KNI sync lock.
> Since in between nothing is accessing Linux network-wise,
> no rtnl locking is needed.
> B. There is a race condition in __dev_close_many() processing the
> close_list while the application terminates.
> It looks like if two vEth devices are terminating,
> and one releases the rtnl lock, the other takes it,
> updating the close_list in an unstable state,
> causing the close_list to become a circular linked list,
> hence list_for_each_entry() will endlessly loop inside
> __dev_close_many() .
> Since the description for the original patch indicate the
> original motivation was bringing the device up,
> I have changed kni_net_process_request() to hold the rtnl mutex
> in case of bringing the device down since this is the path called
> from __dev_close_many() , causing the corruption of the close_list.
> Depends-on: patch-64106 ("kni: fix kernel deadlock when using mlx devices")

Can you please make new version of the patches on top of latest git head, not 
exiting patches, we don't support incremental updates.

> Signed-off-by: Elad Nachman <eladv6 at gmail.com>
> ---
> V2:
> * rebuild the patch as increment from patch 64106
> * fix comment and blank lines
> ---
>   kernel/linux/kni/kni_net.c | 25 +++++++++++++++++--------
>   1 file changed, 17 insertions(+), 8 deletions(-)
> diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
> index f0b6e9a8d..b41360220 100644
> --- a/kernel/linux/kni/kni_net.c
> +++ b/kernel/linux/kni/kni_net.c
> @@ -110,9 +110,22 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>   	void *resp_va;
>   	uint32_t num;
>   	int ret_val;
> +	int req_is_dev_stop = 0;
> +

One more thing, can you please add comment to code why "stop" request is 
special? You have it in the commit log, but a short description in code also cna 
be helpful.

More information about the dev mailing list