[dpdk-dev] [PATCH] net/mlx5: fix memory regions release deadlock

Michael Baum michaelba at mellanox.com
Tue Feb 4 11:08:01 CET 2020


When the memory event callback list is created, a callback function that
manages memory regions (MRs) is registered. This callback iterates over
the shared device list and takes the MR lock of each shared device to
avoid concurrent access to that device's MR list.
When the PMD releases memory while a shared device is still in the list,
the callback accesses the device's MRs under that lock.
When a shared device is closed, the same lock is taken while all of its
MRs are freed.

Freeing the MRs calls rte_free(), which may trigger the memory event
callback. The MR release wrongly took the lock before the shared device
was removed from the callback list, so the callback tried to take the
same lock again, causing a deadlock.

To fix this, remove the shared device from the callback list first, and
only then release its memory regions.
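The ordering problem can be reproduced in a small single-threaded model. This is a sketch under stated assumptions, not the driver code: toy_lock, toy_dev, cb_list, mem_event_cb(), mr_release() and the close_dev_*() helpers are hypothetical stand-ins for rte_rwlock_t, the shared device context, the memory event callback list, the memory event callback and mlx5_mr_release(). The toy lock reports EDEADLK on re-entry instead of spinning forever, and the list lock (mem_event_rwlock) is left out because nothing runs concurrently in the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock standing in for rte_rwlock_t. A real
 * write lock would spin forever on re-entry; this one reports EDEADLK. */
struct toy_lock {
	bool held;
};

static int toy_lock_take(struct toy_lock *l)
{
	if (l->held)
		return EDEADLK;
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held); /* releasing a lock we do not hold is a bug */
	l->held = false;
}

/* Hypothetical shared device: its MR lock plus a link in the
 * memory event callback list. */
struct toy_dev {
	struct toy_lock mr_lock;
	struct toy_dev *next;
};

static struct toy_dev *cb_list; /* memory event callback device list */

/* Model of the memory event callback: visit every listed device and take
 * its MR lock. Returns EDEADLK if a device's lock is already held. */
static int mem_event_cb(void)
{
	for (struct toy_dev *d = cb_list; d != NULL; d = d->next) {
		if (toy_lock_take(&d->mr_lock) != 0)
			return EDEADLK;
		/* ... drop MRs covering the freed range ... */
		toy_lock_release(&d->mr_lock);
	}
	return 0;
}

/* Unlink a device from the callback list. */
static void cb_list_remove(struct toy_dev *dev)
{
	for (struct toy_dev **pp = &cb_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			return;
		}
	}
}

/* Model of the MR release: hold the MR lock while freeing MRs. Freeing
 * may fire the memory event callback, simulated by calling it directly. */
static int mr_release(struct toy_dev *dev)
{
	int ret;

	if (toy_lock_take(&dev->mr_lock) != 0)
		return EDEADLK;
	ret = mem_event_cb(); /* stands in for rte_free() firing the callback */
	toy_lock_release(&dev->mr_lock);
	return ret;
}

/* Old order: release MRs while the device is still in the list. */
static int close_dev_buggy(struct toy_dev *dev)
{
	int ret = mr_release(dev);

	cb_list_remove(dev);
	return ret;
}

/* Fixed order, as in the patch: unlink first, then release MRs. */
static int close_dev_fixed(struct toy_dev *dev)
{
	cb_list_remove(dev);
	return mr_release(dev);
}
```

With the old order, the callback finds the closing device still in the list and tries to take the MR lock its caller already holds; with the fixed order the closing device is no longer visited, while any other listed devices are still locked and unlocked normally.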

Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
Cc: viacheslavo at mellanox.com
Cc: stable at dpdk.org

Signed-off-by: Michael Baum <michaelba at mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f80e403..759491f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -679,12 +679,12 @@ struct mlx5_flow_id_pool *
 	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	if (--sh->refcnt)
 		goto exit;
-	/* Release created Memory Regions. */
-	mlx5_mr_release(sh);
 	/* Remove from memory callback device list. */
 	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
 	LIST_REMOVE(sh, mem_event_cb);
 	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
+	/* Release created Memory Regions. */
+	mlx5_mr_release(sh);
 	/* Remove context from the global device list. */
 	LIST_REMOVE(sh, next);
 	/*
-- 
1.8.3.1


