[dpdk-dev] [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache

Feifei Wang feifei.wang2 at arm.com
Thu Mar 18 08:18:39 CET 2021


'dev_gen' is a variable to inform other cores to flush their local cache
when global cache is rebuilt.

However, if 'dev_gen' is updated after global cache is rebuilt, other
cores may load a wrong memory region lkey value from old local cache.

Timeslot        main core               worker core
  1         rebuild global cache
  2                                  load unchanged dev_gen
  3            update dev_gen
  4                                  look up old local cache

>From the example above, we can see that though global cache is rebuilt,
due to that dev_gen is not updated, the worker core may look up old
cache table and receive a wrong memory region lkey value.

To fix this, updating 'dev_gen' should be moved before rebuilding global
cache to inform worker cores to flush their local cache when global
cache start rebuilding. And wmb can ensure the sequence of this process.

Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")
Cc: stable at dpdk.org

Suggested-by: Ruifeng Wang <ruifeng.wang at arm.com>
Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
---
 drivers/net/mlx5/mlx5_mr.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index da4e91fc2..7ce1d3e64 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -103,20 +103,18 @@ mlx5_mr_mem_event_free_cb(struct mlx5_dev_ctx_shared *sh,
 		rebuild = 1;
 	}
 	if (rebuild) {
-		mlx5_mr_rebuild_cache(&sh->share_cache);
+		++sh->share_cache.dev_gen;
+		DEBUG("broadcasting local cache flush, gen=%d",
+			sh->share_cache.dev_gen);
+
 		/*
 		 * Flush local caches by propagating invalidation across cores.
-		 * rte_smp_wmb() is enough to synchronize this event. If one of
-		 * freed memsegs is seen by other core, that means the memseg
-		 * has been allocated by allocator, which will come after this
-		 * free call. Therefore, this store instruction (incrementing
-		 * generation below) will be guaranteed to be seen by other core
-		 * before the core sees the newly allocated memory.
+		 * rte_smp_wmb() is to keep the order that dev_gen updated before
+		 * rebuilding global cache. Therefore, other core can flush their
+		 * local cache on time.
 		 */
-		++sh->share_cache.dev_gen;
-		DEBUG("broadcasting local cache flush, gen=%d",
-		      sh->share_cache.dev_gen);
 		rte_smp_wmb();
+		mlx5_mr_rebuild_cache(&sh->share_cache);
 	}
 	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
 }
@@ -407,20 +405,19 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	mlx5_mr_free(mr, sh->share_cache.dereg_mr_cb);
 	DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
 	      (void *)mr);
-	mlx5_mr_rebuild_cache(&sh->share_cache);
+
+	++sh->share_cache.dev_gen;
+	DEBUG("broadcasting local cache flush, gen=%d",
+		sh->share_cache.dev_gen);
+
 	/*
 	 * Flush local caches by propagating invalidation across cores.
-	 * rte_smp_wmb() is enough to synchronize this event. If one of
-	 * freed memsegs is seen by other core, that means the memseg
-	 * has been allocated by allocator, which will come after this
-	 * free call. Therefore, this store instruction (incrementing
-	 * generation below) will be guaranteed to be seen by other core
-	 * before the core sees the newly allocated memory.
+	 * rte_smp_wmb() is to keep the order that dev_gen updated before
+	 * rebuilding global cache. Therefore, other core can flush their
+	 * local cache on time.
 	 */
-	++sh->share_cache.dev_gen;
-	DEBUG("broadcasting local cache flush, gen=%d",
-	      sh->share_cache.dev_gen);
 	rte_smp_wmb();
+	mlx5_mr_rebuild_cache(&sh->share_cache);
 	rte_rwlock_read_unlock(&sh->share_cache.rwlock);
 	return 0;
 }
-- 
2.25.1



More information about the dev mailing list