[dpdk-dev] [PATCH v3 2/2] net/mlx5: remove unnecessary wmb for Memory Region cache

Feifei Wang feifei.wang2 at arm.com
Tue May 18 10:50:58 CEST 2021


'dev_gen' is a variable to trigger all cores to flush their local caches
once the global MR cache has been rebuilt.

The explicit wmb around the dev_gen update is unnecessary because the
MR cache's R/W lock already maintains synchronization between threads:

1. The ordering between the global cache update and the dev_gen update
inside the lock-protected section does not matter, because other
threads cannot take the lock until the global cache has been updated.
Thus, on an out-of-order platform, even if other agents observe the
updated dev_gen before the global cache update is visible, they still
have to wait for the lock. As a result, no wmb is needed between
rebuilding the global cache and updating dev_gen to keep the store
order.

2. The store-release in unlock provides an implicit wmb at the level
visible to software. This ensures that 'rebuilding global cache' and
'updating dev_gen' are observed before other agents start to update
their local caches. Thus, the wmb after 'updating dev_gen' can be
removed.
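
The two points above can be sketched in portable C11. This is only an
illustration under stated assumptions, not the mlx5 code: the structs,
the helper names (cache_lock, cache_unlock, rebuild_and_broadcast,
local_refresh) and the single-flag spinlock are hypothetical stand-ins
for the driver's share_cache and rte_rwlock. The writer carries no
explicit barrier; the release in unlock publishes both stores, and a
reader that sees a new dev_gen early still has to take the lock before
it copies anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared MR cache, not the mlx5 struct. */
struct share_cache {
	atomic_flag lock;         /* simplified exclusive lock, not rte_rwlock */
	_Atomic uint32_t dev_gen; /* generation, bumped on every rebuild */
	uint32_t table;           /* stands in for the rebuilt global cache */
};

/* Hypothetical per-core view of the cache. */
struct local_cache {
	uint32_t cur_gen;
	uint32_t table_copy;
};

static void cache_lock(struct share_cache *c)
{
	/* Acquire: later accesses cannot move before taking the lock. */
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;
}

static void cache_unlock(struct share_cache *c)
{
	/* Release: earlier stores are visible to the next lock holder. */
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Writer: rebuild, then bump the generation, with no explicit wmb. */
static void rebuild_and_broadcast(struct share_cache *c, uint32_t new_table)
{
	cache_lock(c);
	c->table = new_table;
	/* Relaxed is enough here: ordering comes from the unlock below. */
	atomic_fetch_add_explicit(&c->dev_gen, 1, memory_order_relaxed);
	cache_unlock(c);
}

/* Reader: cheap unlocked check, then refresh under the lock. */
static int local_refresh(struct share_cache *c, struct local_cache *l)
{
	assert(c && l);
	/* May observe the new dev_gen before the rebuilt table is visible. */
	if (atomic_load_explicit(&c->dev_gen, memory_order_relaxed) ==
	    l->cur_gen)
		return 0;
	/* The lock cannot be taken until the writer has released it. */
	cache_lock(c);
	l->table_copy = c->table;
	l->cur_gen = atomic_load_explicit(&c->dev_gen, memory_order_relaxed);
	cache_unlock(c);
	return 1;
}
```

In this sketch the only ordering the reader relies on is the
acquire/release pairing of the lock, which is the property the patch
depends on when it drops rte_smp_wmb().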

Suggested-by: Ruifeng Wang <ruifeng.wang at arm.com>
Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
---
 drivers/net/mlx5/mlx5_mr.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e791b6338d..0c5403e493 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -107,18 +107,13 @@ mlx5_mr_mem_event_free_cb(struct mlx5_dev_ctx_shared *sh,
 	if (rebuild) {
 		mlx5_mr_rebuild_cache(&sh->share_cache);
 		/*
-		 * Flush local caches by propagating invalidation across cores.
-		 * rte_smp_wmb() is enough to synchronize this event. If one of
-		 * freed memsegs is seen by other core, that means the memseg
-		 * has been allocated by allocator, which will come after this
-		 * free call. Therefore, this store instruction (incrementing
-		 * generation below) will be guaranteed to be seen by other core
-		 * before the core sees the newly allocated memory.
+		 * No explicit wmb is needed after updating dev_gen due to
+		 * store-release ordering in unlock that provides the
+		 * implicit barrier at the software visible level.
 		 */
 		++sh->share_cache.dev_gen;
 		DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
 		      sh->share_cache.dev_gen);
-		rte_smp_wmb();
 	}
 	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
 }
@@ -411,18 +406,13 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	      (void *)mr);
 	mlx5_mr_rebuild_cache(&sh->share_cache);
 	/*
-	 * Flush local caches by propagating invalidation across cores.
-	 * rte_smp_wmb() is enough to synchronize this event. If one of
-	 * freed memsegs is seen by other core, that means the memseg
-	 * has been allocated by allocator, which will come after this
-	 * free call. Therefore, this store instruction (incrementing
-	 * generation below) will be guaranteed to be seen by other core
-	 * before the core sees the newly allocated memory.
+	 * No explicit wmb is needed after updating dev_gen due to
+	 * store-release ordering in unlock that provides the
+	 * implicit barrier at the software visible level.
 	 */
 	++sh->share_cache.dev_gen;
 	DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
 	      sh->share_cache.dev_gen);
-	rte_smp_wmb();
 	rte_rwlock_read_unlock(&sh->share_cache.rwlock);
 	return 0;
 }
-- 
2.25.1


