[dpdk-dev] [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory Region cache

Slava Ovsiienko viacheslavo at nvidia.com
Mon May 17 16:15:22 CEST 2021


Hi, Feifei

Thank you for the patch.
Please, see my notes below about typos and minor commit message rewording.

> -----Original Message-----
> From: Feifei Wang <feifei.wang2 at arm.com>
> Sent: Monday, May 17, 2021 13:00
> To: Matan Azrad <matan at nvidia.com>; Shahaf Shuler
> <shahafs at nvidia.com>; Slava Ovsiienko <viacheslavo at nvidia.com>
> Cc: dev at dpdk.org; nd at arm.com; Feifei Wang <feifei.wang2 at arm.com>;
> Ruifeng Wang <ruifeng.wang at arm.com>
> Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> Region cache
> 
> 'dev_gen' is a variable to inform other cores to flush their local cache when
> global cache is rebuilt. It is unnecessary to add write memory barrier (wmb)
> before or after its updating for synchronization.
> 
Would it be better to say "to trigger all cores to flush their local caches once the global
MR cache has been rebuilt"?
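For reference, this is roughly how the datapath side consumes dev_gen - a simplified,
hypothetical sketch (the struct and function names below are illustrative, not the exact
mlx5 code): each core keeps the generation its local MR cache was built against and drops
the cache whenever the global generation has moved on.

#include <stdint.h>

struct local_mr_cache {
	uint32_t cur_gen; /* generation this core's cache was built against */
	uint32_t len;     /* number of cached address -> lkey entries */
};

static inline void
local_mr_cache_refresh(struct local_mr_cache *lc, uint32_t dev_gen)
{
	if (lc->cur_gen != dev_gen) {
		/* The global cache was rebuilt: drop the stale local
		 * entries; they get repopulated from the global cache
		 * under its reader lock on the next lookup miss. */
		lc->len = 0;
		lc->cur_gen = dev_gen;
	}
}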

> This is due to MR cache's R/W lock can maintain synchronization between
> threads:
I would add an empty line here.
> 1. dev_gen and global cache update ordering inside the lock protected
> section does not matter. Because other threads cannot take the lock until
> global cache has been updated. Thus, in out of order platform, even if other
> agents firstly observed updated dev_gen but global does not update, they
> also needs to wait the lock. As a result, it is unnecessary to add a wmb
Type: "need" (no S) -> "have to" would be better ? 

> between rebuiling global cache and updating dev_gen to keep the order

rebuiling -> rebuilDing
And let's reword it a little bit:
"wmb between global cache rebuilding and updating the dev_gen to keep the memory store order."

> rebuilding global cache and updating dev_gen.
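Just to illustrate this point for other reviewers - the write-side flow discussed here is
essentially the following (a condensed sketch mirroring the patched
mlx5_mr_mem_event_free_cb(), not the literal code):

	rte_rwlock_write_lock(&sh->share_cache.rwlock);
	mlx5_mr_rebuild_cache(&sh->share_cache); /* (a) rebuild global cache */
	++sh->share_cache.dev_gen;               /* (b) bump generation */
	/*
	 * (a) and (b) may be reordered by an out-of-order CPU, but any other
	 * agent has to take the lock (or observe its release) before touching
	 * the global cache, so it always sees (a) and (b) together.
	 */
	rte_rwlock_write_unlock(&sh->share_cache.rwlock);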
> 
> 2. Store-Release of unlock can provide the implicit wmb at the level visible by
can provide -> provides

> software. This makes 'rebuiling global cache' and 'updating dev_gen' be
Typo: rebuiling -> rebuilDing


> observed before local_cache starts to be updated by other agents. Thus,
> wmb after 'updating dev_gen' can be removed.
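To make the store-release argument concrete, here is a generic C11-atomics illustration
(this is not the actual rte_rwlock implementation, only the ordering property the commit
message relies on):

#include <stdatomic.h>
#include <stdint.h>

struct cache {
	uint32_t dev_gen;
	atomic_int locked;	/* 1 = write-locked, 0 = free */
};

static void
writer(struct cache *c)
{
	/* ... global cache rebuilt here (plain stores) ... */
	c->dev_gen++;					/* plain store */
	atomic_store_explicit(&c->locked, 0,
			      memory_order_release);	/* unlock */
}

static void
reader(struct cache *c)
{
	/*
	 * The acquire load pairs with the writer's release store: once the
	 * lock is observed as free, the rebuilt cache and the new dev_gen
	 * are guaranteed to be visible as well, so no extra wmb is needed
	 * on the writer's side.
	 */
	while (atomic_load_explicit(&c->locked, memory_order_acquire) != 0)
		;
	(void)c->dev_gen;	/* reads the updated value */
}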
> 
> Suggested-by: Ruifeng Wang <ruifeng.wang at arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> ---
>  drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index e791b6338d..85e5865050 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct mlx5_dev_ctx_shared *sh,
>  	if (rebuild) {
>  		mlx5_mr_rebuild_cache(&sh->share_cache);
>  		/*
> -		 * Flush local caches by propagating invalidation across cores.
> -		 * rte_smp_wmb() is enough to synchronize this event. If one of
> -		 * freed memsegs is seen by other core, that means the memseg
> -		 * has been allocated by allocator, which will come after this
> -		 * free call. Therefore, this store instruction (incrementing
> -		 * generation below) will be guaranteed to be seen by other core
> -		 * before the core sees the newly allocated memory.
> +		 * No wmb is needed after updating dev_gen due to store-release of
> +		 * unlock can provide the implicit wmb at the level visible by
> +		 * software. This makes rebuilt global cache and updated dev_gen
> +		 * be observed when local_cache starts to be updating by other
> +		 * agents.
>  		 */
Let's make the comment a little less wordy (and keep the source code concise). What about this?
"No explicit wmb is needed after updating dev_gen due to store-release ordering
in unlock that provides the implicit barrier at the software visible level."

>  		++sh->share_cache.dev_gen;
>  		DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
>  		      sh->share_cache.dev_gen);
> -		rte_smp_wmb();
>  	}
>  	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
>  }
> @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
>  	      (void *)mr);
>  	mlx5_mr_rebuild_cache(&sh->share_cache);
>  	/*
> -	 * Flush local caches by propagating invalidation across cores.
> -	 * rte_smp_wmb() is enough to synchronize this event. If one of
> -	 * freed memsegs is seen by other core, that means the memseg
> -	 * has been allocated by allocator, which will come after this
> -	 * free call. Therefore, this store instruction (incrementing
> -	 * generation below) will be guaranteed to be seen by other core
> -	 * before the core sees the newly allocated memory.
> +	 * No wmb is needed after updating dev_gen due to store-release of
> +	 * unlock can provide the implicit wmb at the level visible by
> +	 * software. This makes rebuilt global cache and updated dev_gen
> +	 * be observed when local_cache starts to be updating by other
> +	 * agents.
The same comment as above applies here.

Please, apply the same comments to the mlx4 patch:
http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2-feifei.wang2@arm.com/

With best regards,
Slava


