[dpdk-dev] 回复: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory Region cache
Feifei Wang
Feifei.Wang2 at arm.com
Tue May 18 10:52:36 CEST 2021
Hi, Slava
> -----邮件原件-----
> 发件人: Slava Ovsiienko <viacheslavo at nvidia.com>
> 发送时间: 2021年5月17日 22:15
> 收件人: Feifei Wang <Feifei.Wang2 at arm.com>; Matan Azrad
> <matan at nvidia.com>; Shahaf Shuler <shahafs at nvidia.com>
> 抄送: dev at dpdk.org; nd <nd at arm.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>
> 主题: RE: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> Region cache
>
> Hi, Feifei
>
> Thanks you for the patch.
> Please, see my notes below about typos and minor commit message
> rewording.
Thanks very much for your very careful reviewing.
I will apply these in the next version.
>
> > -----Original Message-----
> > From: Feifei Wang <feifei.wang2 at arm.com>
> > Sent: Monday, May 17, 2021 13:00
> > To: Matan Azrad <matan at nvidia.com>; Shahaf Shuler
> > <shahafs at nvidia.com>; Slava Ovsiienko <viacheslavo at nvidia.com>
> > Cc: dev at dpdk.org; nd at arm.com; Feifei Wang <feifei.wang2 at arm.com>;
> > Ruifeng Wang <ruifeng.wang at arm.com>
> > Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> > Region cache
> >
> > 'dev_gen' is a variable to inform other cores to flush their local
> > cache when global cache is rebuilt. It is unnecessary to add write
> > memory barrier (wmb) before or after its updating for synchronization.
> >
> Would it be better "to trigger all cores to flush their local caches once the
> global MR cache has been rebuilt" ?
>
1.Yes, I think this can be more clear.
> > This is due to MR cache's R/W lock can maintain synchronization
> > between
> > threads:
> I would add empty line here.
2.Done.
> > 1. dev_gen and global cache update ordering inside the lock protected
> > section does not matter. Because other threads cannot take the lock
> > until global cache has been updated. Thus, in out of order platform,
> > even if other agents firstly observed updated dev_gen but global does
> > not update, they also needs to wait the lock. As a result, it is
> > unnecessary to add a wmb
> Type: "need" (no S) -> "have to" would be better ?
>
3.Done.
> > between rebuiling global cache and updating dev_gen to keep the order
>
> rebuiling -> rebuilding
4.Done.
> And let's reword a little bit?
> "wmb between global cache rebuilding and updating the dev_gen to keep
> the memory store order."
>
5.Done.
> > rebuilding global cache and updating dev_gen.
> >
> > 2. Store-Release of unlock can provide the implicit wmb at the level
> > visible by
> can provide -> provides
>
6.Done.
> > software. This makes 'rebuiling global cache' and 'updating dev_gen'
> > be
> Typo: rebuiling -> rebuilding
7.Done.
>
>
> > observed before local_cache starts to be updated by other agents.
> > Thus, wmb after 'updating dev_gen' can be removed.
> >
> > Suggested-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > ---
> > drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
> > 1 file changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> > index
> > e791b6338d..85e5865050 100644
> > --- a/drivers/net/mlx5/mlx5_mr.c
> > +++ b/drivers/net/mlx5/mlx5_mr.c
> > @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct
> > mlx5_dev_ctx_shared *sh,
> > if (rebuild) {
> > mlx5_mr_rebuild_cache(&sh->share_cache);
> > /*
> > - * Flush local caches by propagating invalidation across cores.
> > - * rte_smp_wmb() is enough to synchronize this event. If
> > one of
> > - * freed memsegs is seen by other core, that means the
> > memseg
> > - * has been allocated by allocator, which will come after this
> > - * free call. Therefore, this store instruction (incrementing
> > - * generation below) will be guaranteed to be seen by other
> > core
> > - * before the core sees the newly allocated memory.
> > + * No wmb is needed after updating dev_gen due to store-
> > release of
> > + * unlock can provide the implicit wmb at the level visible by
> > + * software. This makes rebuilt global cache and updated
> > dev_gen
> > + * be observed when local_cache starts to be updating by
> > other
> > + * agents.
> > */
> Let's make comment a less wordy (and try to keep source code concise),
> what about this?
> "No explicit wmb is needed after updating dev_gen due to store-release
> ordering in unlock that provides the implicit barrier at the software visible
> level."
8.That's better than before. A concise comment works better in the code.
>
> > ++sh->share_cache.dev_gen;
> > DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
> > sh->share_cache.dev_gen);
> > - rte_smp_wmb();
> > }
> > rte_rwlock_write_unlock(&sh->share_cache.rwlock);
> > }
> > @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev,
> void
> > *addr,
> > (void *)mr);
> > mlx5_mr_rebuild_cache(&sh->share_cache);
> > /*
> > - * Flush local caches by propagating invalidation across cores.
> > - * rte_smp_wmb() is enough to synchronize this event. If one of
> > - * freed memsegs is seen by other core, that means the memseg
> > - * has been allocated by allocator, which will come after this
> > - * free call. Therefore, this store instruction (incrementing
> > - * generation below) will be guaranteed to be seen by other core
> > - * before the core sees the newly allocated memory.
> > + * No wmb is needed after updating dev_gen due to store-release of
> > + * unlock can provide the implicit wmb at the level visible by
> > + * software. This makes rebuilt global cache and updated dev_gen
> > + * be observed when local_cache starts to be updating by other
> > + * agents.
> The same as previous comment above.
9.Done.
>
> Please, apply the same comments to the mlx4 patch:
> http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2-
> feifei.wang2 at arm.com/
>
10.Done.
Best Regards
Feifei
> With best regards,
> Slava
More information about the dev
mailing list