[dpdk-dev] 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache

Feifei Wang Feifei.Wang2 at arm.com
Fri May 14 07:18:51 CEST 2021


Hi, Slava

> -----邮件原件-----
> 发件人: Slava Ovsiienko <viacheslavo at nvidia.com>
> 发送时间: 2021年5月13日 18:49
> 收件人: Feifei Wang <Feifei.Wang2 at arm.com>; Matan Azrad
> <matan at nvidia.com>; Shahaf Shuler <shahafs at nvidia.com>
> 抄送: dev at dpdk.org; nd <nd at arm.com>; stable at dpdk.org; Ruifeng Wang
> <Ruifeng.Wang at arm.com>; nd <nd at arm.com>; nd <nd at arm.com>
> 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> 
> Hi, Feifei
> 
> .. snip..
> >
> > I can understand you worry that if there is no 'wmb at last', when
> > agent_1 leave the locked section, agent_2 still may observe unchanged
> global cache.
> >
> > However, when agent_2 take a lock and get(new MR) in time slot
> > 7(Fig.1), it means
> > agent_1 has finished updating global cache and lock is freed.
> > Besides, if agent_2 can take a lock, it also shows agent_2 has
> > observed the changed global cache.
> >
> > This is because there is a store- release in rte_rwlock_read_unlock,
> > store- release ensures all store operation before 'release' can be
> > committed  if store operation after 'release' is observed by other agents.
> 
> OK, I missed this implicit ordering hidden in the lock/unlock(), thank you for
> pointing me out that.
> 
> I checked the listings for rte_rwlock_write_unlock(), it is implemented with
> __atomic_store_n(&rwl->cnt, 0, __ATOMIC_RELEASE) and:
> - on x86 there is nothing special in the code due to strict x86-64 ordering
> model
> - on ARMv8 there is stlxr instruction, that implies the write barrier
> 
> Now you convinced me 😊, we can get rid of the explicit "wmb_at_last" in
> the code.
> Please, provide the patch with clear commit message with details above, it is
> important to mention:
> - lock protection is involved, dev_gen and cache update ordering inside the
> protected section does not matter
> - unlock() provides implicit write barrier due to atomic operation
> _ATOMIC_RELEASE

Ok, I will be careful with the commit message and include above details in
it.

> 
> Also, in my opinion, there should be small comment in place of wmb being
> revomed, reminding we are in the protected section and unlock provides the
> implicit write barrier at the level visible by software.
> 

Yes, it is necessary to add small comment in the place of wmb to explain
why we do this.

> Thank you for the meaningful discussion, I refreshed my knowledge of
> memory ordering models for different architectures 😊
> 

Also thanks very much for this discussion, I think I learn a lot from it and
enjoy exploring the unknown problems.
Looking forward to your review about the next version, thanks a lot 😊.

Best Regards
Feifei

> With best regards,
> Slava
> 
> > >
> > > >
> > > > Best Regards
> > > > Feifei
> > > >
> > > > > With best regards,
> > > > > Slava
> > > > >
> > > > >
> > > > > >
> > > > > > Best Regards
> > > > > > Feifei
> > > > > >
> > > > > > > -----邮件原件-----
> > > > > > > 发件人: Slava Ovsiienko <viacheslavo at nvidia.com>
> > > > > > > 发送时间: 2021年5月7日 18:15
> > > > > > > 收件人: Feifei Wang <Feifei.Wang2 at arm.com>; Matan Azrad
> > > > > > > <matan at nvidia.com>; Shahaf Shuler <shahafs at nvidia.com>
> > > > > > > 抄送: dev at dpdk.org; nd <nd at arm.com>; stable at dpdk.org;
> > Ruifeng
> > > > > Wang
> > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory
> > > > > > > Region cache
> > > > > > >
> > > > > > > Hi, Feifei
> > > > > > >
> > > > > > > We should consider the locks in your scenario - it is
> > > > > > > crucial for the complete model description:
> > > > > > >
> > > > > > > How agent_1 (in your terms) rebuilds global cache:
> > > > > > >
> > > > > > > 1a) lock()
> > > > > > > 1b) rebuild(global cache)
> > > > > > > 1c) update(dev_gen)
> > > > > > > 1d) wmb()
> > > > > > > 1e) unlock()
> > > > > > >
> > > > > > > How agent_2 checks:
> > > > > > >
> > > > > > > 2a) check(dev_gen) (assume positive - changed)
> > > > > > > 2b) clear(local_cache)
> > > > > > > 2c) miss(on empty local_cache) - > eventually it goes to
> > > > > > > mr_lookup_caches()
> > > > > > > 2d) lock()
> > > > > > > 2e) get(new MR)
> > > > > > > 2f) unlock()
> > > > > > > 2g) update(local cache with obtained new MR)
> > > > > > >
> > > > > > > Hence, even if 1c) becomes visible in 2a) before 1b)
> > > > > > > committed (say, due to out-of-order Arch) - the agent 2
> > > > > > > would be blocked on
> > > > > > > 2d) and scenario depicted on your Fig2 would not happen
> > > > > > > (agent_2 will wait before step 3 till agent 1 unlocks after its step 5).
> > > > > > >
> > > > > > > With best regards,
> > > > > > > Slava
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Feifei Wang <Feifei.Wang2 at arm.com>
> > > > > > > > Sent: Friday, May 7, 2021 9:36> To: Slava Ovsiienko
> > > > > > > > <viacheslavo at nvidia.com>; Matan Azrad <matan at nvidia.com>;
> > > > Shahaf
> > > > > > > > Shuler <shahafs at nvidia.com>
> > > > > > > > Cc: dev at dpdk.org; nd <nd at arm.com>; stable at dpdk.org;
> > > > > > > > Ruifeng
> > > > > Wang
> > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > Memory Region cache
> > > > > > > >
> > > > > > > > Hi, Slava
> > > > > > > >
> > > > > > > > Thanks very much for your reply.
> > > > > > > >
> > > > > > > > > -----邮件原件-----
> > > > > > > > > 发件人: Slava Ovsiienko <viacheslavo at nvidia.com>
> > > > > > > > > 发送时间: 2021年5月6日 19:22
> > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2 at arm.com>; Matan Azrad
> > > > > > > > > <matan at nvidia.com>; Shahaf Shuler <shahafs at nvidia.com>
> > > > > > > > > 抄送: dev at dpdk.org; nd <nd at arm.com>; stable at dpdk.org;
> > > > Ruifeng
> > > > > > > Wang
> > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > > Memory Region cache
> > > > > > > > >
> > > > > > > > > Hi, Feifei
> > > > > > > > >
> > > > > > > > > Sorry, I do not follow why we should get rid of the last
> > > > > > > > > (after dev_gen update) wmb.
> > > > > > > > > We've rebuilt the global cache, we should notify other
> > > > > > > > > agents it's happened and they should flush local caches.
> > > > > > > > > So, dev_gen change should be made visible to other
> > > > > > > > > agents to trigger this activity and the second wmb is here to
> ensure this.
> > > > > > > >
> > > > > > > > 1. For the first problem why we should get rid of the last
> > > > > > > > wmb and move it before dev_gen updated, I think our
> > > > > > > > attention is how the wmb implements the synchronization
> > > > > > > > between multiple
> > > agents.
> > > > > > > > 					Fig1
> > > > > > > > ----------------------------------------------------------
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > ------------------------
> > > > > > > > -------
> > > > > > > > Timeslot        		agent_1
> agent_2
> > > > > > > > 1          		rebuild global cache
> > > > > > > > 2                       		wmb
> > > > > > > > 3            		     update dev_gen ----------------------- load
> > > changed
> > > > > > > > dev_gen
> > > > > > > > 4                                  			        	           rebuild local
> > > > cache
> > > > > > > > ----------------------------------------------------------
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > ------------------------
> > > > > > > > -------
> > > > > > > >
> > > > > > > > First, wmb is only for local thread to keep the order
> > > > > > > > between local
> > > > > > > > write- write :
> > > > > > > > Based on the picture above, for agent_1, wmb keeps the
> > > > > > > > order that rebuilding global cache is always before updating
> dev_gen.
> > > > > > > >
> > > > > > > > Second, agent_1 communicates with agent_2 by the global
> > > > > > > > variable "dev_gen" :
> > > > > > > > If agent_1 updates dev_gen, agent_2 will load it and then
> > > > > > > > it knows it should rebuild local cache
> > > > > > > >
> > > > > > > > Finally, agent_2 rebuilds local cache according to whether
> > > > > > > > agent_1 has rebuilt global cache, and agent_2 knows this
> > > > > > > > information by the variable
> > > > > > > "dev_gen".
> > > > > > > > 					Fig2
> > > > > > > > ----------------------------------------------------------
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > ------------------------
> > > > > > > > -------
> > > > > > > > Timeslot        		agent_1
> agent_2
> > > > > > > > 1		        update dev_gen
> > > > > > > > 2						      load changed
> > > > dev_gen
> > > > > > > > 3						          rebuild local
> > > > cache
> > > > > > > > 4        		    rebuild global cache
> > > > > > > > 5			 wmb
> > > > > > > > ----------------------------------------------------------
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > --
> > > > > > > > ------------------------
> > > > > > > > -------
> > > > > > > >
> > > > > > > > However, in arm platform, if wmb is after dev_gen updated,
> > > > "dev_gen"
> > > > > > > > may be updated before agent_1 rebuilding global cache,
> > > > > > > > then
> > > > > > > > agent_2 maybe receive error message and rebuild its local
> > > > > > > > cache in
> > > > > advance.
> > > > > > > >
> > > > > > > > To summarize, it is not important which time other agents
> > > > > > > > can see the changed global variable "dev_gen".
> > > > > > > > (Actually, wmb after "dev_gen" cannot ensure changed
> > "dev_gen"
> > > > > > > > is committed to the global).
> > > > > > > > It is more important that if other agents see the changed
> > > > > > > > "dev_gen", they also can know global cache has been updated.
> > > > > > > >
> > > > > > > > > One more point, due to registering new/destroying
> > > > > > > > > existing MR involves FW (via kernel) calls, it takes so
> > > > > > > > > many CPU cycles that we could neglect wmb overhead at all.
> > > > > > > >
> > > > > > > > We just move the last wmb into the right place, and not
> > > > > > > > delete it for performance.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Also, regarding this:
> > > > > > > > >
> > > > > > > > >  > > Another question suddenly occurred to me, in order
> > > > > > > > > to keep the
> > > > > > > > > >
> > > > > > > > > > order that rebuilding global cache before updating
> > > > > > > > > > ”dev_gen“, the
> > > > > > > > > > > wmb should be before updating "dev_gen" rather than
> > > > > > > > > > > after
> > it.
> > > > > > > > >  > > Otherwise, in the out-of-order platforms, current
> > > > > > > > > order cannot be
> > > > > > > > kept.
> > > > > > > > >
> > > > > > > > > it is not clear why ordering is important - global cache
> > > > > > > > > update and dev_gen change happen under spinlock
> > > > > > > > > protection, so only the last wmb is meaningful.
> > > > > > > > >
> > > > > > > >
> > > > > > > > 2. The second function of wmb before "dev_gen" updated is
> > > > > > > > for performance according to our previous discussion.
> > > > > > > > According to Fig2, if there is no wmb between "global
> > > > > > > > cache
> > > updated"
> > > > > > > > and "dev_gen updated", "dev_gen" may update before global
> > > > > > > > cache
> > > > > > > updated.
> > > > > > > >
> > > > > > > > Then agent_2 may see the changed "dev_gen" and flush
> > > > > > > > entire local cache in advance.
> > > > > > > >
> > > > > > > > This entire flush can degrade the performance:
> > > > > > > > "the local cache is getting empty and can't provide
> > > > > > > > translation for other valid (not being removed) MRs, and
> > > > > > > > the translation has to look up in the global cache, that
> > > > > > > > is locked now for rebuilding, this causes the delays in
> > > > > > > > data path on acquiring global
> > > > cache lock."
> > > > > > > >
> > > > > > > > Furthermore, spinlock is just for global cache, not for
> > > > > > > > dev_gen and local cache.
> > > > > > > >
> > > > > > > > > To summarize, in my opinion:
> > > > > > > > > - if you see some issue with ordering of global cache
> > > > > > > > > update/dev_gen signalling,
> > > > > > > > >   could you, please, elaborate? I'm not sure we should
> > > > > > > > > maintain an order (due to spinlock protection)
> > > > > > > > > - the last rte_smp_wmb() after dev_gen incrementing
> > > > > > > > > should be kept intact
> > > > > > > > >
> > > > > > > >
> > > > > > > > At last, for my view, there are two functions that moving
> > > > > > > > wmb before "dev_gen"
> > > > > > > > for the write-write order:
> > > > > > > > --------------------------------
> > > > > > > > a) rebuild global cache;
> > > > > > > > b) rte_smp_wmb();
> > > > > > > > c) updating dev_gen
> > > > > > > > -------------------------------- 1. Achieve
> > > > > > > > synchronization between multiple threads in the right way 2.
> > > > > > > > Prevent other agents from flushing local cache early to
> > > > > > > > ensure performance
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > > Feifei
> > > > > > > >
> > > > > > > > > With best regards,
> > > > > > > > > Slava
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Feifei Wang <Feifei.Wang2 at arm.com>
> > > > > > > > > > Sent: Thursday, May 6, 2021 5:52
> > > > > > > > > > To: Slava Ovsiienko <viacheslavo at nvidia.com>; Matan
> > > > > > > > > > Azrad <matan at nvidia.com>; Shahaf Shuler
> > > > > > > > > > <shahafs at nvidia.com>
> > > > > > > > > > Cc: dev at dpdk.org; nd <nd at arm.com>; stable at dpdk.org;
> > > > > > > > > > Ruifeng
> > > > > > > Wang
> > > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug
> > > > > > > > > > for Memory Region cache
> > > > > > > > > >
> > > > > > > > > > Hi, Slava
> > > > > > > > > >
> > > > > > > > > > Would you have more comments about this patch?
> > > > > > > > > > For my sight, only one wmb before "dev_gen" updating
> > > > > > > > > > is enough to synchronize.
> > > > > > > > > >
> > > > > > > > > > Thanks very much for your attention.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best Regards
> > > > > > > > > > Feifei
> > > > > > > > > >
> > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > 发件人: Feifei Wang
> > > > > > > > > > > 发送时间: 2021年4月20日 16:42
> > > > > > > > > > > 收件人: Slava Ovsiienko <viacheslavo at nvidia.com>;
> Matan
> > > > Azrad
> > > > > > > > > > > <matan at nvidia.com>; Shahaf Shuler
> > > > > > > > > > > <shahafs at nvidia.com>
> > > > > > > > > > > 抄送: dev at dpdk.org; nd <nd at arm.com>;
> stable at dpdk.org;
> > > > > > Ruifeng
> > > > > > > > > Wang
> > > > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > > > > Memory
> > > > > > > > Region
> > > > > > > > > > > cache
> > > > > > > > > > >
> > > > > > > > > > > Hi, Slava
> > > > > > > > > > >
> > > > > > > > > > > I think the second wmb can be removed.
> > > > > > > > > > > As I know, wmb is just a barrier to keep the order
> > > > > > > > > > > between write and
> > > > > > > > > write.
> > > > > > > > > > > and it cannot tell the CPU when it should commit the
> > changes.
> > > > > > > > > > >
> > > > > > > > > > > It is usually used before guard variable to keep the
> > > > > > > > > > > order that updating guard variable after some
> > > > > > > > > > > changes, which you want to release,
> > > > > > > > > > have been done.
> > > > > > > > > > >
> > > > > > > > > > > For example, for the wmb  after global cache
> > > > > > > > > > > update/before altering dev_gen, it can ensure the
> > > > > > > > > > > order that updating global cache before altering
> > > > > > > > > > > dev_gen:
> > > > > > > > > > > 1)If other agent load the changed "dev_gen", it can
> > > > > > > > > > > know the global cache has been updated.
> > > > > > > > > > > 2)If other agents load the unchanged, "dev_gen", it
> > > > > > > > > > > means the global cache has not been updated, and the
> > > > > > > > > > > local cache will not be
> > > > > > > > flushed.
> > > > > > > > > > >
> > > > > > > > > > > As a result, we use  wmb and guard variable "dev_gen"
> > > > > > > > > > > to ensure the global cache updating is "visible".
> > > > > > > > > > > The "visible" means when updating guard variable
> > "dev_gen"
> > > > > > > > > > > is known by other agents, they also can confirm
> > > > > > > > > > > global cache has been updated in the meanwhile.
> > > > > > > > > > > Thus, just one wmb before altering dev_gen can
> > > > > > > > > > > ensure
> > > > > > > > > > this.
> > > > > > > > > > >
> > > > > > > > > > > Best Regards
> > > > > > > > > > > Feifei
> > > > > > > > > > >
> > > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > > 发件人: Slava Ovsiienko <viacheslavo at nvidia.com>
> > > > > > > > > > > > 发送时间: 2021年4月20日 15:54
> > > > > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2 at arm.com>; Matan
> > > Azrad
> > > > > > > > > > > > <matan at nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > <shahafs at nvidia.com>
> > > > > > > > > > > > 抄送: dev at dpdk.org; nd <nd at arm.com>;
> > stable at dpdk.org;
> > > > > > > Ruifeng
> > > > > > > > > > Wang
> > > > > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>; nd
> > > > > > > <nd at arm.com>;
> > > > > > > > > nd
> > > > > > > > > > > > <nd at arm.com>
> > > > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug
> > > > > > > > > > > > for Memory Region cache
> > > > > > > > > > > >
> > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > >
> > > > > > > > > > > > In my opinion, there should be 2 barriers:
> > > > > > > > > > > >  - after global cache update/before altering
> > > > > > > > > > > > dev_gen, to ensure the correct order
> > > > > > > > > > > >  - after altering dev_gen to make this change
> > > > > > > > > > > > visible for other agents and to trigger local
> > > > > > > > > > > > cache update
> > > > > > > > > > > >
> > > > > > > > > > > > With best regards, Slava
> > > > > > > > > > > >
> > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > From: Feifei Wang <Feifei.Wang2 at arm.com>
> > > > > > > > > > > > > Sent: Tuesday, April 20, 2021 10:30
> > > > > > > > > > > > > To: Slava Ovsiienko <viacheslavo at nvidia.com>;
> > > > > > > > > > > > > Matan Azrad <matan at nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > > <shahafs at nvidia.com>
> > > > > > > > > > > > > Cc: dev at dpdk.org; nd <nd at arm.com>;
> > > > > > > > > > > > > stable at dpdk.org; Ruifeng
> > > > > > > > > > Wang
> > > > > > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>; nd
> > > > > > > > <nd at arm.com>;
> > > > > > > > > > nd
> > > > > > > > > > > > > <nd at arm.com>
> > > > > > > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix
> > > > > > > > > > > > > rebuild bug for Memory Region cache
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Slava
> > > > > > > > > > > > >
> > > > > > > > > > > > > Another question suddenly occurred to me, in
> > > > > > > > > > > > > order to keep the order that rebuilding global
> > > > > > > > > > > > > cache before updating ”dev_gen“, the wmb should
> > > > > > > > > > > > > be before updating
> > > > > "dev_gen"
> > > > > > > > > > > > > rather
> > > > > > > than after it.
> > > > > > > > > > > > > Otherwise, in the out-of-order platforms,
> > > > > > > > > > > > > current order cannot be
> > > > > > > > > kept.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thus, we should change the code as:
> > > > > > > > > > > > > a) rebuild global cache;
> > > > > > > > > > > > > b) rte_smp_wmb();
> > > > > > > > > > > > > c) updating dev_gen
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > Feifei
> > > > > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > > > > 发件人: Feifei Wang
> > > > > > > > > > > > > > 发送时间: 2021年4月20日 13:54
> > > > > > > > > > > > > > 收件人: Slava Ovsiienko <viacheslavo at nvidia.com>;
> > > > Matan
> > > > > > > Azrad
> > > > > > > > > > > > > > <matan at nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > > > <shahafs at nvidia.com>
> > > > > > > > > > > > > > 抄送: dev at dpdk.org; nd <nd at arm.com>;
> > > > stable at dpdk.org;
> > > > > > > > > Ruifeng
> > > > > > > > > > > > Wang
> > > > > > > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>; nd
> > > > > > > > <nd at arm.com>
> > > > > > > > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild
> > > > > > > > > > > > > > bug for Memory
> > > > > > > > > > > Region
> > > > > > > > > > > > > > cache
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Slava
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks very much for your explanation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I can understand the app can wait all mbufs
> > > > > > > > > > > > > > are returned to the memory pool, and then it
> > > > > > > > > > > > > > can free this mbufs, I agree with
> > > > > > > > this.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As a result, I will remove the bug fix patch
> > > > > > > > > > > > > > from this series and just replace the smp
> > > > > > > > > > > > > > barrier with
> > > > > > > > > > > > > > C11 thread fence. Thanks very much for your
> > > > > > > > > > > > > > patient explanation
> > > > > again.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > Feifei
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > > > > > 发件人: Slava Ovsiienko
> > > > > > > > > > > > > > > <viacheslavo at nvidia.com>
> > > > > > > > > > > > > > > 发送时间: 2021年4月20日 2:51
> > > > > > > > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2 at arm.com>;
> > Matan
> > > > > > Azrad
> > > > > > > > > > > > > > > <matan at nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > > > > <shahafs at nvidia.com>
> > > > > > > > > > > > > > > 抄送: dev at dpdk.org; nd <nd at arm.com>;
> > > > > stable at dpdk.org;
> > > > > > > > > > Ruifeng
> > > > > > > > > > > > > Wang
> > > > > > > > > > > > > > > <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > > > > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild
> > > > > > > > > > > > > > > bug for Memory Region cache
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please, see below
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ....
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Sorry, I do not follow what this patch fixes.
> > > > > > > > > > > > > > > > > Do we have some issue/bug with MR cache
> > > > > > > > > > > > > > > > > in
> > > > practice?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This patch fixes the bug which is based on
> > > > > > > > > > > > > > > > logical deduction, and it doesn't actually happen.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Each Tx queue has its own dedicated "local"
> > > > > > > > > > > > > > > > > cache for MRs to convert buffer address
> > > > > > > > > > > > > > > > > in mbufs being transmitted to LKeys
> > > > > > > > > > > > > > > > > (HW-related entity
> > > > > > > > > > > > > > > > > handle) and the "global" cache for all
> > > > > > > > > > > > > > > > > MR registered on the
> > > > > > > > > > device.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > AFAIK, how conversion happens in datapath:
> > > > > > > > > > > > > > > > > - check the local queue cache flush
> > > > > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > - lookup in local cache
> > > > > > > > > > > > > > > > > - if not found:
> > > > > > > > > > > > > > > > > - acquire lock for global cache read
> > > > > > > > > > > > > > > > > access
> > > > > > > > > > > > > > > > > - lookup in global cache
> > > > > > > > > > > > > > > > > - release lock for global cache
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > How cache update on memory
> > > > > > > > > > > > > > > > > freeing/unregistering
> > > > > > > > happens:
> > > > > > > > > > > > > > > > > - acquire lock for global cache write
> > > > > > > > > > > > > > > > > access
> > > > > > > > > > > > > > > > > - [a] remove relevant MRs from the
> > > > > > > > > > > > > > > > > global cache
> > > > > > > > > > > > > > > > > - [b] set local caches flush request
> > > > > > > > > > > > > > > > > - free global cache lock
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If I understand correctly, your patch
> > > > > > > > > > > > > > > > > swaps [a] and [b], and local caches
> > > > > > > > > > > > > > > > > flush is requested
> > > earlier.
> > > > > > > > > > > > > > > > > What problem does it
> > > > > > > > > > solve?
> > > > > > > > > > > > > > > > > It is not supposed there are in datapath
> > > > > > > > > > > > > > > > > some mbufs referencing to the memory
> > > > > > > > > > > > > > > > > being
> > > freed.
> > > > > > > > > > > > > > > > > Application must ensure this and must
> > > > > > > > > > > > > > > > > not allocate new mbufs from this memory
> > > > > > > > > > > > > > > > > regions
> > > > > > > > > > > > being freed.
> > > > > > > > > > > > > > > > > Hence, the lookups for these MRs in
> > > > > > > > > > > > > > > > > caches should not
> > > > > > > > occur.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For your first point that, application can
> > > > > > > > > > > > > > > > take charge of preventing MR freed memory
> > > > > > > > > > > > > > > > being allocated to
> > > > > > data path.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Does it means that If there is an
> > > > > > > > > > > > > > > > emergency of MR fragment, such as hotplug,
> > > > > > > > > > > > > > > > the application must inform thedata path
> > > > > > > > > > > > > > > > in advance, and this memory will not be
> > > > > > > > > > > > > > > > allocated, and then the control path will
> > > > > > > > > > > > > > > > free this memory? If application  can do
> > > > > > > > > > > > > > > > like this, I agree that this bug
> > > > > > > > > > > > > > cannot happen.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Actually,  this is the only correct way for
> > > > > > > > > > > > > > > application to
> > > > > > operate.
> > > > > > > > > > > > > > > Let's suppose we have some memory area that
> > > > > > > > > > > > > > > application wants to
> > > > > > > > > > > > free.
> > > > > > > > > > > > > > > ALL references to this area must be removed.
> > > > > > > > > > > > > > > If we have some mbufs allocated from this
> > > > > > > > > > > > > > > area, it means that we have memory pool
> > > > > > > > > > > > > > > created
> > > > > > > > > > > > > there.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What application should do:
> > > > > > > > > > > > > > > - notify all its components/agents the
> > > > > > > > > > > > > > > memory area is going to be freed
> > > > > > > > > > > > > > > - all components/agents free the mbufs they
> > > > > > > > > > > > > > > might own
> > > > > > > > > > > > > > > - PMD might not support freeing for some
> > > > > > > > > > > > > > > mbufs (for example being sent and awaiting
> > > > > > > > > > > > > > > for completion), so app should just wait
> > > > > > > > > > > > > > > - wait till all mbufs are returned to the
> > > > > > > > > > > > > > > memory pool (by monitoring available obj ==
> > > > > > > > > > > > > > > pool size)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Otherwise - it is dangerous to free the memory.
> > > > > > > > > > > > > > > There are just some mbufs still allocated,
> > > > > > > > > > > > > > > it is regardless to buf address to MR
> > > > > > > > > > > > > > > translation. We just can't free the memory -
> > > > > > > > > > > > > > > the mapping will be destroyed and might
> > > > > > > > > > > > > > > cause the segmentation fault by SW or some
> > > > > > > > > > > > > > > HW issues on DMA access to unmapped memory.
> > > > > > > > > > > > > > > It is very generic safety approach - do not
> > > > > > > > > > > > > > > free the memory that is still in
> > > > > > > > > > use.
> > > > > > > > > > > > > > > Hence, at the moment of freeing and
> > > > > > > > > > > > > > > unregistering the MR, there MUST BE NO any
> > > > > > > > > > > > > > mbufs in flight referencing to the addresses
> > > > > > > > > > > > > > being
> > freed.
> > > > > > > > > > > > > > > No translation to MR being invalidated can happen.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > For other side, the cache flush has
> > > > > > > > > > > > > > > > > negative effect
> > > > > > > > > > > > > > > > > - the local cache is getting empty and
> > > > > > > > > > > > > > > > > can't provide translation for other
> > > > > > > > > > > > > > > > > valid (not being
> > > > > > > > > > > > > > > > > removed) MRs, and the translation has to
> > > > > > > > > > > > > > > > > look up in the global cache, that is
> > > > > > > > > > > > > > > > > locked now for rebuilding, this causes
> > > > > > > > > > > > > > > > > the delays in datapatch
> > > > > > > > > > > > > > > > on acquiring global cache lock.
> > > > > > > > > > > > > > > > > So, I see some potential performance impact.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If above assumption is true, we can go to
> > > > > > > > > > > > > > > > your second
> > > > > > point.
> > > > > > > > > > > > > > > > I think this is a problem of the tradeoff
> > > > > > > > > > > > > > > > between cache coherence and
> > > > > > > > > > > > > > > performance.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can understand your meaning that though
> > > > > > > > > > > > > > > > global cache has been changed, we should
> > > > > > > > > > > > > > > > keep the valid MR in local cache as long
> > > > > > > > > > > > > > > > as possible to ensure the fast
> > > > > > searching speed.
> > > > > > > > > > > > > > > > In the meanwhile, the local cache can be
> > > > > > > > > > > > > > > > rebuilt later to reduce its waiting time
> > > > > > > > > > > > > > > > for acquiring the global
> > > > > > cache lock.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However,  this mechanism just ensures the
> > > > > > > > > > > > > > > > performance unchanged for the first few mbufs.
> > > > > > > > > > > > > > > > During the next mbufs lkey searching after
> > 'dev_gen'
> > > > > > > > > > > > > > > > updated, it is still necessary to update
> > > > > > > > > > > > > > > > the local
> > cache.
> > > > > > > > > > > > > > > > And the performance can firstly reduce and
> > > > > > > > > > > > > > > > then
> > > > returns.
> > > > > > > > > > > > > > > > Thus, no matter whether there is this
> > > > > > > > > > > > > > > > patch or not, the performance will jitter
> > > > > > > > > > > > > > > > in a certain period of
> > > > > > > > > > > > > time.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Local cache should be updated to remove MRs
> > > > > > > > > > > > > > > no longer
> > > > > > valid.
> > > > > > > > > > > > > > > But we just flush the entire cache.
> > > > > > > > > > > > > > > Let's suppose we have valid MR0, MR1, and
> > > > > > > > > > > > > > > not valid MRX in local
> > > > > > > > > > > cache.
> > > > > > > > > > > > > > > And there are traffic in the datapath for
> > > > > > > > > > > > > > > MR0 and MR1, and no traffic for MRX anymore.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) If we do as you propose:
> > > > > > > > > > > > > > > a) take a lock
> > > > > > > > > > > > > > > b) request flush local cache first - all
> > > > > > > > > > > > > > > MR0, MR1, MRX will be removed on translation
> > > > > > > > > > > > > > > in datapath
> > > > > > > > > > > > > > > c) update global cache,
> > > > > > > > > > > > > > > d) free lock All the traffic for valid MR0,
> > > > > > > > > > > > > > > MR1 ALWAYS will be blocked on lock taken for
> > > > > > > > > > > > > > > cache update since point
> > > > > > > > > > > > > > > b) till point
> > > > > > > d).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2) If we do as it is implemented now:
> > > > > > > > > > > > > > > a) take a lock
> > > > > > > > > > > > > > > b) update global cache
> > > > > > > > > > > > > > > c) request flush local cache
> > > > > > > > > > > > > > > d) free lock The traffic MIGHT be locked
> > > > > > > > > > > > > > > ONLY for MRs non-existing in local cache
> > > > > > > > > > > > > > > (not happens for
> > > > > > > > > > > > > > > MR0 and MR1, must not happen for MRX), and
> > > > > > > > > > > > > > > probability should be minor. And lock might
> > > > > > > > > > > > > > > happen since
> > > > > > > > > > > > > > > c) till
> > > > > > > > > > > > > > > d)
> > > > > > > > > > > > > > > - quite short period of time
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Summary, the difference between 1) and 2)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lock probability:
> > > > > > > > > > > > > > > - 1) lock ALWAYS happen for ANY MR
> > > > > > > > > > > > > > > translation after
> > > b),
> > > > > > > > > > > > > > >   2) lock MIGHT happen, for cache miss ONLY,
> > > > > > > > > > > > > > > after
> > > > > > > > > > > > > > > c)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lock duration:
> > > > > > > > > > > > > > > - 1) lock since b) till d),
> > > > > > > > > > > > > > >   2) lock since c) till d), that seems to be
> > > > > > > > > > > > > > > much
> > shorter.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Finally, in conclusion, I tend to think
> > > > > > > > > > > > > > > > that the bottom layer can do more things
> > > > > > > > > > > > > > > > to ensure the correct execution of the
> > > > > > > > > > > > > > > > program, which may have a negative impact
> > > > > > > > > > > > > > > > on the performance in a short time, but in
> > > > > > > > > > > > > > > > the long run, the performance will
> > > > > > > > > > > > > > > > eventually
> > > > > > > > > > > > come back.
> > > > > > > > > > > > > > > > Furthermore, maybe we should pay attention
> > > > > > > > > > > > > > > > to the performance in the stable period,
> > > > > > > > > > > > > > > > and try our best to ensure the correctness
> > > > > > > > > > > > > > > > of the program in case of
> > > > > > > > > > > > > > > emergencies.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we have some mbufs still allocated in
> > > > > > > > > > > > > > > memory being freed
> > > > > > > > > > > > > > > - there is nothing to say about correctness,
> > > > > > > > > > > > > > > it is totally incorrect. In my opinion, we
> > > > > > > > > > > > > > > should not think how to mitigate this
> > > > > > > > > > > > > > > incorrect behavior, we should not encourage
> > > > > > > > > > > > > > > application developers to follow the wrong
> > > > > > > > > > > > > > approaches.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > With best regards, Slava
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best Regards Feifei
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > With best regards, Slava


More information about the dev mailing list