[dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER

Shahaf Shuler shahafs at mellanox.com
Tue Mar 19 20:42:21 CET 2019


Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> 
> Guys, please let's avoid top-post.
> 
> You are both not replying to each other:
> 
> 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO
> protection with lwsync.
> We assume that rte_mb should protect any access including IO.
> 
> 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel
> did not reply about the semantic: are we protecting IO or general memory
> access?

In mlx5 code we want to sync between two different writes:
1. write to system memory (RAM)
2. write to MMIO memory (device)

We need #1 to be visible in host memory before #2 is committed to the NIC.
We want a single type of barrier that translates to the correct assembly for each system arch, and we also want it to be as lightweight as possible.
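
As a rough sketch of the ordering described above (this is not the mlx5 code; SKETCH_BARRIER, struct sketch_desc and sketch_post are made-up names, and the compiler builtin is only a stand-in for whichever DPDK barrier turns out to be the right one):

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder, not a DPDK API. It stands in for the barrier under
 * discussion: rte_wmb() on x86 and ARM, rte_mb() on POWER. */
#define SKETCH_BARRIER() __sync_synchronize()

/* Hypothetical descriptor layout, for illustration only. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
};

/* Fill one descriptor in RAM, then ring the device doorbell. */
static inline void
sketch_post(struct sketch_desc *ring, uint32_t ring_size, uint32_t idx,
	    uint64_t addr, uint32_t len, volatile uint32_t *doorbell)
{
	assert(idx < ring_size);

	/* 1. write to system memory (RAM) */
	ring[idx].addr = addr;
	ring[idx].len = len;

	/* #1 has to be visible in host memory before #2 is issued. */
	SKETCH_BARRIER();

	/* 2. write to MMIO memory (device) */
	*doorbell = idx + 1;
}
```

Without the barrier, the device could see the doorbell and fetch a descriptor that has not reached memory yet.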

So far, when not running on POWER, we used rte_wmb() for that. On x86 and ARM systems it provides the needed guarantees.
This is also what the barrier's doxygen says on the ARM arch:
"
Write memory barrier.                                            
                                                                 
Guarantees that the STORE operations generated before the barrier
occur before the STORE operations generated after.
"

It is not restricted to stores to system memory only.
With POWER the situation is somewhat different, and in fact rte_mb() is required. It obviously misses the point of those barriers if we need to use a different barrier depending on the system arch.

We need to align the definition of the different barriers in DPDK:
1. We need clear documentation of each barrier. It should be global, not part of the specific implementation on each arch.
2. Either modify the ppc rte_wmb() to match the ARM and x86 ones, or define a new type of barrier which syncs between both I/O and stores to system memory.
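
To illustrate option 2, here is a minimal sketch of what such a barrier could look like. The name sketch_io_wmb() is hypothetical and not an existing DPDK macro. The ppc64 branch uses sync because lwsync does not cover device memory; the instructions picked for x86-64 and AArch64 are my assumption of a reasonable mapping, not something agreed on in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical barrier: orders earlier stores to system memory before
 * later stores to device (MMIO) memory on every arch. */
#if defined(__powerpc64__)
/* lwsync only orders system memory, so the heavier sync is used. */
#define sketch_io_wmb() __asm__ __volatile__("sync" : : : "memory")
#elif defined(__x86_64__)
/* Assumed mapping: store fence. */
#define sketch_io_wmb() __asm__ __volatile__("sfence" : : : "memory")
#elif defined(__aarch64__)
/* Assumed mapping: store-only data synchronization barrier. */
#define sketch_io_wmb() __asm__ __volatile__("dsb st" : : : "memory")
#else
/* Fallback: full barrier from the compiler builtin. */
#define sketch_io_wmb() __sync_synchronize()
#endif

/* Caller code stays the same on every arch. */
static inline void
sketch_store_then_mmio(uint32_t *ram, uint32_t val,
		       volatile uint32_t *mmio, uint32_t db)
{
	assert(ram && mmio);
	*ram = val;		/* store to system memory */
	sketch_io_wmb();	/* single barrier, arch-specific expansion */
	*mmio = db;		/* store to device memory */
}
```

The point is that the driver calls one barrier and the per-arch header decides how heavy the instruction has to be.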

> 
> 
> 19/03/2019 11:05, Dekel Peled:
> > Hi,
> >
> > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > My patch proposes to define rte_wmb() and rte_rmb() as asm sync, like
> > rte_mb(), since using lwsync is incorrect for them.
> >
> > Regards,
> > Dekel
> >
> > > -----Original Message-----
> > > From: Chao Zhu <chaozhu at linux.vnet.ibm.com>
> > > Sent: Tuesday, March 19, 2019 5:24 AM
> > > To: Dekel Peled <dekelp at mellanox.com>
> > > Cc: Yongseok Koh <yskoh at mellanox.com>; Shahaf Shuler
> > > <shahafs at mellanox.com>; dev at dpdk.org; Ori Kam
> > > <orika at mellanox.com>; Thomas Monjalon <thomas at monjalon.net>;
> > > stable at dpdk.org
> > > Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM
> > > POWER
> > >
> > > Dekel,
> > >
> > > To control the memory order for device memory, I think you should use
> > > rte_io_mb() instead of rte_mb(). This will generate correct result.
> > > rte_wmb() is used for system memory.
> > >
> > > > -----Original Message-----
> > > > From: Dekel Peled <dekelp at mellanox.com>
> > > > Sent: Monday, March 18, 2019 8:58 PM
> > > > To: chaozhu at linux.vnet.ibm.com
> > > > Cc: yskoh at mellanox.com; shahafs at mellanox.com; dev at dpdk.org;
> > > > orika at mellanox.com; thomas at monjalon.net; dekelp at mellanox.com;
> > > > stable at dpdk.org
> > > > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM
> > > > POWER
> > > >
> > > > From previous patch description: "to improve performance on PPC64,
> > > > use light weight sync instruction instead of sync instruction."
> > > >
> > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > This form is used to control ordering for storage accesses to
> > > > system memory only. It does not create a memory barrier for
> > > > > accesses to device memory."
> > > >
> > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > rte_rmb() will provide correct memory barrier to ensure order of
> > > > accesses to system memory and device memory.
> > > >
> > > > [1]
> > > >
> > >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> > > w
> > > .
> > > >
> > >
> ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&amp
> > > ;data=
> > > >
> > >
> 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> > > c5%7Ca
> > > >
> > >
> 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> > > p;sdata
> > > >
> > >
> =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> > > served=0
> > > >
> > > > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > > > Cc: stable at dpdk.org
> > > >
> > > > Signed-off-by: Dekel Peled <dekelp at mellanox.com>
> > > > ---
> > > >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8
> > > > --------
> > > >  1 file changed, 8 deletions(-)
> > > >
> > > > diff --git
> > > > a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > index ce38350..797381c 100644
> > > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > @@ -63,11 +63,7 @@
> > > >   * Guarantees that the STORE operations generated before the barrier
> > > >   * occur before the STORE operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  /**
> > > >   * Read memory barrier.
> > > > @@ -75,11 +71,7 @@
> > > >   * Guarantees that the LOAD operations generated before the barrier
> > > >   * occur before the LOAD operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  #define rte_smp_mb() rte_mb()
> 
> 


