[dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER

Thomas Monjalon thomas at monjalon.net
Tue Mar 19 21:45:01 CET 2019


19/03/2019 20:42, Shahaf Shuler:
> Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > 
> > Guys, please let's avoid top-post.
> > 
> > You are both not replying to each other:
> > 
> > 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO
> > protection with lwsync.
> > We assume that rte_mb should protect any access including IO.
> > 
> > 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel
> > did not reply about the semantic: are we protecting IO or general memory
> > access?
> 
> In mlx5 code we want to sync between two different writes:
> 1. write to system memory (RAM)
> 2. write to MMIO memory (device)
> 
> We need #1 to be visible on host memory before #2 is committed to NIC.
> We want to have a single type of barrier which will translate to the correct assembly based on the system arch, and in addition we want it light-weight as possible.
> 
> So far, when not running on power, we used the rte_wmb for that. On x86 and ARM systems it provided the needed guarantees.  
> It is also mentioned in the barrier doxygen on ARM arch:
> "
> Write memory barrier.                                            
>                                                                  
> Guarantees that the STORE operations generated before the barrier
> occur before the STORE operations generated after.
> "
> 
> It doesn't restrict to store to system memory only. 
> w/ power is on somewhat different and in fact rte_mb is required. It obviously miss the point of those barrier if we will need to use a different barrier based on the system arch. 
> 
> We need to align the definition of the different barriers in DPDK:
> 1. need a clear documentation of each. this should be global and not part of the specific implementation on each arch.

The global definition is in lib/librte_eal/common/include/generic/rte_atomic.h

There are some copy/paste in Arm32 and PPC that I will remove.

> 2. either modify ppc rte_wmb to match ARM and x86 ones or to define a new type of barrier which will sync between both I/O and stores to systems memory.

The basic memory barrier of DPDK does not mention
a difference between I/O and system memory.
It is not explicit (yet) but I assume it is protecting both.
So, in my opinion, we need to make it explicit in the doc,
and fix the PPC implementation to comply with this definition.

Anyway, I don't see any significant effort from IBM to move from
the alpha support stage to a real Open Source support.
PS: sending a mail every two months, to promise improvements, is not enough!

-----------------

> > 19/03/2019 11:05, Dekel Peled:
> > > Hi,
> > >
> > > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > > My patch propose to define rte_wmb() and rte_rmb() as asm sync, like
> > rte_mb(), since using lwsync is incorrect for them.
> > >
> > > Regards,
> > > Dekel
> > >
> > > From: Chao Zhu <chaozhu at linux.vnet.ibm.com>
> > > > Dekel£¬
> > > >
> > > > To control the memory order for device memory, I think you should
> > > > use
> > > > rte_io_mb() instead of rte_mb(). This will generate correct result.
> > > > rte_wmb() is used for system memory.
> > > >
> > > > From: Dekel Peled <dekelp at mellanox.com>
> > > > >
> > > > > From previous patch description: "to improve performance on PPC64,
> > > > > use light weight sync instruction instead of sync instruction."
> > > > >
> > > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > > This form is used to control ordering for storage accesses to
> > > > > system memory only. It does not create a memory barrier for
> > > > > accesses to device
> > > > memory."
> > > > >
> > > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > > rte_rmb() will provide correct memory barrier to ensure order of
> > > > > accesses to system memory and device memory.





More information about the dev mailing list