[dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER

Pradeep Satyanarayana pradeep at us.ibm.com
Fri Mar 22 16:30:44 CET 2019



Thomas Monjalon <thomas at monjalon.net> wrote on 03/22/2019 01:49:03 AM:

> From: Thomas Monjalon <thomas at monjalon.net>
> To: Pradeep Satyanarayana <pradeep at us.ibm.com>, David Wilder
> <wilder at us.ibm.com>
> Cc: Shahaf Shuler <shahafs at mellanox.com>, Chao Zhu
> <chaozhu at linux.vnet.ibm.com>, Dekel Peled <dekelp at mellanox.com>,
> dev at dpdk.org, David Christensen <drc at ibm.com>, Ori Kam
> <orika at mellanox.com>, Yongseok Koh <yskoh at mellanox.com>,
> konstantin.ananyev at intel.com, ola.liljedahl at arm.com,
> honnappa.nagarahalli at arm.com, bruce.richardson at intel.com
> Date: 03/22/2019 01:49 AM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> We need to agree on the definitions.
> Please see below,
>
> 22/03/2019 02:40, Pradeep Satyanarayana:
> > Shahaf Shuler <shahafs at mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> > > Pradeep Satyanarayana <pradeep at us.ibm.com> wrote on Thu 3/21/2019
12:41
> > AM:
> > > >> > So far, when not running on power, we used the rte_wmb for that.
> > > >> On x86 and ARM systems it provided the needed guarantees.
> > > >> > It is also mentioned in the barrier doxygen on ARM arch:
> > > >> > "
> > > >> > Write memory barrier.
> > > >> >
> > > >> > Guarantees that the STORE operations generated before the
barrier
> > > >> > occur before the STORE operations generated after.
> > > >> > "
> > > >> >
> > > >> > It doesn't restrict to store to system memory only.
> > > >> > w/ power is on somewhat different and in fact rte_mb is
required.
> > > >> It obviously miss the point of those barrier if we will need to
use
> > > >> a different barrier based on the system arch.
> > > >> >
> > > >> > We need to align the definition of the different barriers in
DPDK:
> > > >> > 1. need a clear documentation of each. this should be global and
> > > >> not part of the specific implementation on each arch.
> > > >
> > > >A single approach may not work for all architectures. Power is
different
> > > >from others, so we need to be able to accommodate that. More
comments
> > below.
> > >
> > > it don't get this claim.
> > > It is ok to have some differences between the different arch, but
> > > here you implement a well-defined barrier - rte_wmb.
> > > if you see a need we can discuss to define a **new** barrier which
> > > sync STORE only to system memory, and will be able to utilize the
> > > lwsync command.
> > >
> > > >
> > > >> The global definition is in lib/librte_eal/common/include/
> > > generic/rte_atomic.h
> > > >>
> > > >> There are some copy/paste in Arm32 and PPC that I will remove.
> > > >>
> > > >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> > > >> define a new type of barrier which will sync between both I/O and
> > > >> stores to systems memory.
> > > >>
> > > >> The basic memory barrier of DPDK does not mention
> > > >> a difference between I/O and system memory.
> > > >
> > > >In the case of Power, sync will cater to both I/O and system
> > > memory. However, that
> > > >is too big a hammer in all cases.
> > >
> > > rte_wmb requires such sync. you propose to have the wrong barrier in
> > > favor of performance.
> > > to mitigate this you can take my suggestion above and define a new,
> > > more lightweight one.
> > >
> > > >
> > > >> It is not explicit (yet) but I assume it is protecting both.
> > > >> So, in my opinion, we need to make it explicit in the doc,
> > > >> and fix the PPC implementation to comply with this definition.
> > > >>
> > > >> Anyway, I don't see any significant effort from IBM to move from
> > > >> the alpha support stage to a real Open Source support.
> > > >> PS: sending a mail every two months, to promise improvements, is
> > > not enough!
> > > >
> > >
> > > […]
> > >
> > > >
> > > >We should retain lwsync, should not be removed as discussed in here:
> > > >
> > > >https://urldefense.proofpoint.com/v2/url?
>
u=http-3A__mails.dpdk.org_archives_dev_2019-2DMarch_126746.html&d=DwIFaQ&c=jf_iaSHvJObTbx-

> siA1ZOg&r=co4lCofxrQP11yIVMply-
> QYvsUyeKJkYY_jL1QVgeGA&m=SNGJjgGF8mHR9t-
>
ixYHznWvUoXvC3zlbm8q1vlS4x_s&s=TXCEGDEjCiUW1ug5kDwlfuUaqRMowGhpihjly5zEZp8&e=

> > >
> > > i don't agree.
> > > it is very clear the rte_wmb implementation in power is broken and
> > > we need to fix this right away before other customers will hit the
> > > same issue.
> >
> >
> > In the DPDK source I see a couple of different classes of memory
barriers.
> > I am
> > not clear on the usage of these in the drivers, but I would think the
> > guidelines
> > to be as shown below (for Power):
> >
> > - rte_[rw]mb (general memory barrier) --> should be lwsync
>
> This is what may be discussed.
> The assumption is that the general memory barrier should cover
> all cases (CPU caches, SMP and I/O).
> That's why we think it should "sync" for Power.

In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
and retain it as lwsync. Agreed?

>
> > - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> > - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> > - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> >
> > lwsync is appropriate for cases where CPUs are accessing cacheable
memory
> > (i.e. Memory Coherence Required) while the sync instruction should be
used
> > in all other cases.
> >

Thanks
Pradeep
pradeep at us.ibm.com


More information about the dev mailing list