[dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
Jerin Jacob
jerinjacobk at gmail.com
Tue May 12 10:28:46 CEST 2020
On Tue, May 12, 2020 at 1:32 PM Ruifeng Wang <Ruifeng.Wang at arm.com> wrote:
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk at gmail.com>
> > Sent: Tuesday, May 12, 2020 2:42 PM
> > To: Ruifeng Wang <Ruifeng.Wang at arm.com>
> > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> > dev at dpdk.org; jerinj at marvell.com; hemant.agrawal at nxp.com; Ajit
> > Khaparde (ajit.khaparde at broadcom.com) <ajit.khaparde at broadcom.com>;
> > igorch at amazon.com; thomas at monjalon.net; viacheslavo at mellanox.com;
> > arybchenko at solarflare.com; nd <nd at arm.com>
> > Subject: Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
> >
> > On Tue, May 12, 2020 at 11:48 AM Ruifeng Wang <Ruifeng.Wang at arm.com>
> > wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > > > Sent: Tuesday, May 12, 2020 2:07 AM
> > > > To: dev at dpdk.org; jerinj at marvell.com; hemant.agrawal at nxp.com; Ajit
> > > > Khaparde (ajit.khaparde at broadcom.com)
> > <ajit.khaparde at broadcom.com>;
> > > > igorch at amazon.com; thomas at monjalon.net;
> > viacheslavo at mellanox.com;
> > > > arybchenko at solarflare.com; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli at arm.com>
> > > > Cc: Ruifeng Wang <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > >
> > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > other-multi-copy atomicity memory model.
> > > >
> > > > Armv8-a memory model has been strengthened to require
> > > > other-multi-copy atomicity. This property requires memory accesses
> > > > from an observer to become visible to all other observers
> > > > simultaneously [3]. This means
> > > >
> > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > > visible to all CPUs
> > > > b) A write that is visible to all CPUs is also visible to all other
> > > > observers in the shareability domain
> > > >
> > > > This allows for using cheaper DMB instructions in the place of DSB
> > > > for devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> > > >
> > > > Please refer to [1], [2] and [3] for more information.
> > > >
> > > > [1]
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > > ommit/?i d=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > > > ---
> > > > lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > index 7b7099cdc..e406411bb 100644
> > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > @@ -19,11 +19,11 @@ extern "C" {
> > > > #include <rte_compat.h>
> > > > #include <rte_debug.h>
> > > >
> > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > >
> > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > >
> > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > >
> > > > #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > >
> > > > @@ -37,9 +37,9 @@ extern "C" {
> > > >
> > > > #define rte_io_rmb() rte_rmb()
> > > >
> > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > +#define rte_cio_wmb() rte_wmb()
> > > >
> > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > +#define rte_cio_rmb() rte_rmb()
> > > >
> > > > /*------------------------ 128 bit atomic operations
> > > > -------------------------*/
> > > >
> > > > --
> > > > 2.17.1
> > >
> > > This change showed about 7% performance gain in testpmd single core
> > NDR test.
> >
> > I am trying to understand this patch wrt DPDK current usage model?
> >
> > 1) Is performance improvement due to the fact that the PMD that you are
> > using it for testing suppose to use existing rte_cio_* but it was using
> > rte_[rw]mb?
>
> This is part of the reason. There are also cases where rte_io_* was used and can be relaxed.
> Such as: http://patches.dpdk.org/patch/68162/
>
> > 2) In my understanding :
> > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > c) CPU to ANY(CPU or Device) are addressed by rte_[rw]mb
> >
> > If (c) is true then we are violating the DPDK spec with change. Right?
>
> Developers are still required to use correct barrier APIs for different use cases.
> I think this change mitigates performance penalty when non optimal barrier is used.
But does it violate the contract? We are using rte_[rw]mb as a low
performance/heavyweight
for all the cases. I think that is the contract to DPDK consumers. For
different requirment,
We have a specific API. IMO, It makes sense to change the fastpath code for more
fine granted barriers based on the need rather than changing the
generic one to lightweight.
i.e rte_[rw]wb is the superset that works on all cases and use
customized one for the specific
use case.
>
> > This change will not be required if fastpath (CPU to Device) is using rte_cio_*.
> > Right?
>
> See 1). Correct usage of rte_cio_* is not the whole.
> For some other use cases, such as barrier between accesses of different memory types, we can also use lighter barrier 'dmb'.
>
> >
> >
> >
> > > Tested-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > >
More information about the dev
mailing list