[dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD

Gavin Hu Gavin.Hu at arm.com
Sat Apr 11 05:46:35 CEST 2020


Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko at solarflare.com>
> Sent: Saturday, April 11, 2020 1:21 AM
> To: Gavin Hu <Gavin.Hu at arm.com>; dev at dpdk.org
> Cc: nd <nd at arm.com>; david.marchand at redhat.com;
> thomas at monjalon.net; rasland at mellanox.com; drc at linux.vnet.ibm.com;
> bruce.richardson at intel.com; konstantin.ananyev at intel.com;
> matan at mellanox.com; shahafs at mellanox.com; viacheslavo at mellanox.com;
> jerinj at marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>; Phil Yang <Phil.Yang at arm.com>; Joyce Kong
> <Joyce.Kong at arm.com>; Steve Capper <Steve.Capper at arm.com>
> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
> use it for mlx5 PMD
> 
> On 4/10/20 7:41 PM, Gavin Hu wrote:
> > To order writes to various memory types, 'sfence' is required for x86,
> > and 'dmb oshst' is required for aarch64.
> >
> > But within DPDK, there is no abstracted barriers covers this
> > combination: sfence(x86)/dmb(aarch64).
> >
> > So introduce a new barrier class - rte_dma_*mb for this combination,
> >
> > Doorbell rings are typical use cases of this new barrier class, which
> > requires something ready in the memory before letting HW aware.
> >
> > As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86,
> while
> > rte_wmb is 'dsb' for aarch64.
> 
> As far as I can see rte_cio_wmb() is exactly definition of the barrier
> to be used for doorbells. Am I missing something?

I understand rte_cio_wmb is for DMA buffers, for examples, descriptors, work queues, located in the host memory, but shared between CPU and IO device.
rte_io_wmb is for MMIO regions. 
We are missing the barriers for various memory types, eg. Doorbell cases.

There is an implication in the definition of rte_cio_wmb, it can not be used for non-coherent MMIO region(WC?)
http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124
> May be it is just a bug in rte_cio_wmb() on x86?
rte_cio_wmb is ok for doorbells on aarch64, but looking through the kernel code, 'sfence' is required for various/mixed memory types.
DPDK mlx5 PMD uses rte_cio_wmb widely and wisely, it orders sequences of writes to host memory that shared by IO device.
Strengthening rte_cio_wmb may hurt performance, so a new barrier class is introduced to optimize for aarch64, in the fast path only, while not impacting x86.
http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087
/Gavin
> 
> > In the joint preliminary testing between Arm and Ampere, 8%~13%
> > performance boost was measured.
> >
> > As there is no functionality changes, it will not impact x86.
> >
> > Gavin Hu (6):
> >    eal: introduce new class of barriers for DMA use cases
> >    net/mlx5: dmb for immediate doorbell ring on aarch64
> >    net/mlx5: relax barrier to order UAR writes on aarch64
> >    net/mlx5: relax barrier for aarch64
> >    net/mlx5: add descriptive comment for a barrier
> >    doc: clarify one configuration in mlx5 guide
> >
> > Phil Yang (1):
> >    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
> >
> >   doc/guides/nics/mlx5.rst                    |  6 ++--
> >   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
> >   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
> >   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
> >   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
> >   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
> >   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
> >   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
> >   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
> >   9 files changed, 78 insertions(+), 15 deletions(-)
> >



More information about the dev mailing list