[dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems

Mordechay Haimovsky motih at mellanox.com
Mon Jul 2 12:39:31 CEST 2018


Inline


> -----Original Message-----
> From: Shahaf Shuler
> Sent: Monday, July 2, 2018 10:05 AM
> To: Mordechay Haimovsky <motih at mellanox.com>; Yongseok Koh
> <yskoh at mellanox.com>; Adrien Mazarguil <adrien.mazarguil at 6wind.com>
> Cc: dev at dpdk.org; Mordechay Haimovsky <motih at mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
> 
> Hi Moty,
> 
> Few nits,
> 
> Also please fix the check patch warning :
> ### net/mlx5: add support for 32bit systems
> 
> CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
> #235: FILE: drivers/net/mlx5/mlx5_rxtx.c:1591:
> +                       addr_64 = rte_cpu_to_be_64(
> 
> total: 0 errors, 0 warnings, 1 checks, 311 lines checked
> 
> 
Will Do.
Is there a way to know against which kernel DPDK tools are testing ?
Checkpatch does not show this error when testing with  4.14.0-0.rc4.git4.1.el7.x86_64 kernel for example.
> 
> Thursday, June 28, 2018 10:13 AM, Moti Haimovsky:
> > Subject: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
> >
> > This patch adds support for building and running mlx5 PMD on 32bit
> > systems such as i686.
> >
> > The main issue to tackle was handling the 32bit access to the UAR as
> > quoted from the mlx5 PRM:
> > QP and CQ DoorBells require 64-bit writes. For best performance, it is
> > recommended to execute the QP/CQ DoorBell as a single 64-bit write
> > operation. For platforms that do not support 64 bit writes, it is
> > possible to issue the 64 bits DoorBells through two consecutive
> > writes, each write 32 bits, as described below:
> > * The order of writing each of the Dwords is from lower to upper
> >   addresses.
> > * No other DoorBell can be rung (or even start ringing) in the midst of
> >   an on-going write of a DoorBell over a given UAR page.
> > The last rule implies that in a multi-threaded environment, the access
> > to a UAR page (which can be accessible by all threads in the process)
> > must be synchronized (for example, using a semaphore) unless an atomic
> > write of 64 bits in a single bus operation is guaranteed. Such a
> > synchronization is not required for when ringing DoorBells on different UAR
> pages.
> >
> > Signed-off-by: Moti Haimovsky <motih at mellanox.com>
> > ---
> >  doc/guides/nics/features/mlx5.ini |  1 +
> >  doc/guides/nics/mlx5.rst          | 11 +++++++
> >  drivers/net/mlx5/mlx5.c           |  8 ++++-
> >  drivers/net/mlx5/mlx5.h           |  5 +++
> >  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> >  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> >  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> >  drivers/net/mlx5/mlx5_rxtx.h      | 69
> > ++++++++++++++++++++++++++++++++++++++-
> >  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> >  9 files changed, 137 insertions(+), 16 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/mlx5.ini
> > b/doc/guides/nics/features/mlx5.ini
> > index e75b14b..b28b43e 100644
> > --- a/doc/guides/nics/features/mlx5.ini
> > +++ b/doc/guides/nics/features/mlx5.ini
> > @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> >  Other kdrv           = Y
> >  ARMv8                = Y
> >  Power8               = Y
> > +x86-32               = Y
> >  x86-64               = Y
> >  Usage doc            = Y
> > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> > 7dd9c1c..cb9d5d8 100644
> > --- a/doc/guides/nics/mlx5.rst
> > +++ b/doc/guides/nics/mlx5.rst
> > @@ -50,6 +50,8 @@ Features
> >  --------
> >
> >  - Multi arch support: x86_64, POWER8, ARMv8.
> > +- Support for i686 is available only when working with
> > +  rdma-core version 18.0 or above, built with 32bit support.
> 
> I think we can just add i686 to the supported arch. The limitation on the
> rdma-core version is well documented below.
> 
Will change this

> >  - Multiple TX and RX queues.
> >  - Support for scattered TX and RX frames.
> >  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> queues.
> > @@ -136,6 +138,11 @@ Limitations
> >    enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is
> > not fully
> >    supported. Some Rx packets may not have PKT_RX_RSS_HASH.
> >
> > +- Building for i686 is only supported with:
> > +
> > +  - rdma-core version 18.0 or above built with 32bit support.
> > +  - Kernel version 4.14.41 or above.
> 
> Why the kernel is related? The rdma-core I understand.
> 
There was a patch  added to the kernel that fixed broken 32bit support
f2e9bfac13c9 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
(SHA may have changed)
This patch was added to kernel 4.17 and backported to 4.14.41

> > +
> >  Statistics
> >  ----------
> >
> > @@ -477,6 +484,10 @@ RMDA Core with Linux Kernel
> >  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> > `Linux installation documentation`_)
> >  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> > request #227 from yishaih/tm")
> >    (see `RDMA Core installation documentation`_)
> > +- When building for i686 use:
> > +
> > +  - rdma-core version 18.0 or above built with 32bit support.
> > +  - Kernel version 4.14.41 or above.
> >
> >  .. _`Linux installation documentation`:
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.
> > kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fstable%2Flinux-
> > stable.git%2Fplain%2FDocumentation%2Fadmin-
> >
> guide%2FREADME.rst&data=02%7C01%7Cshahafs%40mellanox.com%7C3793
> >
> 359a175d46b47c2508d5dcc69ff1%7Ca652971c7d2e4d9ba6a4d149256f461b%7
> >
> C0%7C0%7C636657668016130861&sdata=yFHd7tQET5SqIcPgj66BSuwJp3sydo
> > ujC0ldCMkChVE%3D&reserved=0
> >  .. _`RDMA Core installation documentation`:
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fra
> > w.githubusercontent.com%2Flinux-rdma%2Frdma-
> >
> core%2Fmaster%2FREADME.md&data=02%7C01%7Cshahafs%40mellanox.co
> >
> m%7C3793359a175d46b47c2508d5dcc69ff1%7Ca652971c7d2e4d9ba6a4d1492
> >
> 56f461b%7C0%7C0%7C636657668016130861&sdata=4LNh%2Fr5vM4BJeizvEIxi
> > ShMrfcx0NrlBFWz4V2wA%2FkY%3D&reserved=0
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > f0e6ed7..5d0f706 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -567,7 +567,7 @@
> >  	rte_memseg_walk(find_lower_va_bound, &addr);
> >
> >  	/* keep distance to hugepages to minimize potential conflicts. */
> > -	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
> > +	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET +
> > +MLX5_UAR_SIZE));
> >  	/* anonymous mmap, no real memory consumption. */
> >  	addr = mmap(addr, MLX5_UAR_SIZE,
> >  		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> @@ -953,6
> > +953,12 @@
> >  		priv->port = port;
> >  		priv->pd = pd;
> >  		priv->mtu = ETHER_MTU;
> > +#ifndef RTE_ARCH_64
> > +		/* Initialize UAR access locks for 32bit implementations. */
> > +		rte_spinlock_init(&priv->uar_lock_cq);
> > +		for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
> > +			rte_spinlock_init(&priv->uar_lock[i]);
> > +#endif
> >  		err = mlx5_args(&config, pci_dev->device.devargs);
> >  		if (err) {
> >  			err = rte_errno;
> > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > 997b04a..2da32cd 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -198,6 +198,11 @@ struct priv {
> >  	/* Context for Verbs allocator. */
> >  	int nl_socket; /* Netlink socket. */
> >  	uint32_t nl_sn; /* Netlink message sequence number. */
> > +#ifndef RTE_ARCH_64
> > +	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
> > +	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
> > +	/* UAR same-page access control required in 32bit implementations.
> > */
> > +#endif
> >  };
> >
> >  #define PORT_ID(priv) ((priv)->dev_data->port_id) diff --git
> > a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h index
> > 5bbbec2..f6ec415 100644
> > --- a/drivers/net/mlx5/mlx5_defs.h
> > +++ b/drivers/net/mlx5/mlx5_defs.h
> > @@ -87,14 +87,28 @@
> >  #define MLX5_LINK_STATUS_TIMEOUT 10
> >
> >  /* Reserved address space for UAR mapping. */ -#define MLX5_UAR_SIZE
> > (1ULL << 32)
> > +#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
> >
> >  /* Offset of reserved UAR address space to hugepage memory. Offset is
> > used here
> >   * to minimize possibility of address next to hugepage being used by
> > other code
> >   * in either primary or secondary process, failing to map TX UAR
> > would make TX
> >   * packets invisible to HW.
> >   */
> > -#define MLX5_UAR_OFFSET (1ULL << 32)
> > +#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
> > +
> > +/* Maximum number of UAR pages used by a port,
> > + * These are the size and mask for an array of mutexes used to
> > +synchronize
> > + * the access to port's UARs on platforms that do not support 64 bit writes.
> > + * In such systems it is possible to issue the 64 bits DoorBells
> > +through two
> > + * consecutive writes, each write 32 bits. The access to a UAR page
> > +(which can
> > + * be accessible by all threads in the process) must be synchronized
> > + * (for example, using a semaphore). Such a synchronization is not
> > +required
> > + * when ringing DoorBells on different UAR pages.
> > + * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are
> > +shared
> > + * among the ports.
> > + */
> > +#define MLX5_UAR_PAGE_NUM_MAX 64
> > +#define MLX5_UAR_PAGE_NUM_MASK
> ((MLX5_UAR_PAGE_NUM_MAX)
> > - 1)
> >
> >  /* Log 2 of the default number of strides per WQE for Multi-Packet
> > RQ. */ #define MLX5_MPRQ_STRIDE_NUM_N 6U diff --git
> > a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index
> > 08dd559..820048f 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -643,7 +643,8 @@
> >  	doorbell = (uint64_t)doorbell_hi << 32;
> >  	doorbell |=  rxq->cqn;
> >  	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> > -	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
> > +	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
> > +			 cq_db_reg, rxq->uar_lock_cq);
> >  }
> >
> >  /**
> > @@ -1445,6 +1446,9 @@ struct mlx5_rxq_ctrl *
> >  	tmpl->rxq.elts_n = log2above(desc);
> >  	tmpl->rxq.elts =
> >  		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
> > +#ifndef RTE_ARCH_64
> > +	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq; #endif
> >  	tmpl->idx = idx;
> >  	rte_atomic32_inc(&tmpl->refcnt);
> >  	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next); diff --git
> > a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index
> > a7ed8d8..ec35ea0 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx.c
> > @@ -495,6 +495,7 @@
> >  	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
> >  	unsigned int segs_n = 0;
> >  	const unsigned int max_inline = txq->max_inline;
> > +	uint64_t addr_64;
> >
> >  	if (unlikely(!pkts_n))
> >  		return 0;
> > @@ -711,12 +712,12 @@
> >  			ds = 3;
> >  use_dseg:
> >  			/* Add the remaining packet as a simple ds. */
> > -			addr = rte_cpu_to_be_64(addr);
> > +			addr_64 = rte_cpu_to_be_64(addr);
> >  			*dseg = (rte_v128u32_t){
> >  				rte_cpu_to_be_32(length),
> >  				mlx5_tx_mb2mr(txq, buf),
> > -				addr,
> > -				addr >> 32,
> > +				addr_64,
> > +				addr_64 >> 32,
> >  			};
> >  			++ds;
> >  			if (!segs_n)
> > @@ -750,12 +751,12 @@
> >  		total_length += length;
> >  #endif
> >  		/* Store segment information. */
> > -		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> > uintptr_t));
> > +		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> > uintptr_t));
> >  		*dseg = (rte_v128u32_t){
> >  			rte_cpu_to_be_32(length),
> >  			mlx5_tx_mb2mr(txq, buf),
> > -			addr,
> > -			addr >> 32,
> > +			addr_64,
> > +			addr_64 >> 32,
> >  		};
> >  		(*txq->elts)[++elts_head & elts_m] = buf;
> >  		if (--segs_n)
> > @@ -1450,6 +1451,7 @@
> >  	unsigned int mpw_room = 0;
> >  	unsigned int inl_pad = 0;
> >  	uint32_t inl_hdr;
> > +	uint64_t addr_64;
> >  	struct mlx5_mpw mpw = {
> >  		.state = MLX5_MPW_STATE_CLOSED,
> >  	};
> > @@ -1586,13 +1588,13 @@
> >  					((uintptr_t)mpw.data.raw +
> >  					 inl_pad);
> >  			(*txq->elts)[elts_head++ & elts_m] = buf;
> > -			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> > -								 uintptr_t));
> > +			addr_64 = rte_cpu_to_be_64(
> > +					rte_pktmbuf_mtod(buf, uintptr_t));
> >  			*dseg = (rte_v128u32_t) {
> >  				rte_cpu_to_be_32(length),
> >  				mlx5_tx_mb2mr(txq, buf),
> > -				addr,
> > -				addr >> 32,
> > +				addr_64,
> > +				addr_64 >> 32,
> >  			};
> >  			mpw.data.raw = (volatile void *)(dseg + 1);
> >  			mpw.total_len += (inl_pad + sizeof(*dseg)); diff --git
> > a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index
> > 0007be0..2448d73 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -26,6 +26,8 @@
> >  #include <rte_common.h>
> >  #include <rte_hexdump.h>
> >  #include <rte_atomic.h>
> > +#include <rte_spinlock.h>
> > +#include <rte_io.h>
> >
> >  #include "mlx5_utils.h"
> >  #include "mlx5.h"
> > @@ -115,6 +117,10 @@ struct mlx5_rxq_data {
> >  	void *cq_uar; /* CQ user access region. */
> >  	uint32_t cqn; /* CQ number. */
> >  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> > +#ifndef RTE_ARCH_64
> > +	rte_spinlock_t *uar_lock_cq;
> > +	/* CQ (UAR) access lock required for 32bit implementations */ #endif
> >  	uint32_t tunnel; /* Tunnel information. */  } __rte_cache_aligned;
> >
> > @@ -196,6 +202,10 @@ struct mlx5_txq_data {
> >  	volatile void *bf_reg; /* Blueflame register remapped. */
> >  	struct rte_mbuf *(*elts)[]; /* TX elements. */
> >  	struct mlx5_txq_stats stats; /* TX queue counters. */
> > +#ifndef RTE_ARCH_64
> > +	rte_spinlock_t *uar_lock;
> > +	/* UAR access lock required for 32bit implementations */ #endif
> >  } __rte_cache_aligned;
> >
> >  /* Verbs Rx queue elements. */
> > @@ -348,6 +358,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq,
> struct
> > rte_mbuf **pkts,  uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data
> > *rxq, uintptr_t addr);  uint32_t mlx5_tx_addr2mr_bh(struct
> > mlx5_txq_data *txq, uintptr_t addr);
> >
> > +/**
> > + * Provide safe 64bit store operation to mlx5 UAR region for both
> > +32bit and
> > + * 64bit architectures.
> > + *
> > + * @param val
> > + *   value to write in CPU endian format.
> > + * @param addr
> > + *   Address to write to.
> > + * @param lock
> > + *   Address of the lock to use for that UAR access.
> > + */
> > +static __rte_always_inline void
> > +__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
> > +			   rte_spinlock_t *lock __rte_unused) { #ifdef
> > RTE_ARCH_64
> > +	rte_write64_relaxed(val, addr);
> > +#else /* !RTE_ARCH_64 */
> > +	rte_spinlock_lock(lock);
> > +	rte_write32_relaxed(val, addr);
> > +	rte_io_wmb();
> > +	rte_write32_relaxed(val >> 32,
> > +			    (volatile void *)((volatile char *)addr + 4));
> > +	rte_spinlock_unlock(lock);
> > +#endif
> > +}
> > +
> > +/**
> > + * Provide safe 64bit store operation to mlx5 UAR region for both
> > +32bit and
> > + * 64bit architectures while guaranteeing the order of execution with
> > +the
> > + * code being executed.
> > + *
> > + * @param val
> > + *   value to write in CPU endian format.
> > + * @param addr
> > + *   Address to write to.
> > + * @param lock
> > + *   Address of the lock to use for that UAR access.
> > + */
> > +static __rte_always_inline void
> > +__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t
> > +*lock) {
> > +	rte_io_wmb();
> > +	__mlx5_uar_write64_relaxed(val, addr, lock); }
> > +
> > +/* Assist macros, used instead of directly calling the functions the
> > +wrap. */ #ifdef RTE_ARCH_64 #define mlx5_uar_write64_relaxed(val,
> > +dst,
> > +lock) \
> > +		__mlx5_uar_write64_relaxed(val, dst, NULL) #define
> > +mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
> > +#else #define mlx5_uar_write64_relaxed(val, dst, lock) \
> > +		__mlx5_uar_write64_relaxed(val, dst, lock) #define
> > +mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
> > +#endif
> > +
> >  #ifndef NDEBUG
> >  /**
> >   * Verify or set magic value in CQE.
> > @@ -614,7 +681,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct
> > rte_mbuf **pkts,
> >  	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
> >  	/* Ensure ordering between DB record and BF copy. */
> >  	rte_wmb();
> > -	*dst = *src;
> > +	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
> >  	if (cond)
> >  		rte_wmb();
> >  }
> > diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> > index 669b913..dc786d4 100644
> > --- a/drivers/net/mlx5/mlx5_txq.c
> > +++ b/drivers/net/mlx5/mlx5_txq.c
> > @@ -255,6 +255,9 @@
> >  	struct mlx5_txq_ctrl *txq_ctrl;
> >  	int already_mapped;
> >  	size_t page_size = sysconf(_SC_PAGESIZE);
> > +#ifndef RTE_ARCH_64
> > +	unsigned int lock_idx;
> > +#endif
> >
> >  	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
> >  	/*
> > @@ -281,7 +284,7 @@
> >  		}
> >  		/* new address in reserved UAR address space. */
> >  		addr = RTE_PTR_ADD(priv->uar_base,
> > -				   uar_va & (MLX5_UAR_SIZE - 1));
> > +				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
> >  		if (!already_mapped) {
> >  			pages[pages_n++] = uar_va;
> >  			/* fixed mmap to specified address in reserved @@ -
> > 305,6 +308,12 @@
> >  		else
> >  			assert(txq_ctrl->txq.bf_reg ==
> >  			       RTE_PTR_ADD((void *)addr, off));
> > +#ifndef RTE_ARCH_64
> > +		/* Assign a UAR lock according to UAR page number */
> > +		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
> > +			   MLX5_UAR_PAGE_NUM_MASK;
> > +		txq->uar_lock = &priv->uar_lock[lock_idx]; #endif
> >  	}
> >  	return 0;
> >  }
> > @@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
> >  	rte_atomic32_inc(&txq_ibv->refcnt);
> >  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
> >  		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
> > +		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
> > +			dev->data->port_id, txq_ctrl->uar_mmap_offset);
> >  	} else {
> >  		DRV_LOG(ERR,
> >  			"port %u failed to retrieve UAR info, invalid"
> > --
> > 1.8.3.1



More information about the dev mailing list