[dpdk-dev] [PATCH v1 00/12] Support for ARM(v7)

Jan Viktorin viktorin at rehivetech.com
Sat Oct 3 10:58:06 CEST 2015


Dear DPDK community,

I am proposing a patch series that adds support for the ARMv7 architecture
to DPDK. The patch series does not introduce any PMD (poll mode driver). It is
possible to compile it, boot it and test it with a virtual PMD (e.g.
pcap). It is rebased on top of v2.1.0.

All but the last two patches (11, 12) are quite straightforward
and mostly based on the ppc_64 architecture. Notes:

* we test on Cortex-A9 (mostly Xilinx Zynq at the moment)
* atomic operations and spinlocks are implemented using (GCC) intrinsics
* CPU cycle counting is implemented using clock_gettime because there is
  no standard 64-bit cycle counter available
* we have to set -Wno-error for the build to pass because quite a lot
  of alignment warnings are reported (we have not found any real issues
  so far)
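The first two notes can be illustrated with a minimal sketch, assuming GCC's legacy `__sync_*` builtins and POSIX `clock_gettime()`. All `sketch_*` names are hypothetical and this is not the code from the patches; it only shows the general shape of a spinlock and compare-and-set built on intrinsics, plus a "cycle" counter that actually counts nanoseconds of a monotonic clock (so its frequency is fixed at 1 GHz).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* declare clock_gettime() under strict -std=c99/c11 */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical spinlock type: 0 = unlocked, 1 = locked. */
typedef struct {
	volatile int locked;
} sketch_spinlock_t;

static inline void sketch_spinlock_lock(sketch_spinlock_t *sl)
{
	/* __sync_lock_test_and_set stores 1, returns the old value and acts
	 * as an acquire barrier; retry while the old value was 1. */
	while (__sync_lock_test_and_set(&sl->locked, 1))
		while (sl->locked)
			; /* read-only spin until the holder releases the lock */
}

static inline void sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
	/* Stores 0 with release semantics. */
	__sync_lock_release(&sl->locked);
}

/* Atomically set *dst to src if it equals exp; returns 1 on success. */
static inline int sketch_atomic32_cmpset(volatile uint32_t *dst,
					 uint32_t exp, uint32_t src)
{
	return __sync_bool_compare_and_swap(dst, exp, src) ? 1 : 0;
}

/* "Cycle" counter: nanoseconds from a monotonic clock, used because
 * (per the note above) no standard 64-bit counter is available. */
static inline uint64_t sketch_rdtsc(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* With 1 tick = 1 ns, the reported counter frequency is 1 GHz. */
static inline uint64_t sketch_get_tsc_hz(void)
{
	return 1000000000ULL;
}
```

Usage: bracket a critical section with `sketch_spinlock_lock()`/`sketch_spinlock_unlock()` and take the difference of two `sketch_rdtsc()` reads to get elapsed nanoseconds. The trade-off of this design is that each counter read is a library call (possibly a system call) rather than a single register read.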

The last two patches (11, 12) are not meant to be merged into mainline. They
are just a temporary workaround for the two libraries (ACL, LPM) which
heavily utilize SSE; it is not possible to easily convert the
SSE calls to NEON SIMD operations.
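The LPM workaround (implementing the four-wide lookup on top of the bulk lookup) can be sketched as follows. This is a toy illustration, assuming a hypothetical one-level table indexed by the top 8 address bits; `toy_*` names are made up and are not the librte_lpm API. On x86 the real four-wide lookup takes the addresses in an SSE register, whereas here four plain `uint32_t` values are simply handed to a scalar loop.

```c
#include <stdint.h>

/* Hypothetical toy table: one entry per /8 prefix; the low 8 bits hold
 * the next hop and TOY_VALID marks a populated entry. */
#define TOY_VALID 0x100u
static uint16_t toy_tbl8[256];

/* Scalar lookup of a single address: 0 on hit, -1 on miss. */
static int toy_lookup(uint32_t ip, uint8_t *next_hop)
{
	uint16_t e = toy_tbl8[ip >> 24];

	if (!(e & TOY_VALID))
		return -1;
	*next_hop = (uint8_t)e;
	return 0;
}

/* Bulk lookup: a plain loop over the scalar path, no SIMD involved.
 * Misses are reported as defv. */
static void toy_lookup_bulk(const uint32_t *ips, uint16_t *hops,
			    unsigned n, uint16_t defv)
{
	for (unsigned i = 0; i < n; i++) {
		uint8_t nh;

		hops[i] = (toy_lookup(ips[i], &nh) == 0) ? nh : defv;
	}
}

/* Four-wide lookup expressed through the bulk lookup, i.e. the
 * architecture-neutral fallback for the SSE version. */
static inline void toy_lookupx4(const uint32_t ip[4], uint16_t hop[4],
				uint16_t defv)
{
	toy_lookup_bulk(ip, hop, 4, defv);
}
```

The fallback gives up the parallelism of the SIMD version but keeps the same calling pattern (four addresses in, four next hops out), which is why it can serve as a stopgap until a NEON implementation exists.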

============

It is important to note that the current Linux kernel does not contain
support for huge pages on non-LPAE ARM architectures (Cortex-A9).
There is a patch available on the Internet but it is not going to be
merged for now (4/2014):

 http://thread.gmane.org/gmane.linux.kernel.mm/115788
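Since huge page support depends on the kernel build, a quick user-space check can be sketched by looking for the `Hugepagesize:` line in `/proc/meminfo`. This assumes, as far as we know, that the line is only reported by kernels built with hugetlb support; the function names are hypothetical and the parser is kept separate from the file read so it can be tested on a fixed string.

```c
#include <stdio.h>
#include <string.h>

/* Returns the huge page size in kB from meminfo-formatted text,
 * or 0 if no "Hugepagesize:" line is present. */
static unsigned long hugepagesize_kb_from_text(const char *meminfo)
{
	const char *p = strstr(meminfo, "Hugepagesize:");
	unsigned long kb = 0;

	if (p == NULL || sscanf(p, "Hugepagesize: %lu kB", &kb) != 1)
		return 0;
	return kb;
}

/* Reads /proc/meminfo of the running kernel (first 8 KiB only, which
 * is normally the whole file); returns 0 on error or when the kernel
 * reports no huge page size. */
static unsigned long hugepagesize_kb(void)
{
	char buf[8192];
	FILE *f = fopen("/proc/meminfo", "r");
	size_t n;

	if (f == NULL)
		return 0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return hugepagesize_kb_from_text(buf);
}
```

A result of 0 on a Cortex-A9 board would be consistent with a kernel lacking the non-LPAE huge page patch mentioned above.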

We ported this patch to 3.18 and it can improve performance. The
following results of our tests of several algorithms show the reduction
in execution time:

CPU median 3x3        -  0.2 %
NEON median 3x3       - 19.5 %
Random read           -  0.0 %
Random write          -  6.2 %
Matrix multiplication - 31.0 %
NEON copy             -  4.2 %

============

We are working on the PMD + kernel-support part. At the moment, we have
a working PMD for Xilinx Zynq's EMAC. However, it uses some dirty features,
so we have to rethink it a bit before going to the mainline. We are facing
some problems during the implementation (some are already being solved on
the mailing list):

* rte_eth_dev is defined as a PCI device. As ARMs are SoCs with an EMAC
  integrated on the chip and an external PHY, we need a different approach.
  There can be an ARM computer with PCI-E, but then you plug in a network
  card and use a different kind of driver (this is not very common
  at the moment).
* ARM does not have coherent memory for DMA transfers. It is possible to
  allocate non-cacheable memory (DMA transfers can then be as fast as
  possible) but it slows down the payload processing on the CPU. To avoid
  this, we have to call dma_map/unmap_* in the kernel. A custom kernel
  driver is needed, and it should not be UIO because UIO is quite limited
  (almost non-extendable mmap, no support for custom ioctl and write).
* We are not going to put the PHY layer into userspace, so it will stay
  in the kernel. There is also a need for the CLK control (clock gating)
  in the PMD.
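The custom-driver idea from the DMA item above can be sketched as a small ioctl interface that the PMD would call around each DMA transfer. Everything here is a hypothetical ABI (the `sketch_*` names, the `'z'` magic number and the request layout are assumptions); on the kernel side each request would be expected to map to the kernel DMA API, e.g. `dma_sync_single_for_device()` / `dma_sync_single_for_cpu()`.

```c
#include <stdint.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>

/* Hypothetical request: which part of the driver's mmap'ed buffer
 * area to synchronize and in which direction. */
struct sketch_dma_sync {
	uint64_t offset; /* offset of the buffer in the mmap'ed region */
	uint32_t length; /* number of bytes to synchronize */
	uint32_t dir;    /* SKETCH_DMA_TO_DEVICE or SKETCH_DMA_FROM_DEVICE */
};

#define SKETCH_DMA_TO_DEVICE   1u
#define SKETCH_DMA_FROM_DEVICE 2u

#define SKETCH_IOC_MAGIC 'z'
/* Issued after the CPU fills a TX buffer, before the device reads it. */
#define SKETCH_IOC_SYNC_FOR_DEVICE \
	_IOW(SKETCH_IOC_MAGIC, 1, struct sketch_dma_sync)
/* Issued after the device fills an RX buffer, before the CPU reads it. */
#define SKETCH_IOC_SYNC_FOR_CPU \
	_IOW(SKETCH_IOC_MAGIC, 2, struct sketch_dma_sync)

/* User-space helper the PMD could call on a received buffer;
 * returns the ioctl() result (-1 with errno set on failure). */
static int sketch_sync_for_cpu(int fd, uint64_t off, uint32_t len)
{
	struct sketch_dma_sync req = { off, len, SKETCH_DMA_FROM_DEVICE };

	return ioctl(fd, SKETCH_IOC_SYNC_FOR_CPU, &req);
}
```

This keeps payload buffers cacheable for fast CPU processing and pays for coherency only with explicit sync calls, which is exactly the kind of custom ioctl UIO does not offer.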

Regards
Jan Viktorin


Jan Viktorin (2):
  eal/arm: rwlock support for ARM
  gcc/arm: avoid alignment errors to break build

Vlastimil Kosar (10):
  mk: Introduce ARMv7 architecture
  eal/arm: atomic operations for ARM
  eal/arm: byte order operations for ARM
  eal/arm: cpu cycle operations for ARM
  eal/arm: prefetch operations for ARM
  eal/arm: spinlock operations for ARM (without HTM)
  eal/arm: vector memcpy for ARM
  eal/arm: cpu flag checks for ARM
  lpm/arm: implement rte_lpm_lookupx4 using rte_lpm_lookup_bulk on
    for-x86
  arm: Disable usage of SSE optimized code in librte_acl

 app/test/test_cpuflags.c                           |   5 +
 config/defconfig_arm-armv7-a-linuxapp-gcc          |  72 ++++++
 lib/librte_acl/acl.h                               |   2 +
 lib/librte_acl/rte_acl.c                           |   8 +-
 lib/librte_acl/rte_acl_osdep.h                     |   2 +
 .../common/include/arch/arm/rte_atomic.h           | 257 ++++++++++++++++++++
 .../common/include/arch/arm/rte_byteorder.h        | 148 +++++++++++
 .../common/include/arch/arm/rte_cpuflags.h         | 169 +++++++++++++
 .../common/include/arch/arm/rte_cycles.h           |  85 +++++++
 .../common/include/arch/arm/rte_memcpy.h           | 270 +++++++++++++++++++++
 .../common/include/arch/arm/rte_prefetch.h         |  61 +++++
 .../common/include/arch/arm/rte_rwlock.h           |  40 +++
 .../common/include/arch/arm/rte_spinlock.h         | 114 +++++++++
 lib/librte_lpm/rte_lpm.h                           |  71 ++++++
 mk/arch/arm/rte.vars.mk                            |  39 +++
 mk/machine/armv7-a/rte.vars.mk                     |  60 +++++
 mk/rte.cpuflags.mk                                 |   6 +
 mk/toolchain/gcc/rte.vars.mk                       |   6 +
 18 files changed, 1414 insertions(+), 1 deletion(-)
 create mode 100644 config/defconfig_arm-armv7-a-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

-- 
2.5.2

