[dpdk-dev] [PATCH v1 00/12] Support for ARM(v7)
Jan Viktorin
viktorin at rehivetech.com
Sat Oct 3 10:58:06 CEST 2015
Dear DPDK community,
I am proposing a patch series that adds support for the ARMv7 architecture
to DPDK. The patch series does not introduce any PMD. It is
possible to compile it, boot it and test it with a virtual PMD (e.g.
pcap). It is rebased on top of v2.1.0.
All but the last two patches (11, 12) are quite straightforward
and mostly based on the ppc_64 architecture. Notes:
* we test on Cortex-A9 (mostly Xilinx Zynq at the moment)
* atomic operations and spinlocks are implemented by (GCC) intrinsics
* the CPU cycle counter is implemented by clock_gettime because there is no
standard 64-bit counter available
* we have to set -Wno-error to pass the build because quite a lot of
alignment problems are reported (we haven't found any real issues
so far)
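To illustrate the two notes on intrinsics and cycle counting, here is a minimal sketch, not the code from the patches. The names sketch_rdtsc and sketch_spinlock_* are hypothetical, and the use of CLOCK_MONOTONIC with nanosecond resolution and of the legacy GCC __sync builtins is an assumption about one reasonable way to do it:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict ISO C */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a cycle counter: a 64-bit nanosecond
 * timestamp taken from the monotonic clock (assumed clock choice). */
static inline uint64_t sketch_rdtsc(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical spinlock built only on GCC __sync builtins. */
typedef struct {
    volatile int locked;   /* 0 = free, 1 = taken */
} sketch_spinlock_t;

static inline void sketch_spinlock_lock(sketch_spinlock_t *sl)
{
    /* Atomically write 1; the old value tells us whether it was free. */
    while (__sync_lock_test_and_set(&sl->locked, 1))
        while (sl->locked)
            ;   /* spin on a plain read until the lock looks free */
}

static inline int sketch_spinlock_trylock(sketch_spinlock_t *sl)
{
    /* Returns 1 if the lock was acquired, 0 if it was already taken. */
    return __sync_lock_test_and_set(&sl->locked, 1) == 0;
}

static inline void sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
    /* Atomically write 0 with release semantics. */
    __sync_lock_release(&sl->locked);
}
```

A timestamp obtained this way counts nanoseconds rather than CPU cycles, so any code that converts "cycles" to time has to treat the counter frequency as 1 GHz under this assumption.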
The last two patches (11, 12) are not to be merged into mainline. They
are just a temporary workaround for the two libraries (ACL, LPM) which
heavily utilize SSE. It is not possible to easily convert the
SSE calls to NEON SIMD operations.
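The LPM workaround (a 4-wide lookup expressed through the scalar bulk lookup, as the shortlog entry for rte_lpm_lookupx4 describes) can be sketched roughly as follows. This is a self-contained toy and not the DPDK API: struct toy_lpm, toy_lookup_bulk, toy_lookupx4 and TOY_LOOKUP_SUCCESS are hypothetical names, and the one-level table indexed by the top 8 address bits is an assumption made only to keep the example short:

```c
#include <stdint.h>

/* Hypothetical "entry is valid" flag stored next to the next-hop value. */
#define TOY_LOOKUP_SUCCESS 0x0100

/* Toy table: one entry per value of the top 8 bits of an IPv4 address. */
struct toy_lpm {
    uint16_t tbl[256];
};

/* Scalar bulk lookup: one table read per address, no SIMD involved. */
static void toy_lookup_bulk(const struct toy_lpm *lpm, const uint32_t *ips,
                            uint16_t *next_hops, unsigned int n)
{
    unsigned int i;

    for (i = 0; i < n; i++)
        next_hops[i] = lpm->tbl[ips[i] >> 24];
}

/* 4-wide lookup without SSE: take the four addresses as a plain array,
 * reuse the bulk lookup, then replace misses with the default value. */
static void toy_lookupx4(const struct toy_lpm *lpm, const uint32_t ip[4],
                         uint16_t hop[4], uint16_t defv)
{
    uint16_t raw[4];
    unsigned int i;

    toy_lookup_bulk(lpm, ip, raw, 4);
    for (i = 0; i < 4; i++)
        hop[i] = (raw[i] & TOY_LOOKUP_SUCCESS) ? (uint8_t)raw[i] : defv;
}
```

The point of the sketch is only the shape of the fallback: the vector argument becomes an ordinary array and the per-lane work is done by the existing scalar path, at the cost of losing the SIMD speed-up.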
============
It is important to note that the current Linux kernel does not support
huge pages (hugetlb) on non-LPAE ARM architectures (Cortex-A9).
There is a patch available on the Internet but it is not going to be
merged for now (4/2014):
http://thread.gmane.org/gmane.linux.kernel.mm/115788
We ported this patch to Linux 3.18 and it can improve performance. The
results of our tests of several algorithms follow, showing the execution
time reduction:
CPU median 3x3:         0.2 %
NEON median 3x3:       19.5 %
Random read:            0.0 %
Random write:           6.2 %
Matrix multiplication: 31.0 %
NEON copy:              4.2 %
============
We are working on the PMD + kernel-support part. At the moment, we have
a working PMD for Xilinx Zynq's EMAC. However, it relies on some hacks, so
we have to rethink it a bit before proposing it for mainline. We are facing
some problems during the implementation (some are already being discussed
on the mailing list):
* rte_eth_dev is defined as a PCI device. As ARMs are SoCs with an
integrated EMAC on the chip and an external PHY, we need a different
approach. An ARM computer can have PCI-E, but then you plug in a network
card and use a different kind of driver (this is not very common
at the moment).
* ARM does not have coherent memory for DMA transfers. It is possible to
allocate non-cacheable memory (DMA transfers can be as fast as possible)
but it slows down the payload processing on the CPU. To use cacheable
memory instead, we have to call dma_map/unmap_* in the kernel. A custom
kernel driver is needed and it should not be UIO because UIO is quite
limited (almost non-extendable mmap, no support for custom ioctl and write).
* We are not going to put the PHY layer into userspace, so it will stay
in the kernel. There is also a need for the CLK control (clock gating)
in the PMD.
Regards
Jan Viktorin
Jan Viktorin (2):
eal/arm: rwlock support for ARM
gcc/arm: avoid alignment errors to break build
Vlastimil Kosar (10):
mk: Introduce ARMv7 architecture
eal/arm: atomic operations for ARM
eal/arm: byte order operations for ARM
eal/arm: cpu cycle operations for ARM
eal/arm: prefetch operations for ARM
eal/arm: spinlock operations for ARM (without HTM)
eal/arm: vector memcpy for ARM
eal/arm: cpu flag checks for ARM
lpm/arm: implement rte_lpm_lookupx4 using rte_lpm_lookup_bulk on
for-x86
arm: Disable usage of SSE optimized code in librte_acl
app/test/test_cpuflags.c | 5 +
config/defconfig_arm-armv7-a-linuxapp-gcc | 72 ++++++
lib/librte_acl/acl.h | 2 +
lib/librte_acl/rte_acl.c | 8 +-
lib/librte_acl/rte_acl_osdep.h | 2 +
.../common/include/arch/arm/rte_atomic.h | 257 ++++++++++++++++++++
.../common/include/arch/arm/rte_byteorder.h | 148 +++++++++++
.../common/include/arch/arm/rte_cpuflags.h | 169 +++++++++++++
.../common/include/arch/arm/rte_cycles.h | 85 +++++++
.../common/include/arch/arm/rte_memcpy.h | 270 +++++++++++++++++++++
.../common/include/arch/arm/rte_prefetch.h | 61 +++++
.../common/include/arch/arm/rte_rwlock.h | 40 +++
.../common/include/arch/arm/rte_spinlock.h | 114 +++++++++
lib/librte_lpm/rte_lpm.h | 71 ++++++
mk/arch/arm/rte.vars.mk | 39 +++
mk/machine/armv7-a/rte.vars.mk | 60 +++++
mk/rte.cpuflags.mk | 6 +
mk/toolchain/gcc/rte.vars.mk | 6 +
18 files changed, 1414 insertions(+), 1 deletion(-)
create mode 100644 config/defconfig_arm-armv7-a-linuxapp-gcc
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
create mode 100644 mk/arch/arm/rte.vars.mk
create mode 100644 mk/machine/armv7-a/rte.vars.mk
--
2.5.2