[dpdk-dev] [PATCH v3 0/2] virtio support for container

Jianfeng Tan jianfeng.tan at intel.com
Thu Apr 21 04:56:34 CEST 2016


v3:
 - Remove the --single-file option; make no changes to EAL memory.
 - Remove the added API rte_eal_get_backfile_info(); instead, check all
   opened files against HUGEFILE_FMT to find the hugepage files owned by
   DPDK.
 - Accordingly, add more restrictions in the "Known issues" section.
 - Rename the parameter from queue_num to queue_size to avoid confusion.
 - Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
 - Move code related to the newly added vdev into rte_eth_virtio_vdev.c;
   to reuse eth_virtio_dev_init(), remove its static declaration.
 - Implement dev_uninit() for rte_eth_dev_detach().
 - Change WARN to ERR in vhost_embedded.c.
 - Expand the commit messages to clarify the model.

v2:
 - Rebase on the patchset of virtio 1.0 support.
 - Fix failure to create non-hugepage memory.
 - Fix wrong size of memory region when "single-file" is used.
 - Fix setting of offset in virtqueue to use the virtual address.
 - Fix setting of TUNSETVNETHDRSZ in the vhost-user branch.
 - Add a mac option to specify the MAC address of this virtual device.
 - Update doc.

This patchset provides a high performance networking interface (virtio)
for container-based DPDK applications. Starting DPDK apps in containers
with exclusive ownership of NIC devices is beyond its scope. The basic
idea here is to present a new virtual device (named eth_cvio), which can
be discovered and initialized by container-based DPDK apps using
rte_eal_init(). To minimize the change, we reuse the already-existing
virtio frontend driver code (drivers/net/virtio/).
 
Compared to the QEMU/VM case, the virtio device framework (which
translates I/O port reads/writes into the unix socket/cuse protocol, and
which QEMU originally provides) is integrated into the virtio frontend
driver here. This converged driver thus plays both the role of the
original frontend driver and the role of QEMU's device framework, as
sketched below.
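
For illustration only, here is a minimal sketch of that embedded
emulation idea; the struct and function names are hypothetical (the real
code lives in rte_eth_virtio_vdev.c), and the register offset mirrors
the legacy virtio PCI layout:

#include <stdint.h>
#include <unistd.h>

/* Register offset from the legacy virtio PCI layout. */
#define VIRTIO_PCI_QUEUE_NOTIFY 16

/* Hypothetical per-device state; the real driver keeps this in the
 * vdev's private data. */
struct cvio_hw {
	int kickfd;	/* eventfd registered with the backend via
			 * VHOST_SET_VRING_KICK */
};

/* Instead of an outw() that traps into QEMU, the converged driver
 * intercepts the register write and performs the equivalent vhost
 * action directly. */
static void
cvio_outw(struct cvio_hw *hw, unsigned int offset, uint16_t val)
{
	if (offset == VIRTIO_PCI_QUEUE_NOTIFY) {
		uint64_t kick = 1;
		ssize_t rc = write(hw->kickfd, &kick, sizeof(kick));
		(void)rc; /* a lost kick only delays the backend's poll */
	}
	(void)val; /* val selects the queue; one queue assumed here */
	/* ... other register writes map to other vhost requests ... */
}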
 
The major difference lies in how to calculate the relative address for
vhost. The principle is: based on one or more shared memory segments,
vhost maintains a reference table with the base address and length of
each segment, so that an address coming from the VM (usually a GPA,
Guest Physical Address) can be translated into a vhost-recognizable
address (named VVA, Vhost Virtual Address); see the sketch after this
list. To decrease the overhead of address translation, we should
maintain as few segments as possible. In the VM's case, the GPA is
always locally continuous. In the container's case, the CVA (Container
Virtual Address) can be used instead. Specifically:
a. when set_base_addr is called, the CVA is used;
b. when preparing RX descriptors, the CVA is used;
c. when transmitting packets, the CVA is filled into TX descriptors;
d. in TX and CQ headers, the CVA is used.
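
A minimal sketch of such a reference table and lookup (the struct and
function names here are illustrative, not the actual vhost data
structures):

#include <stdint.h>
#include <stddef.h>

/* One entry per shared memory segment. */
struct region {
	uint64_t guest_addr;	/* GPA for a VM, CVA for a container */
	uint64_t vhost_addr;	/* VVA: where vhost mapped the segment */
	uint64_t size;
};

/* Translate a frontend address into a VVA by scanning the table;
 * returns 0 if the address lies outside every shared segment. */
static uint64_t
to_vva(const struct region *r, size_t n, uint64_t addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (addr >= r[i].guest_addr &&
		    addr < r[i].guest_addr + r[i].size)
			return r[i].vhost_addr + (addr - r[i].guest_addr);
	}
	return 0;
}

The fewer and larger the segments, the shorter this scan, which is why
we should maintain as few segments as possible.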
 
How is memory shared? In the VM's case, QEMU always shares the guest's
entire physical memory layout with the backend. But it is not feasible
for a container, as an ordinary process, to share all of its virtual
memory regions with the backend. So only specified virtual memory
regions (of shared type) are sent to the backend. This is a limitation:
only addresses in these regions can be used to transmit or receive
packets.
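
As a simplified example for the vhost-net backend, the discovered
regions could be registered with the kernel roughly as below; the helper
name and the addr[]/len[] arrays are hypothetical stand-ins for the
hugepage-backed segments found by the driver, and error handling is
trimmed:

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int
share_regions(int vhost_fd, const uint64_t *addr, const uint64_t *len,
	      uint32_t n)
{
	struct vhost_memory *mem;
	uint32_t i;
	int ret;

	/* vhost_memory ends in a flexible array of regions */
	mem = calloc(1, sizeof(*mem) + n * sizeof(struct vhost_memory_region));
	if (mem == NULL)
		return -1;
	mem->nregions = n;
	for (i = 0; i < n; i++) {
		/* in the container case, the "guest physical" address and
		 * the userspace address are both the CVA */
		mem->regions[i].guest_phys_addr = addr[i];
		mem->regions[i].userspace_addr = addr[i];
		mem->regions[i].memory_size = len[i];
	}
	ret = ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
	free(mem);
	return ret;
}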

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with --no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS (8)
   hugepages.
 - Root privilege is a must (mainly because hugepages are sorted
   according to physical address).
 - Applications should not use file names that match HUGEFILE_FMT
   ("%smap_%d").

How to use?

a. Apply this patchset.

b. To compile container apps:
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image, use the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
$: docker build -t dpdk-app-l2fwd .

d. To use with vhost-user:
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
	--socket-mem 1024,1024 -- -p 0x1 --stats 1
$: docker run -i -t -v <path_to_vhost_unix_socket>:/var/run/usvhost \
	-v /dev/hugepages:/dev/hugepages \
	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
	--vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1

e. To use with vhost-net:
$: modprobe vhost
$: modprobe vhost-net
$: docker run -i -t --privileged \
	-v /dev/vhost-net:/dev/vhost-net \
	-v /dev/net/tun:/dev/net/tun \
	-v /dev/hugepages:/dev/hugepages \
	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
	--vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1

By the way, it is not necessary to run inside a container.

Signed-off-by: Huawei Xie <huawei.xie at intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
Acked-by: Neil Horman <nhorman at tuxdriver.com>

Jianfeng Tan (2):
  virtio/vdev: add embedded device emulation
  virtio/vdev: add a new vdev named eth_cvio

 config/common_linuxapp                   |    3 +
 doc/guides/nics/overview.rst             |   58 +-
 doc/guides/rel_notes/release_16_07.rst   |    4 +
 drivers/net/virtio/Makefile              |    4 +
 drivers/net/virtio/rte_eth_virtio_vdev.c | 1079 ++++++++++++++++++++++++++++++
 drivers/net/virtio/vhost.h               |  194 ++++++
 drivers/net/virtio/virtio_ethdev.c       |  134 ++--
 drivers/net/virtio/virtio_ethdev.h       |    2 +
 drivers/net/virtio/virtio_pci.h          |   14 +
 drivers/net/virtio/virtio_rxtx.c         |    5 +-
 drivers/net/virtio/virtio_rxtx_simple.c  |   13 +-
 drivers/net/virtio/virtqueue.h           |   10 +
 12 files changed, 1433 insertions(+), 87 deletions(-)
 create mode 100644 drivers/net/virtio/rte_eth_virtio_vdev.c
 create mode 100644 drivers/net/virtio/vhost.h

-- 
2.1.4


