[dpdk-dev] [PATCH v4 0/8] virtio support for container

From: Jianfeng Tan <jianfeng.tan@intel.com>
Date: Fri Apr 29 03:18:28 CEST 2016


v4:
 - Avoid using dev_type; instead, check whether eth_dev->pci_device is NULL
   to judge whether a device is virtual or physical.
 - Change the added device name to virtio-user.
 - Split into vhost_user.c, vhost_kernel.c, vhost.c, virtio_user_pci.c,
   virtio_user_dev.c.
 - Move virtio-user specific data from struct virtio_hw into struct
   virtio_user_hw.
 - Add support to send reset_owner message.
 - Change the del_queue implementation. (This needs more review.)
 - Remove rte_panic() and replace it with log messages.
 - Add reset_owner into virtio_pci_ops.reset.
 - Merge the parameters "rx" and "tx" into "queues" to eliminate confusion.
 - Move get_features to after set_owner.
 - Redefine path in virtio_user_hw from char * to char [].

v3:
 - Remove the --single-file option; make no changes to EAL memory.
 - Remove the added API rte_eal_get_backfile_info(); instead, check all
   opened files against HUGEFILE_FMT to find hugepage files owned by DPDK.
 - Accordingly, add more restrictions in the "Known issues" section.
 - Rename the parameter queue_num to queue_size to avoid confusion.
 - Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
 - Move code related to the newly added vdev into rte_eth_virtio_vdev.c; to
   reuse eth_virtio_dev_init(), remove its static declaration.
 - Implement dev_uninit() for rte_eth_dev_detach().
 - Change WARN to ERR in vhost_embedded.c.
 - Expand commit messages to clarify the model.

v2:
 - Rebase on the patchset of virtio 1.0 support.
 - Fix failure to create non-hugepage memory.
 - Fix wrong size of memory region when "single-file" is used.
 - Fix setting of offset in virtqueue to use virtual address.
 - Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
 - Add a mac option to specify the MAC address of this virtual device.
 - Update doc.

This patchset provides a high performance networking interface (virtio) for
container-based DPDK applications. How to start DPDK apps in containers with
exclusive ownership of NIC devices is beyond its scope. The basic idea is to
present a new virtual device (named virtio-user), which can be discovered
and initialized by DPDK. To minimize the change, we reuse the already-existing
virtio PMD code (drivers/net/virtio/).

Background: previously, a virtio device was typically used in the context of
QEMU/VM, as the picture below shows. The virtio NIC is emulated in QEMU and
usually presented to the VM as a PCI device.

  ------------------
  |  virtio driver |  ----->  VM
  ------------------
        |
        | ----------> (over PCI bus or MMIO or Channel I/O)
        |
  --------------------
  | device emulation |
  |                  |  ----->  QEMU
  | vhost adapter    |
  --------------------
        |
        | ----------> (vhost-user protocol or vhost-net ioctls)
        |
  ------------------
  | vhost backend  |
  ------------------
 
Compared to the QEMU/VM case, virtio support for containers requires
embedding a device framework inside the virtio PMD. So this converged driver
actually plays three roles:
  - virtio driver, to drive this new kind of virtual device;
  - device emulation, to present this virtual device and respond to the
    virtio driver, a role originally played by QEMU;
  - and communication with the vhost backend, which is also originally
    done by QEMU.

The code layout and functionality of each module:
 
  ------------------------
  | -------------------- |
  | | virtio driver    | |----> (virtio_user_pci.c)
  | -------------------- |
  |          |           |
  | -------------------- | ------>  virtio-user PMD
  | | device emulation |-|----> (virtio_user_dev.c)
  | |                  | |
  | | vhost adapter    |-|----> (vhost_user.c, vhost_kernel.c, vhost.c)
  | -------------------- |
  ------------------------
         |
         | -------------- --> (vhost-user protocol or vhost-net ioctls)
         |
   ------------------
   | vhost backend  |
   ------------------
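
To make this layering concrete, here is a minimal sketch of the idea behind
virtio_user_pci.c: a table of function pointers stands in for PCI register
accesses, so the generic virtio PMD drives the emulated device without
knowing there is no real PCI device underneath. All names below are
illustrative assumptions, not the in-tree symbols.

#include <stdint.h>

/* Illustrative sketch only -- names are assumptions, not in-tree code. */
struct virtio_user_hw;                          /* per-device private state */
void virtio_user_start_device(struct virtio_user_hw *hw);  /* hypothetical */

#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04

static void
vu_set_status(struct virtio_user_hw *hw, uint8_t status)
{
	/* DRIVER_OK means the driver has finished setup; that is the
	 * moment to ask the vhost backend (vhost_user.c/vhost_kernel.c)
	 * to start processing the rings. */
	if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
		virtio_user_start_device(hw);
}

/* The "virtual PCI" layer: function pointers replace BAR accesses. */
struct vu_pci_ops {
	void (*set_status)(struct virtio_user_hw *, uint8_t);
	/* get_features, setup_queue, notify_queue, reset, ... */
};

static const struct vu_pci_ops vu_ops = {
	.set_status = vu_set_status,
};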

How to share memory? In the VM case, QEMU always shares the guest's entire
physical memory layout with the backend. But it is not feasible for a
container, as a plain process, to share all of its virtual memory regions
with the backend. So only specified virtual memory regions (shared mappings)
are sent to the backend. The limitation is that only addresses within these
regions can be used to transmit or receive packets.
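
For reference, the vhost-user message that carries these regions
(VHOST_USER_SET_MEM_TABLE) has a bounded layout along these lines; field
names follow the vhost-user protocol specification, and the region limit
matches the known issue listed below:

#include <stdint.h>

#define VHOST_MEMORY_MAX_NREGIONS 8    /* backend-side limit on regions */

/* One shared, hugepage-backed region; the fd backing it is passed to the
 * backend as ancillary data (SCM_RIGHTS) so the backend can mmap() it. */
struct vhost_memory_region {
	uint64_t guest_phys_addr;   /* "guest" PA; here, the hugepage PA */
	uint64_t memory_size;       /* length of the region */
	uint64_t userspace_addr;    /* VA in the DPDK process */
	uint64_t mmap_offset;       /* offset into the passed fd */
};

struct vhost_memory {
	uint32_t nregions;          /* <= VHOST_MEMORY_MAX_NREGIONS */
	uint32_t padding;
	struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
};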

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS (8)
   hugepages.
 - Root privilege is a must (mainly because hugepages are sorted according
   to physical address).
 - Applications should not use file names that match HUGEFILE_FMT
   ("%smap_%d"); see the sketch after this list.
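
The last two restrictions come from how hugepage files are discovered (see
the v3 notes above): every open file whose path matches HUGEFILE_FMT is
treated as a DPDK-owned hugepage file, and those files are then sorted by
physical address. Below is a minimal sketch of such a discovery pass,
assuming a walk over /proc/self/fd; the in-tree helper may differ in detail.

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Simplified match: HUGEFILE_FMT is "%smap_%d", i.e. a mount-point and
 * per-process prefix followed by "map_<segment index>". */
static int
is_dpdk_hugefile(const char *path)
{
	return strstr(path, "map_") != NULL;
}

static void
walk_open_files(void)
{
	char link[64], path[PATH_MAX];
	struct dirent *e;
	ssize_t n;
	DIR *d = opendir("/proc/self/fd");

	if (d == NULL)
		return;
	while ((e = readdir(d)) != NULL) {
		snprintf(link, sizeof(link), "/proc/self/fd/%s", e->d_name);
		n = readlink(link, path, sizeof(path) - 1);
		if (n <= 0)
			continue;	/* ".", "..", or not a symlink */
		path[n] = '\0';
		if (is_dpdk_hugefile(path))
			printf("hugepage file: %s\n", path);
	}
	closedir(d);
}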

How to use?

a. Apply this patchset.

b. To compile container apps:
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image, use the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
$: docker build -t dpdk-app-l2fwd .

d. To run with vhost-user:
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
	--socket-mem 1024,1024 -- -p 0x1 --stats 1
$: docker run -i -t -v <path_to_vhost_unix_socket>:/var/run/usvhost \
	-v /dev/hugepages:/dev/hugepages \
	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
	--vdev=virtio-user0,path=/var/run/usvhost -- -p 0x1

e. To run with vhost-net:
$: modprobe vhost
$: modprobe vhost-net
$: docker run -i -t --privileged \
	-v /dev/vhost-net:/dev/vhost-net \
	-v /dev/net/tun:/dev/net/tun \
	-v /dev/hugepages:/dev/hugepages \
	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
	--vdev=virtio-user0,path=/dev/vhost-net -- -p 0x1

By the way, it is not necessary to run inside a container; the virtio-user
vdev can also be used by a DPDK application running directly on the host.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

Jianfeng Tan (8):
  virtio: hide phys addr check inside pci ops
  virtio: abstract vring hdr desc init as a method
  virtio: enable use virtual address to fill desc
  virtio-user: add vhost adapter layer
  virtio-user: add device emulation layer APIs
  virtio-user: add new virtual pci driver for virtio
  virtio-user: add a new virtual device named virtio-user
  doc: update doc for virtio-user

 config/common_linuxapp                           |   3 +
 doc/guides/nics/overview.rst                     |  64 +--
 doc/guides/rel_notes/release_16_07.rst           |   4 +
 drivers/net/virtio/Makefile                      |   8 +
 drivers/net/virtio/virtio_ethdev.c               |  69 ++--
 drivers/net/virtio/virtio_ethdev.h               |   2 +
 drivers/net/virtio/virtio_pci.c                  |  30 +-
 drivers/net/virtio/virtio_pci.h                  |   3 +-
 drivers/net/virtio/virtio_rxtx.c                 |   5 +-
 drivers/net/virtio/virtio_rxtx_simple.c          |  13 +-
 drivers/net/virtio/virtio_user/vhost.c           | 105 +++++
 drivers/net/virtio/virtio_user/vhost.h           | 221 +++++++++++
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 254 ++++++++++++
 drivers/net/virtio/virtio_user/vhost_user.c      | 375 ++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 475 +++++++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  61 +++
 drivers/net/virtio/virtio_user/virtio_user_pci.c | 209 ++++++++++
 drivers/net/virtio/virtqueue.h                   |  33 +-
 18 files changed, 1849 insertions(+), 85 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost.h
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost_user.c
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.c
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.h
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_pci.c

-- 
2.1.4


