[dpdk-dev] [RFC 0/5] virtio support for container

Jianfeng Tan jianfeng.tan at intel.com
Thu Nov 5 19:31:11 CET 2015

This patchset is a PoC, posted to request comments from the community.

It provides a high-performance networking interface (virtio) for
container-based DPDK applications. How to start DPDK applications in
containers with exclusive ownership of NIC devices is beyond its
scope. The basic idea is to present a new virtual device (named
eth_cvio), which can be discovered and initialized by rte_eal_init()
in container-based DPDK applications. To minimize changes, we reuse
the existing virtio frontend driver code (drivers/net/virtio/).

Compared to the QEMU/VM case, the virtio device framework (which
translates I/O port r/w operations into the unix socket/cuse protocol,
and is originally provided by QEMU) is integrated into the virtio
frontend driver. In other words, this new converged driver plays both
the role of the original frontend driver and the role of the QEMU
device framework.

The biggest difference lies in how to calculate the relative address
for the backend. The principle of virtio is that, based on one or more
shared memory segments, vhost maintains a reference system with the
base addresses and lengths of these segments, so that when an address
arrives from the VM (usually a GPA, Guest Physical Address), vhost can
translate it into an address it recognizes itself (a VVA, Vhost Virtual
Address). To decrease the overhead of address translation, we should
maintain as few segments as possible. In the context of virtual
machines, GPA is always locally contiguous, so it is a good choice. In
the container case, CVA (Container Virtual Address) can be used
instead. This means that:
a. in set_base_addr, a CVA is used;
b. when preparing RX descriptors, a CVA is used;
c. when transmitting packets, a CVA is filled into the TX descriptors;
d. in the TX and CQ headers, a CVA is used.
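
The translation described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from
the patchset: struct mem_region, its field names and front_to_vva()
are hypothetical. Each region records the base address as seen by the
frontend (a GPA for a VM, a CVA for a container), its length, and the
address at which the backend mapped the same memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; the names are illustrative only. */
struct mem_region {
	uint64_t front_addr; /* base address seen by the frontend (GPA or CVA) */
	uint64_t size;       /* length of the region in bytes */
	uint64_t vva_base;   /* where the backend mapped the same region */
};

/*
 * Translate a frontend address into a backend (vhost) virtual address.
 * Returns 0 if the address falls outside every shared region.
 */
static uint64_t
front_to_vva(const struct mem_region *regions, size_t nregions, uint64_t addr)
{
	size_t i;

	assert(regions != NULL || nregions == 0);

	/* Linear scan: the fewer regions there are, the cheaper this is. */
	for (i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		if (addr >= r->front_addr && addr - r->front_addr < r->size)
			return r->vva_base + (addr - r->front_addr);
	}
	return 0;
}
```

The lookup cost grows with the number of regions, which is why keeping
the number of segments small matters.
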

How to share memory? In the VM case, QEMU always shares the whole
physical memory layout with the backend. But it is not feasible for a
container, as a process, to share all of its virtual memory regions
with the backend. So only specified virtual memory regions (of shared
type) are sent to the backend. This leads to a limitation: only
addresses in these regions can be used to transmit or receive packets.
For now, the shared memory is created in /dev/shm using shm_open()
during memory initialization.
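
As a rough illustration of creating such a region, a minimal sketch
using the POSIX calls shm_open(), ftruncate() and mmap() could look
like the code below. The helper create_shared_region() and its
parameters are hypothetical and do not come from the patchset; the
actual change lives in the EAL memory initialization code. On older
glibc versions, linking may require -lrt.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a POSIX shared memory object (it shows up
 * under /dev/shm on Linux), size it, and map it as MAP_SHARED.
 * On success, returns the mapping and stores the descriptor in *fd_out
 * so that it could later be handed to another process.
 * Returns NULL on failure.
 */
static void *
create_shared_region(const char *name, size_t len, int *fd_out)
{
	void *addr;
	int fd;

	assert(name != NULL && fd_out != NULL && len > 0);

	fd = shm_open(name, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;

	/* A new object has size 0; grow it to the requested length. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		shm_unlink(name);
		return NULL;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		close(fd);
		shm_unlink(name);
		return NULL;
	}

	*fd_out = fd;
	return addr;
}
```

Only memory mapped this way is visible to the backend, which is where
the limitation on usable packet buffer addresses comes from.
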

How to use?
a. Apply the virtio-for-container patches. Two copies of the patched
code are needed (referred to as dpdk-app/ and dpdk-vhost/).
b. Compile the container application:
$: cd dpdk-app
$: vim config/common_linuxapp (uncomment "CONFIG_RTE_VIRTIO_VDEV=y")
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
c. Build a docker image using the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0xc", "-n", "4", "--no-huge", "--no-pci", "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost", "--", "-p", "0x1"]
$: docker build -t dpdk-app-l2fwd .
d. Compile vhost:
$: cd dpdk-vhost
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
e. Start vhost-switch
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 --socket-mem 1024,1024 -- -p 0x1 --stats 1
f. Start docker
$: docker run -i -t -v <path to vhost unix socket>:/var/run/usvhost dpdk-app-l2fwd

Signed-off-by: Huawei Xie <huawei.xie at intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>

Jianfeng Tan (5):
  virtio/container: add handler for ioport rd/wr
  virtio/container: add a new virtual device named eth_cvio
  virtio/container: unify desc->addr assignment
  virtio/container: adjust memory initialization process
  vhost/container: change mode of vhost listening socket

 config/common_linuxapp                       |   5 +
 drivers/net/virtio/Makefile                  |   4 +
 drivers/net/virtio/vhost-user.c              | 433 +++++++++++++++++++++++++++
 drivers/net/virtio/vhost-user.h              | 137 +++++++++
 drivers/net/virtio/virtio_ethdev.c           | 319 +++++++++++++++-----
 drivers/net/virtio/virtio_ethdev.h           |  16 +
 drivers/net/virtio/virtio_pci.h              |  32 +-
 drivers/net/virtio/virtio_rxtx.c             |   9 +-
 drivers/net/virtio/virtio_rxtx_simple.c      |   9 +-
 drivers/net/virtio/virtqueue.h               |   9 +-
 lib/librte_eal/common/include/rte_memory.h   |   5 +
 lib/librte_eal/linuxapp/eal/eal_memory.c     |  58 +++-
 lib/librte_mempool/rte_mempool.c             |  16 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c |   5 +
 14 files changed, 967 insertions(+), 90 deletions(-)
 create mode 100644 drivers/net/virtio/vhost-user.c
 create mode 100644 drivers/net/virtio/vhost-user.h


