[dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev driver

Xia, Chenbo chenbo.xia at intel.com
Wed Dec 23 06:28:17 CET 2020


Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> Sent: Tuesday, December 22, 2020 4:49 PM
> To: Xia, Chenbo <chenbo.xia at intel.com>; Thomas Monjalon <thomas at monjalon.net>;
> David Marchand <david.marchand at redhat.com>
> Cc: dev <dev at dpdk.org>; Stephen Hemminger <stephen at networkplumber.org>; Liang,
> Cunming <cunming.liang at intel.com>; Lu, Xiuchun <xiuchun.lu at intel.com>; Li,
> Miao <miao.li at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev
> driver
> 
> Hi Chenbo,
> 
> Thanks for the detailed reply.
> 
> On 12/22/20 4:09 AM, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> >> Sent: Monday, December 21, 2020 8:02 PM
> >> To: Xia, Chenbo <chenbo.xia at intel.com>; Thomas Monjalon <thomas at monjalon.net>;
> >> David Marchand <david.marchand at redhat.com>
> >> Cc: dev <dev at dpdk.org>; Stephen Hemminger <stephen at networkplumber.org>;
> >> Liang, Cunming <cunming.liang at intel.com>; Lu, Xiuchun <xiuchun.lu at intel.com>;
> >> Li, Miao <miao.li at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf
> >> emudev driver
> >>
> >>
> >>
> >> On 12/21/20 10:52 AM, Maxime Coquelin wrote:
> >>> Hi Chenbo,
> >>>
> >>> On 12/19/20 7:11 AM, Xia, Chenbo wrote:
> >>>> Hi David,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: David Marchand <david.marchand at redhat.com>
> >>>>> Sent: Friday, December 18, 2020 5:54 PM
> >>>>> To: Xia, Chenbo <chenbo.xia at intel.com>
> >>>>> Cc: dev <dev at dpdk.org>; Thomas Monjalon <thomas at monjalon.net>; Stephen
> >>>>> Hemminger <stephen at networkplumber.org>; Liang, Cunming
> >>>>> <cunming.liang at intel.com>; Lu, Xiuchun <xiuchun.lu at intel.com>; Li, Miao
> >>>>> <miao.li at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
> >>>>> Subject: Re: [PATCH 0/8] Introduce emudev library and iavf emudev driver
> >>>>>
> >>>>> On Fri, Dec 18, 2020 at 9:02 AM Chenbo Xia <chenbo.xia at intel.com> wrote:
> >>>>>>
> >>>>>> This series introduces a new device abstraction called emudev for
> >>>>>> emulated devices. A new library (librte_emudev) is implemented. The
> >>>>>> first emudev driver is also introduced, which emulates Intel Adaptive
> >>>>>> Virtual Function (iavf) as a software network device.
> >>>>>>
> >>>>>> This series has a dependency on librte_vfio_user patch series:
> >>>>>> http://patchwork.dpdk.org/cover/85389/
> >>>>>>
> >>>>>> Background & Motivation
> >>>>>> -----------------------
> >>>>>> The disaggregated/multi-process QEMU is using VFIO-over-socket/vfio-user
> >>>>>> as the main transport mechanism to disaggregate IO services from QEMU.
> >>>>>> Therefore, librte_vfio_user is introduced in DPDK to accommodate
> >>>>>> emulated devices for high performance I/O. Although the vfio-user
> >>>>>> library makes it possible to emulate devices in DPDK, DPDK does not
> >>>>>> have a device abstraction for emulated devices. A good device
> >>>>>> abstraction will be useful for applications or high performance data
> >>>>>> path drivers. With this in mind, the emudev library is designed and
> >>>>>> implemented. It also makes it possible to keep a modular design for
> >>>>>> emulated devices by implementing the data path related logic in a
> >>>>>> standalone driver (e.g., an ethdev driver) and keeping the unrelated
> >>>>>> logic in the emudev driver.
> >>>>>
> >>>>> Since you mention performance, how does it compare to vhost-user/virtio?
> >>>>
> >>>> I think it depends on the device specification (i.e., how complex its
> >>>> data path handling is). A first try on the iavf spec shows better
> >>>> performance than virtio in our simple tests.
> >>>
> >>> That's interesting! How big is the performance difference? And how do
> >>> we explain it?
> >>>
> >>> If there are improvements that could be done in the Virtio
> >>> specification, it would be great to know about them and work on their
> >>> implementation. It worries me a bit that everyone is coming with their
> >>> own new device emulation every release, making things like live-
> >>> migration difficult to achieve in the future.
> >>
> >> I did a quick review of the IAVF emudev driver to understand what factors
> >> other than the ring layout could explain a performance gain.
> >>
> >> My understanding is that part of the performance gain may come from the
> >> following things that are supported/implemented in the Vhost-user backend
> >> and not in the IAVF driver:
> >> 1. Memory hotplug. It seems the datapath is not safe against memory
> >> hotplug in the VM, which causes the memory tables to be updated
> >> asynchronously from the datapath. In order to support it in the Vhost-user
> >> library, we had to introduce locks to ensure the datapath isn't
> >> accessing the shared memory while it is being remapped.
> >
> > I think it now uses a similar approach to what vhost-user does.
> >
> > First, in the vfio-user patch series, we introduce a callback lock_dp to
> > lock the data path when messages like DMA MAP/UNMAP come. It will lock the
> > data path in our data path driver.
> >
> > Note that the data path handling is in our data path driver:
> > http://patchwork.dpdk.org/cover/85500/
> >
> > For a modular design, the iavf_emu driver emulates the device while the
> > iavf back-end driver does the data path handling.
> 
> My analysis was actually based on the data path driver series.
> 
> My point was that iavfbe_recv_pkts() and iavfbe_xmit_pkts() are not safe
> against asynchronous changes like memory table updates.
> 
> As far as I can see, access_lock isn't taken by these functions, so if,
> for example, a memory table update happens during their execution, it
> could lead to undefined behaviour. The only thing checked there is
> whether the queue is enabled when entering the function, but that value
> can change right after being checked.
> 
> For example, in Vhost-user lib, we protected rte_vhost_dequeue_burst()
> and rte_vhost_enqueue_burst() with a spinlock. Note that this spinlock
> is per-virtqueue in order to avoid contention between the different
> queues.
> 

Oops, I didn't realize the data path driver missed that. And yes, the data path
driver should take the per-queue lock as you described.
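
For V2, what I have in mind is roughly the sketch below. Only
iavfbe_recv_pkts() and access_lock come from the data path driver series; the
struct layout and field names here are assumptions for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <rte_spinlock.h>
#include <rte_mbuf.h>

/* Hypothetical queue layout, for illustration only. */
struct iavfbe_rx_queue {
	rte_spinlock_t access_lock;	/* protects ring and memory table access */
	bool enabled;			/* must be re-checked under the lock */
	/* ... ring pointers, memory table, stats ... */
};

uint16_t
iavfbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
	struct iavfbe_rx_queue *rxq = rx_queue;
	uint16_t nb_rx = 0;

	/*
	 * Held for the whole burst, like the per-virtqueue lock in
	 * rte_vhost_dequeue_burst(), so whoever applies DMA MAP/UNMAP
	 * cannot remap guest memory while descriptors are processed.
	 */
	rte_spinlock_lock(&rxq->access_lock);

	if (!rxq->enabled)
		goto out;

	/* ... descriptor parsing, address translation, mbuf fill ... */

out:
	rte_spinlock_unlock(&rxq->access_lock);
	return nb_rx;
}

iavfbe_xmit_pkts() would take the same per-queue lock, and the lock_dp path
handling DMA MAP/UNMAP would grab it before touching the memory table.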

> >>
> >> 2. Live-migration. This feature does not seem to be supported in the
> >> driver, as I could not find a dirty page tracking mechanism. On the
> >> Vhost-user side, supporting it implies adding more branch conditions in
> >> the hot path, even when it is not used.
> >
> > Yes, we don't support this in this version. And yes, when we support this
> > feature, it will introduce complexity in the data path.
> >
> >>
> >> 3. No IOMMU support. Same here, this is supported in the Vhost-user
> >> library, and adding its support in the IAVF driver would introduce some
> >> more branches in the hot path even when not used.
> >
> > Yes, vIOMMU is not fully supported in the vfio-user spec for now, and I
> > also agree that when we have to support it, it will slow down the data path.
> >
> >>
> >> 4. Out-of-bounds memory access checks. While
> >> rte_iavf_emu_get_dma_vaddr() provides a way to ensure the full requested
> >> length is mapped, the data path does not use it. It does not even ensure
> >
> > In fact, it does use it 😊. I think you may have missed our data path
> > driver. Here's one use: http://patchwork.dpdk.org/patch/85504/
> 
> Sorry, I was not clear. What I meant is that
> rte_iavf_emu_get_dma_vaddr() is indeed used, but its outputs aren't
> checked properly.
> 
> >> the translated address is non-NULL. It makes it trivial for a malicious
> >> guest to make the hypervisor's vSwitch crash by simply passing random
> >> values as the buffer's address and length. Fixing it is trivial, but it
> >> will add several more checks and loops (if a buffer spans two pages) in
> >> the hot path.
> >
> > I don't quite understand this one. First, rte_iavf_emu_get_dma_vaddr() is
> > the only way to translate addresses. And I think this function will ensure
> > the input address is valid. Looking at the vhost side, vhost_iova_to_vva()
> > does similar things when vIOMMU is not used. Am I missing something? Just
> > correct me if I am wrong.
> 
> I think rte_iavf_emu_get_dma_vaddr() is good; the issue is how it is used
> in iavfbe_recv_pkts() and iavfbe_xmit_pkts().
> 
> First, the returned address (desc_addr) is not checked. So if the start
> address of the buffer is outside the guest memory ranges, the host app
> will crash with a segmentation fault. It can happen in case of a bug in
> the guest driver, or with a maliciously crafted descriptor.
> 
> Then, rte_iavf_emu_get_dma_vaddr()'s len parameter is a pointer. The caller
> has to pass the length of the buffer it wants to map; on return,
> rte_iavf_emu_get_dma_vaddr() updates it with the contiguous length that is
> actually mapped.
> 
> The caller needs to check whether the whole requested length is mapped, and
> loop until it is:
> http://code.dpdk.org/dpdk/latest/source/lib/librte_vhost/virtio_net.c#L484
> 
> This is required because a buffer contiguous in IOVA (guest physical
> address space in this case) is not necessarily contiguous in Host
> virtual address space. Also, this is required to be safe against
> untrusted guests, as a malicious guest could pass ~0ULL in the
> descriptor buffer len of the Tx path to crash the host application or
> overwrite its memory.
> 
> Currently, in the driver, the descriptor buf len is passed to the DMA map
> function, but its value is not checked afterwards, so only the first byte
> is ensured to be mapped. For Tx, a length of 1 byte is requested at DMA
> map time, so again only the first byte of the buffer is ensured to be
> mapped (and only if desc_addr is not NULL, which is not checked).
>

Correct! I also missed that rte_iavf_emu_get_dma_vaddr() is not properly used
in the data path driver. For the lock issue and this one, @Wu, Jingjing, could
we fix them in V2?
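
For reference, the check-and-loop pattern would look roughly like the sketch
below. The prototype of rte_iavf_emu_get_dma_vaddr() is assumed here to be
vhost_iova_to_vva()-like (memory table, IOVA, in/out length); the helper name
and exact signature are only for illustration:

#include <stdint.h>

/* Assumed prototype, modelled on vhost_iova_to_vva(); see the emudev series
 * for the real one. */
uint64_t rte_iavf_emu_get_dma_vaddr(void *mem, uint64_t iova, uint64_t *len);

/*
 * Translate a guest buffer that may span several host mappings.
 * Returns the number of segments, or -1 on an invalid descriptor.
 */
static inline int
map_desc_buf(void *mem, uint64_t iova, uint64_t buf_len,
	     void **seg_addr, uint64_t *seg_len, int max_segs)
{
	int nr_seg = 0;

	while (buf_len > 0) {
		uint64_t len = buf_len;
		uint64_t vaddr = rte_iavf_emu_get_dma_vaddr(mem, iova, &len);

		/* Reject addresses outside guest memory and zero-length
		 * mappings instead of dereferencing whatever the guest put
		 * in the descriptor. */
		if (vaddr == 0 || len == 0 || nr_seg == max_segs)
			return -1;

		seg_addr[nr_seg] = (void *)(uintptr_t)vaddr;
		seg_len[nr_seg] = len;	/* contiguous length actually mapped */
		nr_seg++;

		/* A buffer contiguous in IOVA is not necessarily contiguous
		 * in host virtual address space, so keep translating until
		 * the whole requested length is covered. */
		iova += len;
		buf_len -= len;
	}

	return nr_seg;
}

iavfbe_recv_pkts()/iavfbe_xmit_pkts() would then work on the returned segments
instead of assuming the whole descriptor is mapped at the first returned
address.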

Thanks for the good catches!
Chenbo
 
> >>
> >> Other than that, there is for sure a performance gain due to all the
> >> features Virtio-net supports that we have to check and handle in the
> >> hotpath, like indirect descriptors or mergeable buffers for example.
> >
> > I think iavf has features similar to indirect descriptors and mergeable
> > buffers for receiving/transmitting large packets. But I believe there will
> > be some feature differences between iavf and virtio/vhost.
> 
> Sure, but I meant it might not be necessary to implement all those
> features (if they are optional in the spec) in your driver, just that in
> the case of Vhost we have a legacy to support.
> 
> > I think you are correct that for this version, it's not fair to compare
> > virtio/vhost with the iavf back-end, because the iavf back-end does not yet
> > support some features and, besides, we have not optimized its data path.
> > We expect the performance of the iavf back-end to be on the same level as
> > virtio 1.1, and hopefully better because of the ring layout. But let's see
> > when we can do a complete performance analysis 😊.
> 
> Thanks!
> Maxime
> 
> > Thanks!
> > Chenbo
> >
> >>
> >> Best regards,
> >> Maxime
> >>
> >>> Regards,
> >>> Maxime
> >>>
> >>>> Thanks!
> >>>> Chenbo
> >>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> David Marchand
> >>>>
> >


