[dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev driver

Maxime Coquelin maxime.coquelin at redhat.com
Tue Dec 22 09:48:44 CET 2020


Hi Chenbo,

Thanks for the detailed reply.

On 12/22/20 4:09 AM, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Monday, December 21, 2020 8:02 PM
>> To: Xia, Chenbo <chenbo.xia at intel.com>; Thomas Monjalon <thomas at monjalon.net>;
>> David Marchand <david.marchand at redhat.com>
>> Cc: dev <dev at dpdk.org>; Stephen Hemminger <stephen at networkplumber.org>; Liang,
>> Cunming <cunming.liang at intel.com>; Lu, Xiuchun <xiuchun.lu at intel.com>; Li,
>> Miao <miao.li at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev
>> driver
>>
>>
>>
>> On 12/21/20 10:52 AM, Maxime Coquelin wrote:
>>> Hi Chenbo,
>>>
>>> On 12/19/20 7:11 AM, Xia, Chenbo wrote:
>>>> Hi David,
>>>>
>>>>> -----Original Message-----
>>>>> From: David Marchand <david.marchand at redhat.com>
>>>>> Sent: Friday, December 18, 2020 5:54 PM
>>>>> To: Xia, Chenbo <chenbo.xia at intel.com>
>>>>> Cc: dev <dev at dpdk.org>; Thomas Monjalon <thomas at monjalon.net>; Stephen
>>>>> Hemminger <stephen at networkplumber.org>; Liang, Cunming
>>>>> <cunming.liang at intel.com>; Lu, Xiuchun <xiuchun.lu at intel.com>; Li, Miao
>>>>> <miao.li at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
>>>>> Subject: Re: [PATCH 0/8] Introduce emudev library and iavf emudev driver
>>>>>
>>>>> On Fri, Dec 18, 2020 at 9:02 AM Chenbo Xia <chenbo.xia at intel.com> wrote:
>>>>>>
>>>>>> This series introduces a new device abstraction called emudev for
>>>>>> emulated devices. A new library (librte_emudev) is implemented. The
>>>>>> first emudev driver is also introduced, which emulates Intel Adaptive
>>>>>> Virtual Function (iavf) as a software network device.
>>>>>>
>>>>>> This series has a dependency on librte_vfio_user patch series:
>>>>>> http://patchwork.dpdk.org/cover/85389/
>>>>>>
>>>>>> Background & Motivation
>>>>>> -----------------------
>>>>>> The disaggregated/multi-process QEMU is using VFIO-over-socket/vfio-user
>>>>>> as the main transport mechanism to disaggregate IO services from QEMU.
>>>>>> Therefore, librte_vfio_user is introduced in DPDK to accommodate
>>>>>> emulated devices for high performance I/O. Although the vfio-user
>>>>>> library provides the possibility of emulating devices in DPDK, DPDK
>>>>>> does not have
>>>>>> a device abstraction for emulated devices. A good device abstraction
>>>>>> will be useful for applications or a high performance data path driver.
>>>>>> With this consideration, the emudev library is designed and implemented.
>>>>>> It also makes it possible to keep a modular design for emulated devices
>>>>>> by implementing the data path related logic in a standalone driver
>>>>>> (e.g., an ethdev driver) and keeping the unrelated logic in the emudev
>>>>>> driver.
>>>>>
>>>>> Since you mention performance, how does it compare to vhost-user/virtio?
>>>>
>>>> I think it depends on the device specification (i.e., how complex its
>>>> data path handling is). A first try on iavf spec shows better
>>>> performance than virtio in our simple tests.
>>>
>>> That's interesting! How big is the performance difference? And how do
>>> we explain it?
>>>
>>> If there are improvements that could be done in the Virtio
>>> specification, it would be great to know and work on their
>>> implementations. It worries me a bit that everyone is coming up with
>>> their own new device emulation every release, making things like
>>> live-migration difficult to achieve in the future.
>>
>> I did a quick review of the IAVF emudev driver to understand what
>> factors other than the ring layout could explain a performance gain.
>>
>> My understanding is that part of the performance gain may come from the
>> following things that are supported/implemented in the Vhost-user
>> backend and not in the IAVF driver:
>> 1. Memory hotplug. It seems the datapath is not safe against memory
>> hotplug in the VM, which causes the memory tables to be updated
>> asynchronously from the datapath. In order to support it in Vhost-user
>> library, we had to introduce locks to ensure the datapath isn't
>> accessing the shared memory while it is being remapped.
> 
> I think it now uses a similar approach to what vhost-user does.
> 
> First, in the vfio-user patch series, we introduce a callback, lock_dp, to
> lock the data path when messages like DMA MAP/UNMAP arrive. It locks the
> data path in our data path driver.
> 
> Note that the data path handling is in our data path driver:
> http://patchwork.dpdk.org/cover/85500/
> 
> For a modular design, the iavf_emu driver emulates the device while the iavf
> back-end driver does the data path handling.

My analysis was actually based on the data path driver series.

My point was that iavfbe_recv_pkts() and iavfbe_xmit_pkts() are not safe
against asynchronous changes like memory table updates.

As far as I can see, the access_lock isn't taken by these functions, so
if, for example, a memory table update happens during their execution,
it could lead to undefined behaviour. The only thing checked there is
whether the queue is enabled when entering the function, but this value
can change right after having been checked.

For example, in Vhost-user lib, we protected rte_vhost_dequeue_burst()
and rte_vhost_enqueue_burst() with a spinlock. Note that this spinlock
is per-virtqueue in order to avoid contention between the different
queues.
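
To illustrate what I mean, here is a minimal sketch of that kind of
per-queue protection (note that 'struct iavfbe_queue' and its
'access_lock' field are just placeholders here, not the actual
structures from your series):

#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_spinlock.h>

/* Placeholder queue structure, not the one from the series. */
struct iavfbe_queue {
	rte_spinlock_t access_lock; /* one lock per queue to limit contention */
	bool enabled;
	/* ... ring pointers, memory table reference, etc. ... */
};

static uint16_t
iavfbe_recv_pkts_safe(struct iavfbe_queue *q, struct rte_mbuf **pkts,
		      uint16_t nb_pkts)
{
	uint16_t nb_rx = 0;

	rte_spinlock_lock(&q->access_lock);

	/* Re-check the enable flag under the lock: a control-path event
	 * (queue disable, memory table update, ...) taking the same lock
	 * cannot change the state while descriptors are processed. */
	if (!q->enabled)
		goto out;

	/* ... descriptor processing using the memory table, filling
	 * pkts[] up to nb_pkts and updating nb_rx ... */

out:
	rte_spinlock_unlock(&q->access_lock);
	return nb_rx;
}

The control path (DMA map/unmap, queue enable/disable, ...) would take
the same per-queue lock before touching the memory table or the enable
flag.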

>>
>> 2. Live-migration. This feature does not seem to be supported in the
>> driver, as I could not find a dirty page tracking mechanism. On the
>> Vhost-user side, supporting it implies adding more branch conditions in
>> the hot path, even when it is not used.
> 
> Yes, we don't support this in this version. And yes, when we support this
> feature, it will introduce complexity in the data path.
> 
>>
>> 3. No IOMMU support. Same here, this is supported in the Vhost-user
>> library, and adding its support in the IAVF driver would introduce some
>> more branches in the hot path even when not used.
> 
> Yes, vIOMMU is not fully supported in the vfio-user spec for now, and I also
> agree that when we have to support it, it will slow down the data path.
> 
>>
>> 4. Out-of-bounds memory access checks. While
>> rte_iavf_emu_get_dma_vaddr() provides a way to ensure the full requested
>> length is mapped, the data path does not use it. It does not even ensure
> 
> In fact, it uses it 😊. I think you may have missed our data path driver.
> Here's an example use: http://patchwork.dpdk.org/patch/85504/

Sorry, I was not clear. What I meant is that
rte_iavf_emu_get_dma_vaddr() is indeed used, but its outputs aren't
checked properly.

>> the translated address is non-NULL. It makes it trivial for a malicious
>> guest to crash the hypervisor's vSwitch by simply passing random
>> values as the buffer's address and length. Fixing it is trivial, but it
>> will add several more checks and loops (if a buffer spans two
>> pages) in the hot path.
> 
> I don't quite understand this one. First, rte_iavf_emu_get_dma_vaddr() is the
> only way to translate addresses. And I think this function will ensure the
> input address is valid. Looking at the vhost side, vhost_iova_to_vva() does
> similar things when vIOMMU is not used. Am I missing something? Just correct
> me if I am wrong.

I think rte_iavf_emu_get_dma_vaddr() is good, the issue is the way it is
used in iavfbe_recv_pkts() and iavfbe_xmit_pkts().

First, the returned address (desc_addr) is not checked. So if the start
address of the buffer is outside the guest memory ranges, the host app
will crash with a segmentation fault. It can happen in case of a bug in
the guest driver, or with a maliciously crafted descriptor.

Then, the rte_iavf_emu_get_dma_vaddr() len parameter is a pointer. The
caller has to pass the length of the buffer it wants to map. On return,
rte_iavf_emu_get_dma_vaddr() updates it with the contiguous length that
is actually mapped.

The caller needs to check whether the whole requested length has been
mapped, and loop until it all is:
http://code.dpdk.org/dpdk/latest/source/lib/librte_vhost/virtio_net.c#L484

This is required because a buffer contiguous in IOVA (guest physical
address space in this case) is not necessarily contiguous in Host
virtual address space. Also, this is required to be safe against
untrusted guests, as a malicious guest could pass ~0ULL in the
descriptor buffer len of the Tx path to crash the host application or
overwrite its memory.
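
To be more concrete, something along these lines is what I have in mind
(a rough sketch only; the rte_iavf_emu_get_dma_vaddr() prototype below
is my assumption based on this discussion, and process_chunk() is just
a stand-in for the actual copy logic):

#include <stddef.h>
#include <stdint.h>

/* Opaque here; defined by the emudev series. */
struct rte_iavf_emu_mem;

/* Assumed prototype: translates 'iova', updates '*len' with the length
 * that is contiguously mapped, returns the host VA or 0 on failure. */
uint64_t rte_iavf_emu_get_dma_vaddr(struct rte_iavf_emu_mem *mem,
				    uint64_t iova, uint64_t *len);

/* Translate and process a guest buffer chunk by chunk; returns -1 for
 * any IOVA range that is not fully backed by the memory table. */
static int
process_guest_buffer(struct rte_iavf_emu_mem *mem, uint64_t iova,
		     uint64_t len,
		     void (*process_chunk)(void *vaddr, uint64_t len))
{
	while (len) {
		uint64_t chunk = len; /* request what remains */
		uint64_t vaddr = rte_iavf_emu_get_dma_vaddr(mem, iova, &chunk);

		/* A zero/NULL result means the descriptor points outside
		 * the guest memory ranges: drop it, do not dereference. */
		if (vaddr == 0 || chunk == 0)
			return -1;

		/* Only 'chunk' bytes are contiguous in host VA; process
		 * them and continue with the remainder of the buffer. */
		process_chunk((void *)(uintptr_t)vaddr, chunk);

		iova += chunk;
		len -= chunk;
	}

	return 0;
}

With that kind of loop, a descriptor carrying a bogus address or a huge
length is rejected instead of being partially trusted.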

Currently, in the driver, the descriptor buf len is passed to the DMA map
function, but its value is not checked afterwards, so only the first byte
is guaranteed to be mapped. For Tx, a length of 1 byte is requested at DMA
map time, so there too only the first byte of the buffer is guaranteed to
be mapped (if desc_addr is not NULL, which is not checked).

>>
>> Other than that, there is for sure a performance gain due to all the
>> features Virtio-net supports that we have to check and handle in the
>> hotpath, like indirect descriptors or mergeable buffers for example.
> 
> I think iavf has features similar to indirect descriptors and mergeable
> buffers to receive/transmit large packets. But I believe there will be some
> feature differences between iavf and virtio/vhost.

Sure, but I meant it might not be necessary to implement all those
features (if they are optional in the spec) in your driver, just that in
the case of Vhost we have a legacy to support.

> I think you are correct that for this version, it's not fair to compare virtio/vhost
> with the iavf back-end, because the iavf back-end does not support some features yet
> and, besides, we have not optimized its data path. We expect the performance of the
> iavf back-end to be on the same level as virtio 1.1, and hopefully better because of
> the ring layout. But let's see when we can do a complete performance analysis 😊.

Thanks!
Maxime

> Thanks!
> Chenbo
> 
>>
>> Best regards,
>> Maxime
>>
>>> Regards,
>>> Maxime
>>>
>>>> Thanks!
>>>> Chenbo
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> David Marchand
>>>>
> 


