[dpdk-dev] [PATCH v6 00/11] implement packed virtqueues

Jens Freimann jfreimann at redhat.com
Fri Sep 21 16:06:44 CEST 2018


On Fri, Sep 21, 2018 at 08:32:22PM +0800, Tiwei Bie wrote:
>On Fri, Sep 21, 2018 at 12:32:57PM +0200, Jens Freimann wrote:
>> This is a basic implementation of packed virtqueues as specified in the
>> Virtio 1.1 draft. A compiled version of the current draft is available
>> at https://github.com/oasis-tcs/virtio-docs.git (or as .pdf at
>> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd10.pdf).
>>
>> A packed virtqueue differs from a split virtqueue in that it
>> consists of a single descriptor ring that replaces the available
>> and used rings, indexes and descriptor buffer.
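
For reference, a single packed descriptor looks roughly like this;
field names follow the wd10 draft, so take it as a sketch rather than
the final layout:

    #include <stdint.h>

    /* One descriptor in the packed ring (draft layout). */
    struct pvirtq_desc {
            uint64_t addr;  /* buffer address */
            uint32_t len;   /* buffer length */
            uint16_t id;    /* buffer ID, echoed back by the device */
            uint16_t flags; /* AVAIL/USED and other per-descriptor flags */
    };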
>>
>> Each descriptor is readable and writable and has a flags field.
>> These flags mark whether a descriptor is available or used. To
>> detect new available descriptors even after the ring has wrapped,
>> device and driver each have a single-bit wrap counter that is
>> flipped from 0 to 1 and vice versa every time the last descriptor
>> in the ring is used/made available.
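
From the device side, that availability check boils down to something
like this sketch (bit positions are per the wd10 draft; the helper
name is just illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTQ_DESC_F_AVAIL (1 << 7)
    #define VIRTQ_DESC_F_USED  (1 << 15)

    static inline bool
    desc_is_avail(uint16_t flags, bool wrap_counter)
    {
            bool avail = !!(flags & VIRTQ_DESC_F_AVAIL);
            bool used  = !!(flags & VIRTQ_DESC_F_USED);

            /* Available to the device: the AVAIL bit matches the
             * device's wrap counter while the USED bit does not. */
            return avail == wrap_counter && used != wrap_counter;
    }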
>>
>> The idea behind this is to 1) improve performance by avoiding
>> cache misses and 2) make packed queues easier for devices to
>> implement.
>>
>> Regarding performance: with these patches I get 21.13 Mpps on my
>> system, compared to 18.8 Mpps with the virtio 1.0 code. Packet size
>> was 64 bytes.
>
>Did you enable multi-queue and use multiple cores on the
>vhost side? If not, I guess the above performance gain
>is on the vhost side rather than the virtio side.

I tested several variations back then and they all looked very good.
But the code has changed a lot in the meantime, and I need to do more
benchmarking in any case.

>
>If you use more cores on the vhost side or the virtio
>side, will you see any performance changes?
>
>Did you do any performance tests with the kernel vhost-net
>backend (with zero-copy enabled and disabled)? I think we
>also need performance data for these two cases, and it
>would help us make sure that it works with the kernel
>backends.

I tested against vhost-kernel, but only to verify functionality, not
to benchmark.
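
When I benchmark it, I'd toggle zero-copy via the vhost_net module
parameter, something like (untested, assuming the upstream module):

    # enable tx zero-copy; reload the module with =0 to disable
    modprobe vhost_net experimental_zcopytx=1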
>
>And for the "virtio-PMD + vhost-PMD" test cases, I think
>we need below performance data:
>
>#1. The maximum 1-core performance of virtio PMD when using split ring.
>#2. The maximum 1-core performance of virtio PMD when using packed ring.
>#3. The maximum 1-core performance of vhost PMD when using split ring.
>#4. The maximum 1-core performance of vhost PMD when using packed ring.
>
>And then we can get a clear understanding of the
>performance gain from packed rings in DPDK.
>
>And FYI, the maximum 1-core performance of virtio PMD
>can be measured with the steps below:
>
>1. Launch vhost-PMD with multiple queues, and use multiple
>   CPU cores for forwarding.
>2. Launch virtio-PMD with multiple queues and use 1 CPU
>   core for forwarding.
>3. Repeat the above two steps, adding more forwarding
>   CPU cores on the vhost-PMD side, until throughput
>   stops increasing.
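
Something like the following is what I plan to use (untested command
lines; socket path, core lists and queue counts are placeholders):

    # vhost side: 2 queue pairs, 2 forwarding cores; raise --nb-cores
    # until throughput stops increasing
    ./testpmd -l 0-2 --file-prefix=vhost --no-pci \
        --vdev 'net_vhost0,iface=/tmp/vhost.sock,queues=2' \
        -- -i --rxq=2 --txq=2 --nb-cores=2

    # virtio side: same queue count, but a single forwarding core
    ./testpmd -l 3-4 --file-prefix=virtio --no-pci \
        --vdev 'net_virtio_user0,path=/tmp/vhost.sock,queues=2' \
        -- -i --rxq=2 --txq=2 --nb-cores=1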

Thanks for the suggestions, I'll come back with more numbers.

>
>Besides, I just did a quick glance at the Tx implementation:
>it still assumes the descriptors will be written back in
>order by the device. You can find more details in my
>comments on that patch.

Saw it, noted. I had hoped to avoid keeping the extra list, but I see
no way around it now.
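
Roughly what I have in mind: track the in-flight mbuf per buffer id
and reclaim by the id the device writes back, instead of assuming
in-order completion. Just a sketch, not the actual PMD code:

    #include <stdint.h>

    struct desc_extra {
            void    *cookie; /* mbuf attached to this buffer id */
            uint16_t next;   /* free-list chaining */
    };

    /* Reclaim one used descriptor by the id the device echoed back;
     * this also works when completions arrive out of order. */
    static void *
    reclaim_by_id(struct desc_extra *extra, uint16_t used_id,
                  uint16_t *free_head)
    {
            void *cookie = extra[used_id].cookie;

            extra[used_id].cookie = NULL;
            extra[used_id].next = *free_head; /* push id on free list */
            *free_head = used_id;
            return cookie;
    }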

Thanks for your review Tiwei!

regards,
Jens 

