[dpdk-users] Dpdk poor performance on virtual machine

Hu, Xuekun xuekun.hu at intel.com
Sat Dec 24 08:06:28 CET 2016

Your setup now includes something new: “macvtap”. I don’t know how macvtap performs in general; I only know that it is much slower than “real” PCI pass-through.

I also don’t know why you chose this configuration (anonymous huge pages and macvtap). Is there a specific purpose?

I think you should establish a baseline first, then measure how much performance drops when using anonymous hugepages or macvtap.

1. Baseline: real hugepages + real PCI pass-through

2. Anon hugepages vs. real hugepages

3. Real PCI pass-through vs. macvtap

From: edgar helmut [mailto:helmut.edgar100 at gmail.com]
Sent: Saturday, December 24, 2016 3:23 AM
To: Hu, Xuekun <xuekun.hu at intel.com>
Cc: Wiles, Keith <keith.wiles at intel.com>; users at dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I changed the setup but performance is still poor :( and I need your help to understand the root cause.
The setup is (sorry for the long description):
(test equipment is pktgen using dpdk, installed on a second physical machine connected with 82599 NICs)
host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, single socket, ubuntu 16.04, with 4 hugepages of 1G each.
hypervisor (kvm): QEMU emulator version 2.5.0
guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M each.
guest total memory is 2G and all of it is backed by the host with transparent hugepages (I can see the AnonHugePages consumed at guest creation). This memory includes the 512 hugepages for the testpmd application.
I pinned and isolated the guest's vcpus (using the kernel option isolcpus), and could see clearly that the isolation works well.

2 x 82599 NICs are connected to the guest as passthrough using macvtap interfaces, so the guest receives and forwards packets from one interface to the other and vice versa.
In the guest I bind its interfaces using igb_uio.
testpmd in the guest starts dropping packets at about 800 Mbps between both ports, bi-directional, using two vcpus for forwarding (one vcpu for application management and two for forwarding).
At 1.2 Gbps it drops a lot of packets.
The same testpmd configuration on the host (between both 82599 NICs) forwards about 5-6 Gbps on both ports bi-directional.

I assumed that forwarding ~5-6 Gbps between two ports should be trivial, so it would be great if someone could share the configuration of a tested setup.
Any further ideas will be highly appreciated.


On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100 at gmail.com> wrote:
That's what I was afraid of.
In fact I need the host to back the entire guest's memory with hugepages.
I will find a way to do that and run the tests again.

On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu at intel.com> wrote:
You said the VM’s memory was 6G, while transparent hugepages covered only ~4G (4360192 kB), so some of the memory was mapped with 4K pages.

BTW, the memory used by transparent hugepages is not the hugepages you reserved with the kernel boot option.
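
The distinction above can be seen in /proc/meminfo, where the reserved pool and transparent hugepages are reported by separate counters. The following is only a sketch: meminfo_field and report_hugepages are hypothetical helper names (not from this thread or from DPDK), and it assumes the standard HugePages_Total, HugePages_Free, Hugepagesize and AnonHugePages fields are present. Note that the HugePages_* counters describe the pool of the default huge page size only.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DPDK or of the original thread.

# Print the numeric value of one field from a meminfo-format file.
# $1 = field name, $2 = file to read (defaults to /proc/meminfo)
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Show the reserved (hugetlbfs) pool next to transparent hugepage usage.
# $1 = file to read (defaults to /proc/meminfo)
report_hugepages() {
    f=${1:-/proc/meminfo}
    echo "reserved pages: $(meminfo_field HugePages_Total "$f")"
    echo "free pages: $(meminfo_field HugePages_Free "$f")"
    echo "page size kB: $(meminfo_field Hugepagesize "$f")"
    echo "transparent kB: $(meminfo_field AnonHugePages "$f")"
}
```

Running report_hugepages on the host before and after starting the guest should show which counter moves: if only the transparent value grows while the free page count stays the same, the guest is presumably not consuming the reserved pool.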

From: edgar helmut [mailto:helmut.edgar100 at gmail.com]
Sent: Friday, December 16, 2016 1:24 AM
To: Hu, Xuekun
Cc: Wiles, Keith; users at dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

In fact the VM was created with 6G RAM, and its kernel boot args define 4 hugepages of 1G each, though when starting the VM I noticed that AnonHugePages increased.
The relevant qemu process id is 6074, and the following sums the amount of allocated AnonHugePages:
sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2} '|awk '{s+=$1} END {print s}'
which results in 4360192 (kB).
So not all of the memory is backed with transparent hugepages, though it is more than the amount of hugepages the guest is supposed to boot with.
How can I be sure that the required 4G of hugepages are really allocated, and not, for example, only 2G out of the 4G (with the remaining 2G mapped with the default 4K pages)?
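
One rough way to approach the question above is to compare the AnonHugePages total of the qemu process with the memory size the guest was given. This is a sketch under assumptions: anon_hugepages_kb and thp_coverage_percent are hypothetical helper names, the input is an smaps-format file such as /proc/6074/smaps, and the result only describes transparent hugepage coverage of the whole process, not which guest addresses are covered.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DPDK or of the original thread.

# Sum the AnonHugePages fields (in kB) of an smaps-format file.
# $1 = smaps file, e.g. /proc/<qemu pid>/smaps
anon_hugepages_kb() {
    awk '/^AnonHugePages:/ { s += $2 } END { print s + 0 }' "$1"
}

# Print the integer percentage of an expected size that is THP-backed.
# $1 = smaps file, $2 = expected guest memory in kB
thp_coverage_percent() {
    total=$(anon_hugepages_kb "$1")
    echo $(( total * 100 / $2 ))
}
```

With the numbers from this thread (4360192 kB of AnonHugePages against a 6G guest, i.e. 6291456 kB), the coverage works out to about 69%, so roughly a third of the guest memory would be on 4K pages. Reading another process's smaps normally requires root, as in the sudo command above.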


On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu at intel.com> wrote:
Are you sure the AnonHugePages size was equal to the total VM's memory size?
The transparent huge page mechanism doesn't always guarantee that the app is
using real huge pages.

-----Original Message-----
From: users [mailto:users-bounces at dpdk.org] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users at dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just took two more steps:
1. setting iommu=pt for better usage of igb_uio
2. using taskset and isolcpus, so now it looks like the relevant dpdk threads
run on dedicated cores.

It improved the performance, though I still see a significant difference
between the vm and the host which I can't fully explain.

Any further ideas?
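
A quick sanity check for step 2 could look like the sketch below. It assumes the kernel exposes the isolated CPU list in /sys/devices/system/cpu/isolated and that the list uses the usual range syntax such as "1-3,8"; cpu_in_list is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper, not part of DPDK or of the original thread.

# Return 0 if CPU number $1 appears in a kernel-style CPU list $2
# (comma-separated single CPUs and lo-hi ranges), otherwise return 1.
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    for range in $2; do
        lo=${range%-*}   # "1-3" -> "1", "8" -> "8"
        hi=${range#*-}   # "1-3" -> "3", "8" -> "8"
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, on the host:

    cpu_in_list 2 "$(cat /sys/devices/system/cpu/isolated)" && echo "CPU 2 is isolated"

The affinity actually applied to a thread can then be compared against that list with taskset -cp <pid>.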


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles at intel.com> wrote:

> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100 at gmail.com> wrote:
> >
> > Hi.
> > Some help is needed to understand a performance issue on a virtual machine.
> >
> > Running testpmd on the host works well (testpmd forwards 10G between
> > two 82599 ports).
> > However, the same application running on a virtual machine on the same host
> > shows a huge degradation in performance.
> > There, testpmd is not even able to read 100 Mbps from the NIC without drops,
> > and from a profile I made it looks like a dpdk application runs more than
> > 10 times slower than on the host…
> Not sure I understand the overall setup, but if you have multiple sockets
> on your platform, did you make sure the NIC/PCI bus is on the same socket
> as the VM? If you have to access the NIC across the QPI, it could explain
> some of the performance drop. Not sure a drop this large is caused by that.
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer version as well).
> > NICs are connected to the guest using pci passthrough, and the guest's cpu
> > is set as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages), so
> > I assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic, but both result in
> > the same performance.
> >
> > Given the performance difference, I guess I am missing something.
> >
> > Please advise: what am I missing here?
> > Is this an inherent penalty of qemu?
> >
> > Thanks
> > Edgar
> Regards,
> Keith
