[dpdk-users] Dpdk poor performance on virtual machine

edgar helmut helmut.edgar100 at gmail.com
Sat Dec 24 09:06:08 CET 2016

I am looking for a means to measure packets in and out of the VM
(without asking the VM itself). While pure passthrough doesn't expose an
interface to query for in/out packets, macvtap does expose such an interface.
As for the anonymous hugepages, I was looking for a more flexible method and
assumed there is not much difference.
I will run the test with reserved hugepages.
However, are there any known macvtap performance issues when
delivering 5-6 Gbps?
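A possible host-side way to count packets without asking the guest, sketched under assumptions: Linux exposes cumulative per-interface counters under /sys/class/net/<iface>/statistics/ (rx_packets, tx_packets, rx_bytes, tx_bytes), so sampling them twice gives packet and bit rates. This only works when the host has a netdev for the VM's port (as with macvtap, not with pure PCI passthrough). The interface name `macvtap0` is hypothetical, and rx/tx are from the host's view of that device, so check which counter maps to which direction in your setup.

```python
import time
from pathlib import Path

STAT_NAMES = ("rx_packets", "tx_packets", "rx_bytes", "tx_bytes")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Read the kernel's cumulative counters for one network interface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in STAT_NAMES}


def rates(before, after, seconds):
    """Convert two counter snapshots into packets/s and bits/s."""
    return {
        "rx_pps": (after["rx_packets"] - before["rx_packets"]) / seconds,
        "tx_pps": (after["tx_packets"] - before["tx_packets"]) / seconds,
        "rx_bps": 8 * (after["rx_bytes"] - before["rx_bytes"]) / seconds,
        "tx_bps": 8 * (after["tx_bytes"] - before["tx_bytes"]) / seconds,
    }


def sample(iface, interval=1.0, sysfs_root="/sys/class/net"):
    """Snapshot the counters twice, `interval` seconds apart, and return rates."""
    before = read_counters(iface, sysfs_root)
    time.sleep(interval)
    after = read_counters(iface, sysfs_root)
    return rates(before, after, interval)
```

For example, `sample("macvtap0", interval=5.0)` would return rx_pps, tx_pps, rx_bps and tx_bps averaged over five seconds (the counters are read on the host, typically as root).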


On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu at intel.com> wrote:

Now your setup has a new element, “macvtap”. I don’t know what the
performance of macvtap is. I only know it performs much worse than
“real” PCI pass-through.

I also don’t know why you selected such a config for your setup (anonymous
hugepages and macvtap). Any specific purpose?

I think you should get a baseline first, then measure how much performance
drops when using anonymous hugepages or macvtap:

1. Baseline: real hugepages + real PCI pass-through
2. Anonymous hugepages vs. real hugepages
3. Real PCI pass-through vs. macvtap

*From:* edgar helmut [mailto:helmut.edgar100 at gmail.com]
*Sent:* Saturday, December 24, 2016 3:23 AM
*To:* Hu, Xuekun <xuekun.hu at intel.com>
*Cc:* Wiles, Keith <keith.wiles at intel.com>; users at dpdk.org

*Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine


I changed the setup but performance is still poor :( and I need your help
to understand the root cause.

The setup is (sorry for the long description):

(The test equipment is pktgen using DPDK, installed on a second physical
machine connected with 82599 NICs.)

host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with a single socket, Ubuntu
16.04, with 4 hugepages of 1G each.

hypervisor (KVM): QEMU emulator version 2.5.0

guest: same CPU as host, created with 3 vCPUs, using Ubuntu 16.04

DPDK: tried 2.2, 16.04, 16.07 and 16.11, using testpmd and 512 pages of 2M

Guest total memory is 2G, and all of it is backed by the host with
transparent hugepages (I can see the AnonHugePages consumed at guest
creation). This memory includes the 512 hugepages for testpmd.

I pinned and isolated the guest's vCPUs (using the kernel option isolcpus),
and could clearly see that the isolation works well.
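One way to double-check the pinning is to compare the host's isolcpus list with the affinity of each QEMU vCPU thread. A minimal sketch, assuming the basic Linux cpu-list syntax ("1-3,5"); the helper names are hypothetical, and `os.sched_getaffinity` is Linux-only (on Linux it also accepts a thread ID, e.g. one taken from /proc/<qemu-pid>/task/).

```python
import os


def parse_cpu_list(text):
    """Parse basic Linux cpu-list syntax such as "1-3,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        elif part.isdigit():
            cpus.add(int(part))
        # Other tokens (e.g. the "nohz"/"domain" flags newer kernels accept
        # in isolcpus=) are ignored by this sketch.
    return cpus


def isolated_from_cmdline(cmdline):
    """Return the CPUs named by an isolcpus= parameter, or an empty set."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return parse_cpu_list(token[len("isolcpus="):])
    return set()


def thread_is_isolated(tid, isolated):
    """True if the thread is only allowed to run on isolated CPUs."""
    allowed = os.sched_getaffinity(tid)
    return bool(allowed) and allowed <= isolated
```

On the host, something like `thread_is_isolated(tid, isolated_from_cmdline(open("/proc/cmdline").read()))` would then be checked for every vCPU thread ID.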

2 x 82599 NICs are connected to the guest as passthrough using macvtap
interfaces, so the guest receives packets on one interface and forwards them
to the other, and vice versa.

In the guest I bind its interfaces to igb_uio.

testpmd in the guest starts dropping packets at ~800 Mbps (both ports,
bidirectional) using two vCPUs for forwarding (one vCPU for application
management and two for forwarding).

At 1.2 Gbps it drops a lot of packets.

The same testpmd configuration on the host (between both 82599 NICs)
forwards about 5-6 Gbps on both ports, bidirectionally.
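Since forwarding cost is mostly per packet, it can help to translate these bit rates into packets per second. The thread does not say which frame size pktgen used, so the sizes here are assumptions; each Ethernet frame also occupies 20 extra bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap), and whether a reported rate includes that overhead depends on the tool.

```python
ETH_WIRE_OVERHEAD_BYTES = 20  # preamble (7) + SFD (1) + inter-frame gap (12)


def packets_per_second(bits_per_second, frame_bytes, count_overhead=True):
    """Packets/s for a bit rate and an Ethernet frame size (including CRC).

    With count_overhead=True the rate is treated as raw line rate, so every
    frame also consumes the 20 bytes of preamble/SFD/IFG.
    """
    per_frame = frame_bytes + (ETH_WIRE_OVERHEAD_BYTES if count_overhead else 0)
    return bits_per_second / (per_frame * 8)
```

Under the 64-byte, line-rate assumption, 10 Gbit/s is about 14.88 Mpps per direction, while 800 Mbit/s is only about 1.19 Mpps, so the guest would be saturating at well under a tenth of what the NIC can deliver.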

I assumed that forwarding ~5-6 Gbps between two ports would be trivial, so
it would be great if someone could share the configuration of a tested setup.

Any further idea will be highly appreciated.


On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100 at gmail.com> wrote:

That's what I was afraid of.

In fact, I need the host to back the entire guest memory with hugepages.

I will find a way to do that and run the tests again.

On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu at intel.com> wrote:

You said the VM’s memory was 6G, while transparent hugepages covered only ~4G
(4360192 KB). So the rest was mapped with 4K pages.

BTW, the memory used by transparent hugepages is not the hugepages you
reserved with the kernel boot option.
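The distinction is visible from the host: transparent hugepages show up in a process's smaps as AnonHugePages, while pages reserved on the kernel command line form the hugetlbfs pool counted by HugePages_Total in /proc/meminfo, and a process only uses that pool if it maps it explicitly (for QEMU, typically with -mem-path pointing at a hugetlbfs mount, optionally with -mem-prealloc). A small sketch, with hypothetical helper names, that reports which THP policy the host kernel is using; the sysfs file lists all options and brackets the active one.

```python
def active_choice(sysfs_text):
    """Return the bracketed (active) option, e.g. "madvise" from
    "always [madvise] never", or None if no option is bracketed."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the host's transparent hugepage policy."""
    with open(path) as f:
        return active_choice(f.read())
```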

*From:* edgar helmut [mailto:helmut.edgar100 at gmail.com]
*Sent:* Friday, December 16, 2016 1:24 AM
*To:* Hu, Xuekun
*Cc:* Wiles, Keith; users at dpdk.org
*Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine

In fact the VM was created with 6G of RAM, and its kernel boot args define
4 hugepages of 1G each, though when starting the VM I noticed that
AnonHugePages increased.

The relevant QEMU process ID is 6074, and the following sums the amount of
allocated AnonHugePages:
sudo grep -e AnonHugePages /proc/6074/smaps | awk '{ if($2>0) print $2}' | awk '{s+=$1} END {print s}'

which returns 4360192 (smaps reports these values in kB).

So not all of the memory is backed by transparent hugepages, though it is
more than the amount of hugepages the guest is supposed to boot with.
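The same sum can be done in Python, which also makes it easy to compare the total against the VM size. Each AnonHugePages line in /proc/<pid>/smaps gives a size in kB; the helper names are hypothetical, and the 6 GiB figure is the VM size quoted in this thread.

```python
import re

ANON_HUGE_RE = re.compile(r"^AnonHugePages:\s+(\d+)\s+kB", re.MULTILINE)


def anon_huge_kb(smaps_text):
    """Sum every AnonHugePages field (kB) in the text of /proc/<pid>/smaps."""
    return sum(int(v) for v in ANON_HUGE_RE.findall(smaps_text))


def coverage(anon_kb, vm_mem_kb):
    """Fraction of guest RAM backed by transparent huge pages."""
    return anon_kb / vm_mem_kb


def read_smaps(pid):
    """Read a process's smaps (reading another user's process needs root)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()
```

With the numbers from this thread, `coverage(4360192, 6 * 1024 * 1024)` is about 0.693: roughly 69% of the 6 GiB is on huge pages and about 1.8 GiB remains on 4K pages.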

How can I be sure that the required 4G of hugepages are really allocated,
and not, for example, only 2G out of the 4G (with the remaining 2G mapped
with the default 4K pages)?
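Inside the guest, the reserved pool for the default hugepage size is reported in /proc/meminfo (HugePages_Total, HugePages_Free, Hugepagesize); for a non-default size such as 1G, the count is in /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages. A minimal sketch with hypothetical helper names. Note that these counters only describe the guest's own pool; whether the host backs that memory with huge pages still has to be checked on the host (for example via the QEMU process's smaps).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; units (kB) dropped."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def hugepage_summary(text):
    """Summarise the default-size hugetlbfs pool reported by /proc/meminfo."""
    m = parse_meminfo(text)
    total = m.get("HugePages_Total", 0)
    free = m.get("HugePages_Free", 0)
    size_kb = m.get("Hugepagesize", 0)
    return {
        "page_kb": size_kb,
        "total_pages": total,
        "free_pages": free,
        "pool_kb": total * size_kb,
        "in_use_kb": (total - free) * size_kb,
    }
```

In the guest, `hugepage_summary(open("/proc/meminfo").read())` would show whether the full pool was actually reserved.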


On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu at intel.com> wrote:

Are you sure the AnonHugePages size was equal to the VM's total memory size?
The transparent hugepage mechanism doesn't always guarantee that the app is
using real huge pages.

-----Original Message-----
From: users [mailto:users-bounces at dpdk.org] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users at dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have a single socket, which is an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just took two more steps:
1. Set iommu=pt for better use of igb_uio.
2. Used taskset and isolcpus, so now the relevant DPDK threads run on
dedicated cores.

It improved performance, though I still see a significant difference
between the VM and the host that I can't fully explain.

Any further ideas?


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles at intel.com> wrote:

> On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100 at gmail.com> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd on the host works well (testpmd forwards 10G between
> > two 82599 ports).
> > However, the same application running in a virtual machine on the same
> > host shows a huge performance degradation.
> > testpmd is then not even able to read 100 Mbps from the NIC without drops,
> > and from a profile I made, it looks like a DPDK application runs more than
> > 10 times slower than on the host.
> Not sure I understand the overall setup, but if you have multiple sockets
> on your platform, did you make sure the NIC/PCI bus is on the same socket
> as the VM? If you have to access the NIC across the QPI, it could explain
> some of the performance drop, though I'm not sure it explains that much.
> >
> > The setup is Ubuntu 16.04 for the host and Ubuntu 14.04 for the guest.
> > QEMU is 2.3.0 (though I tried a newer version as well).
> > The NICs are connected to the guest using PCI passthrough, and the
> > guest's CPU model is passed through as well (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages),
> > so I assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic, but both result
> > in the same performance.
> >
> > Given the performance difference, I guess I'm missing something.
> >
> > Please advise: what might I be missing here?
> > Is this an inherent QEMU penalty?
> >
> > Thanks
> > Edgar
> Regards,
> Keith
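Keith's NUMA question can be answered from sysfs: each PCI device has a numa_node file, and the kernel writes -1 there when it has no NUMA affinity information (common on single-socket hosts like the one described earlier in this thread). A minimal sketch; the PCI address format "0000:03:00.0" in the docstring is only an example, not the address of the NICs here.

```python
from pathlib import Path


def parse_numa_node(text):
    """Interpret a PCI device's numa_node file; -1 (no affinity) -> None."""
    node = int(text.strip())
    return None if node < 0 else node


def nic_numa_node(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """NUMA node for a PCI address such as "0000:03:00.0" (example only)."""
    return parse_numa_node((Path(sysfs_root) / pci_addr / "numa_node").read_text())
```

On a multi-socket host, the node returned for each 82599 port would then be compared with the node of the cores the vCPUs are pinned to.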
