[dpdk-dev] [PATCH v4 00/10] VM Power Management
alan.carew at intel.com
Tue Oct 14 14:37:51 CEST 2014
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, October 13, 2014 9:26 PM
> To: Carew, Alan
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 00/10] VM Power Management
> Hi Alan,
> 2014-10-12 20:36, Alan Carew:
> > The following patches add two DPDK sample applications and an alternate
> > implementation of librte_power for use in virtualized environments.
> > The idea is to provide librte_power functionality from within a VM to address
> > the lack of MSRs to facilitate frequency changes from within a VM.
> > It is ideally suited for Haswell which provides per core frequency scaling.
> > The current librte_power affects frequency changes via the acpi-cpufreq
> > 'userspace' power governor, accessed via sysfs.
> Something was preventing me from looking deeper in this big codebase,
> but I didn't know what sounds weird.
> Now I realize: the real problem is that virtualization transparency is
> broken for power management. So the right thing to do is to fix it in
> KVM. I think all this patchset is a huge workaround.
> Did you try to fix it with Qemu/KVM?
Looking at the libvirt API, power management would seem to be a natural fit there, so in essence I agree.
However, with a DPDK solution it would be possible to re-use the message bus to pass information such as device stats, application state and D-state requests to the host, allowing a management layer (e.g. OpenStack) to make informed decisions.
Also, the scope of adding power management to Qemu/KVM would be huge. The easier path is not always the best one, but power management in VMs is both a DPDK problem (given that librte_power until now only worked on the host) and a general virtualization problem. The latter would be better solved by those with direct knowledge of the Qemu/KVM architecture and influence on the direction of the Qemu project.
As it stands, the host backend is simply an example application that could be replaced by a VMM or orchestration layer. By using Virtio-Serial it has an obvious leaning towards Qemu, but even this could easily be swapped out for XenBus, IVSHMEM, IP etc.
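To illustrate why the transport is easy to swap, here is a minimal sketch in C of a guest-side request path that only talks to a small transport interface. Every name in it (struct pm_transport, struct pm_request, capture_send, pm_request_scale_up) is hypothetical and the request layout is invented for the example; it is not the packet format or API used by the patchset. The send callback here just copies into memory so the sketch can be run without Virtio-Serial or a network.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout; NOT the format used by the patchset. */
struct pm_request {
    uint32_t lcore;    /* core the guest wants scaled */
    uint32_t command;  /* one of the PM_CMD_* values below */
};

enum { PM_CMD_SCALE_UP = 1, PM_CMD_SCALE_DOWN = 2 };

/* A transport is a name plus a send callback. A Virtio-Serial, XenBus,
 * IVSHMEM or IP backend would each supply its own send function. */
struct pm_transport {
    const char *name;
    int (*send)(struct pm_transport *t, const void *buf, size_t len);
    unsigned char last[16];  /* capture buffer used by the stand-in below */
    size_t last_len;
};

/* Stand-in backend: records the bytes instead of writing to a device. */
static int capture_send(struct pm_transport *t, const void *buf, size_t len)
{
    if (len > sizeof(t->last))
        return -1;
    memcpy(t->last, buf, len);
    t->last_len = len;
    return 0;
}

/* The request path depends only on the interface, not on the medium. */
static int pm_request_scale_up(struct pm_transport *t, uint32_t lcore)
{
    struct pm_request req = { .lcore = lcore, .command = PM_CMD_SCALE_UP };

    assert(t != NULL && t->send != NULL);
    return t->send(t, &req, sizeof(req));
}
```

With this shape, replacing the medium means providing a different send function when the transport is set up; the code that builds and issues requests stays unchanged.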
If power management is eventually supported by hypervisors directly, we could also add the option to switch to that environment. Currently the librte_power implementations (VM or host) can be selected dynamically (environment auto-detection) or explicitly via rte_power_set_env(); adding an arbitrary number of environments is relatively easy.
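The selection scheme can be pictured as a table of per-environment operations, indexed by an environment value that is either set explicitly or detected on first use. The sketch below is a simplified model in C under that assumption. All names (pm_env, pm_ops, pm_set_env, pm_detect_env, pm_freq_up) are hypothetical stand-ins, not the librte_power API, and the detection step is reduced to a flag passed by the caller rather than a real platform probe.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the librte_power API. */
enum pm_env { PM_ENV_UNSET = 0, PM_ENV_HOST, PM_ENV_VM, PM_ENV_COUNT };

struct pm_ops {
    const char *name;
    int (*freq_up)(unsigned lcore);
};

/* Records which backend handled the last call, for demonstration only. */
static enum pm_env pm_last_backend = PM_ENV_UNSET;

/* Host backend: a real one would write to the cpufreq sysfs files. */
static int host_freq_up(unsigned lcore)
{
    (void)lcore;
    pm_last_backend = PM_ENV_HOST;
    return 0;
}

/* VM backend: a real one would send a request to the host. */
static int vm_freq_up(unsigned lcore)
{
    (void)lcore;
    pm_last_backend = PM_ENV_VM;
    return 0;
}

/* Supporting another environment means adding one enum value and one row. */
static const struct pm_ops pm_ops_table[PM_ENV_COUNT] = {
    [PM_ENV_HOST] = { "host", host_freq_up },
    [PM_ENV_VM]   = { "vm",   vm_freq_up },
};

static enum pm_env pm_current_env = PM_ENV_UNSET;

/* Explicit selection; rejects values outside the table. */
static int pm_set_env(enum pm_env env)
{
    if (env <= PM_ENV_UNSET || env >= PM_ENV_COUNT)
        return -1;
    pm_current_env = env;
    return 0;
}

/* Stand-in for auto-detection: the caller says whether this is a VM. */
static enum pm_env pm_detect_env(int running_in_vm)
{
    return running_in_vm ? PM_ENV_VM : PM_ENV_HOST;
}

/* Public entry point: detects the environment on first use, then dispatches. */
static int pm_freq_up(unsigned lcore, int running_in_vm)
{
    if (pm_current_env == PM_ENV_UNSET)
        pm_current_env = pm_detect_env(running_in_vm);
    assert(pm_ops_table[pm_current_env].freq_up != NULL);
    return pm_ops_table[pm_current_env].freq_up(lcore);
}
```

In this model a future hypervisor-backed environment would be one more enum value and one more table row, with callers of pm_freq_up() left untouched.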
I hope this helps to clarify the approach.