[openstack-dev] [nova] Live migrations: Post-Copy and Auto-Converge features research
Daniel P. Berrange
berrange at redhat.com
Tue Nov 8 10:37:29 UTC 2016
On Mon, Nov 07, 2016 at 10:34:39PM +0200, Vlad Nykytiuk wrote:
> Hi,
>
> As you may know, QEMU currently supports several features that help live
> migrations complete more predictably. These features are auto-converge
> and post-copy.
> I did some research on the performance characteristics of these two
> features; you can find it at the following link:
> https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
Thanks for the report - it appears to confirm the results I obtained
previously, which show post-copy as the clear winner, and auto-converge
as a viable alternative when post-copy is not available.
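For reference, both features are requested via libvirt migration flags.
A minimal illustrative sketch using the libvirt-python bindings (the
guest name and destination URI here are made-up placeholders, not the
code path Nova actually uses) looks like:

  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('demo-guest')         # hypothetical guest name

  flags = (libvirt.VIR_MIGRATE_LIVE |
           libvirt.VIR_MIGRATE_PEER2PEER |
           libvirt.VIR_MIGRATE_AUTO_CONVERGE |  # let QEMU throttle vCPUs
           libvirt.VIR_MIGRATE_POSTCOPY)        # permit post-copy switch-over

  # Runs the migration; with VIR_MIGRATE_POSTCOPY it still starts in
  # pre-copy mode, post-copy is merely permitted at this point.
  dom.migrateToURI3('qemu+ssh://dest-host/system', {}, flags)

  # The switch to post-copy is triggered separately (from another thread,
  # or via 'virsh migrate-postcopy') once pre-copy stops converging:
  #   dom.migrateStartPostCopy()

I've got a few suggestions if you want to do further investigation: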
- Look at larger guests. A 1 vCPU guest with 2 GB of RAM is not
  particularly difficult to migrate when you have 10 Gig-E, or even
  1 Gig-E, networking. A 4 vCPU guest with 8 GB of RAM, with 4 guest
  workers dirtying all 8 GB of RAM, is a hard test (a minimal dirtying
  worker is sketched after these suggestions). Even with auto-converge
  such guests may not successfully complete in < 5 minutes.
- Measure the guest CPU performance, e.g. time to write to 1 GB of RAM
  (the worker sketched below reports exactly this figure). While
  auto-converge can ensure completion, it has a really high and
  prolonged impact on guest CPU performance, much worse than is seen
  with post-copy. For example, time to write to 1 GB will degrade from
  400 ms/GB to as much as 7000 ms/GB during auto-converge, and this hit
  may last many minutes. For post-copy, there will be small spikes at
  the start of each iteration of migration (400 ms/GB -> 1000 ms/GB),
  and a big spike at the switch over (400 ms/GB -> 7000 ms/GB), but the
  duration of the spikes is very short (less than a second), so
  post-copy is a clear winner over auto-converge, where the guest CPU
  performance hit lasts many minutes.
- Measure the overall CPU utilization of QEMU as a whole (a simple
  host-side sampler is sketched below). This will show the impact of
  using compression, which is to burn massive amounts of CPU time in
  the QEMU migration thread.
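To make the first two measurements concrete, here is a minimal sketch
of a guest-side worker (sizes are illustrative - e.g. one of four such
workers, each keeping 2 GB dirty) that continuously dirties its buffer
and prints the time-to-write-1-GB figure discussed above:

  import time

  GB = 1024 * 1024 * 1024
  PAGE = 4096                      # assume 4 KiB guest pages
  buf = bytearray(2 * GB)          # region this worker keeps dirty

  while True:
      start = time.monotonic()
      # Touch one byte per page so every page in the buffer is dirtied.
      for off in range(0, len(buf), PAGE):
          buf[off] = (buf[off] + 1) & 0xff
      elapsed = time.monotonic() - start
      # Normalise to "time to write 1 GB" so throttling by auto-converge,
      # or post-copy page faults, show up as a jump in this number.
      print("%.0f ms/GB" % (elapsed * 1000.0 * GB / len(buf)))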
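For the third measurement, a simple host-side sampler (the QEMU PID is
a made-up placeholder; on a real compute node it would be looked up per
instance) can read the process CPU time from /proc, which covers all
QEMU threads including the migration/compression thread:

  import os
  import time

  def qemu_cpu_seconds(pid):
      with open('/proc/%d/stat' % pid) as f:
          stat = f.read()
      # Split after the ')' that closes the comm field; utime and stime
      # are then the 12th and 13th entries (fields 14/15 of the file).
      fields = stat.rsplit(')', 1)[1].split()
      ticks = int(fields[11]) + int(fields[12])
      return ticks / float(os.sysconf('SC_CLK_TCK'))

  pid = 12345                      # hypothetical QEMU PID on the source host
  prev = qemu_cpu_seconds(pid)
  while True:
      time.sleep(1)
      cur = qemu_cpu_seconds(pid)
      # Percent of one host CPU used over the last second (can exceed 100%).
      print("QEMU CPU usage: %.0f%%" % ((cur - prev) * 100.0))
      prev = cur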
I've published my previous results here:
https://www.berrange.com/posts/2016/05/12/analysis-of-techniques-for-ensuring-migration-completion-with-kvm/
and the framework I used to collect all this data is now distributed
in the QEMU git repo.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|