[openstack-dev] [nova] Live migrations: Post-Copy and Auto-Converge features research

Daniel P. Berrange berrange at redhat.com
Tue Nov 8 10:37:29 UTC 2016


On Mon, Nov 07, 2016 at 10:34:39PM +0200, Vlad Nykytiuk wrote:
> Hi,
> 
> As you may know, QEMU currently supports several features that help
> live migrations complete more predictably: auto-converge and
> post-copy.
> I did some research on the performance characteristics of these two
> features; you can find it at the following link:
> https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/

Thanks for the report; it appears to confirm the results I obtained
previously, which show post-copy as the clear winner and auto-converge
as a viable alternative when post-copy is not available.

I have a few suggestions if you want to investigate further:

 - Look at larger guests - a 1 vCPU guest with 2 GB of RAM is not
   particularly difficult to migrate when you have 10 Gig-E, or even
   1 Gig-E, networking. A 4 vCPU guest with 8 GB of RAM, with 4 guest
   workers dirtying all 8 GB of RAM, is a hard test; even with
   auto-converge, such guests may not complete migration in < 5
   minutes (the first sketch after this list shows the kind of
   workload I mean).

 - Measure the guest CPU performance, e.g. the time taken to write to
   1 GB of RAM (the second sketch after this list is a simple probe
   for this). While auto-converge can ensure completion, it has a
   really high and prolonged impact on guest CPU performance, much
   worse than is seen with post-copy. For example, the time to write
   to 1 GB may degrade from 400 ms/GB to as much as 7000 ms/GB under
   auto-converge, and this hit may last many minutes. With post-copy,
   there will be small spikes at the start of each iteration of
   migration (400 ms/GB -> 1000 ms/GB) and a big spike at the
   switchover (400 ms/GB -> 7000 ms/GB), but the spikes are very
   short (less than a second), so post-copy is a clear winner over
   auto-converge, where the guest CPU performance hit lasts many
   minutes.

 - Measure the overall CPU utilization of QEMU as a whole (the third
   sketch after this list is one way to do it). This will show the
   impact of using compression, which is to burn massive amounts of
   CPU time in the QEMU migration thread.
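
By way of illustration, here's a minimal sketch (plain Python,
untested, all knob values purely illustrative) of the kind of dirtying
workload I mean - it is not the tool from my framework, just the shape
of the load:

  # Hypothetical guest workload: spawn one worker per vCPU, each
  # looping over its own buffer and touching every page, so the
  # dirty page rate stays high for the whole migration.
  import multiprocessing

  PAGE_SIZE = 4096      # assume 4 KiB guest pages
  BUF_GB = 2            # per-worker buffer in GiB (4 x 2 = 8 GB total)
  WORKERS = 4           # one per vCPU in the 4 vCPU scenario

  def dirty_forever(gb):
      buf = bytearray(gb * 1024 ** 3)
      val = 0
      while True:                      # keep every page dirty, forever
          for off in range(0, len(buf), PAGE_SIZE):
              buf[off] = val & 0xff
          val += 1

  if __name__ == '__main__':
      procs = [multiprocessing.Process(target=dirty_forever,
                                       args=(BUF_GB,))
               for _ in range(WORKERS)]
      for p in procs:
          p.start()
      for p in procs:
          p.join()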
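
For the guest-visible impact, a probe along these lines (again just a
sketch, under the same assumptions) can be left running in the guest;
the ms/GB figure it prints will jump when auto-converge throttles the
vCPUs or when post-copy faults pages in across the network:

  # Hypothetical guest-side probe: time one write per page across a
  # 1 GiB buffer and report it as ms/GB once a second, so throttling
  # and page-fault spikes show up as latency jumps in the log.
  import time

  PAGE_SIZE = 4096
  buf = bytearray(1024 ** 3)           # 1 GiB probe buffer

  while True:
      start = time.monotonic()
      for off in range(0, len(buf), PAGE_SIZE):
          buf[off] = 1
      elapsed_ms = (time.monotonic() - start) * 1000.0
      print('%.1f ms/GB' % elapsed_ms, flush=True)
      time.sleep(1)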
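
And on the host side, something like this Linux-only sketch (QEMU PID
passed as the only argument; field offsets per proc(5)) reports the
CPU utilization of the whole QEMU process, so the migration and
compression threads are all accounted for:

  # Hypothetical host-side sampler: diff utime+stime for the whole
  # QEMU process from /proc/<pid>/stat once a second and print the
  # utilization as a percentage of one CPU.
  import os
  import sys
  import time

  def cpu_seconds(pid):
      with open('/proc/%d/stat' % pid) as f:
          # utime and stime are fields 14 and 15 in proc(5),
          # i.e. indices 11 and 12 after the '(comm)' field.
          fields = f.read().rsplit(')', 1)[1].split()
      return (int(fields[11]) + int(fields[12])) / \
             os.sysconf('SC_CLK_TCK')

  pid = int(sys.argv[1])
  prev = cpu_seconds(pid)
  while True:
      time.sleep(1)
      now = cpu_seconds(pid)
      print('QEMU CPU: %.0f%%' % ((now - prev) * 100.0), flush=True)
      prev = now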

I've published my previous results here:

  https://www.berrange.com/posts/2016/05/12/analysis-of-techniques-for-ensuring-migration-completion-with-kvm/

and the framework I used to collect all this data is now distributed
in the QEMU git repo.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


