[openstack-dev] [nova] Live migrations: Post-Copy and Auto-Converge features research

Vlad Nykytiuk vnykytiuk at mirantis.com
Tue Nov 8 18:58:41 UTC 2016


Daniel,

Thanks for your suggestions. 
Yes, there is a plan to carry out further research in closely related areas.

Best
—
Vlad

> On Nov 8, 2016, at 12:37, Daniel P. Berrange <berrange at redhat.com> wrote:
> 
> On Mon, Nov 07, 2016 at 10:34:39PM +0200, Vlad Nykytiuk wrote:
>> Hi,
>> 
>> As you may know, QEMU currently supports two features that help live
>> migrations behave more predictably: auto-converge and post-copy.
>> I did some research on the performance characteristics of these two
>> features; you can find it at the following link:
>> https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
> 
> Thanks for the report. It appears to confirm the results I got
> previously, which show post-copy as the clear winner and auto-converge
> as a viable alternative if post-copy is not available.
> 
> I've got a few suggestions if you want to do further investigation:
> 
> - Look at larger guests - a 1 vCPU guest with 2 GB of RAM is not
>   particularly difficult to migrate when you have 10 Gig-E networking
>   or even 1 Gig-E networking. 4 vCPUs with 8 GB of RAM, with 4 guest
>   workers dirtying all 8 GB of RAM, is a hard test. Even with
>   auto-converge, such guests may not complete migration in < 5 minutes.
> 
> - Measure the guest CPU performance, e.g. time to write to 1 GB of RAM.
>   While auto-converge can ensure completion, it has a really high and
>   prolonged impact on guest CPU performance, much worse than is
>   seen with post-copy. For example, time to write to 1 GB will degrade
>   from 400 ms/GB to as much as 7000 ms/GB during auto-converge, and this
>   hit may last many minutes. For post-copy, there will be small spikes
>   at the start of each iteration of migration (400 ms/GB -> 1000 ms/GB),
>   and a big spike at the switch over (400 ms/GB -> 7000 ms/GB), but the
>   duration of the spikes is very short (less than a second), so
>   post-copy is a clear winner over auto-converge, where the guest CPU
>   performance hit lasts many minutes.
> 
> - Measure the overall CPU utilization of QEMU as a whole. This will
>   show the impact of using compression, which is likely to burn massive
>   amounts of CPU time in the QEMU migration thread.
> 
> I've published my previous results here:
> 
>  https://www.berrange.com/posts/2016/05/12/analysis-of-techniques-for-ensuring-migration-completion-with-kvm/
> 
> and the framework I used to collect all this data is now distributed
> in the QEMU git repo.
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
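
As a rough illustration of the "time to write to 1 GB of RAM" measurement
suggested in the quoted message, here is a minimal Python sketch. It is not
the framework from the QEMU repo: the function name, the 4 KiB chunk size
and the 64 MiB buffer are assumptions chosen for illustration, and the
absolute numbers from an interpreted loop will not be comparable to the
ms/GB figures quoted above. Only the trend over time is meaningful.

```python
import time

GIB = 1024 ** 3  # results are scaled to one GiB of written memory


def time_write_ms_per_gb(buf, value, chunk=4096):
    """Overwrite every byte of buf with value; return the cost in ms per GiB.

    chunk is an assumed page-sized write unit (4 KiB), not a value taken
    from the original thread.
    """
    if len(buf) == 0:
        raise ValueError("buffer must not be empty")
    view = memoryview(buf)
    pattern = bytes([value & 0xFF]) * chunk
    start = time.perf_counter()
    for off in range(0, len(view), chunk):
        end = min(off + chunk, len(view))
        # Same-length slice assignment writes in place, dirtying the page.
        view[off:end] = pattern[:end - off]
    elapsed = time.perf_counter() - start
    # Scale the elapsed time for this buffer up to a full GiB.
    return elapsed * 1000.0 * GIB / len(view)


if __name__ == "__main__":
    buf = bytearray(64 * 1024 * 1024)  # assumed size; small enough for any guest
    for i in range(5):
        cost = time_write_ms_per_gb(buf, i + 1)
        print(f"pass {i}: {cost:.0f} ms/GB")
```

Run in a loop inside the guest, with a timestamp logged for each pass, this
would show when the write cost rises during a migration and for how long.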



