[Openstack-operators] [openstack-dev] [nova] Live migrations: Post-Copy and Auto-Converge features research
Vlad Nykytiuk
vnykytiuk at mirantis.com
Tue Nov 8 18:58:41 UTC 2016
Daniel,
Thanks for your suggestions.
Yes, there is a plan to do several more types of research in very similar areas.
Best
—
Vlad
> On Nov 8, 2016, at 12:37, Daniel P. Berrange <berrange at redhat.com> wrote:
>
> On Mon, Nov 07, 2016 at 10:34:39PM +0200, Vlad Nykytiuk wrote:
>> Hi,
>>
>> As you may know, QEMU currently supports several features that help live
>> migrations complete more predictably. These features are auto-converge
>> and post-copy.
>> I did some research on the performance characteristics of these two
>> features; you can find it at the following link:
>> https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
>
> Thanks for the report. It appears to confirm the results I obtained
> previously, which show post-copy as the clear winner, with auto-converge
> a viable alternative when post-copy is not available.
>
> I've got a few suggestions if you want to do further investigation:
>
> - Look at larger guests - a 1 vCPU guest with 2 GB of RAM is not
> particularly difficult to migrate when you have 10 Gig-E networking
> or even 1 Gig-E networking. 4 vCPU with 8 GB of RAM, with 4 guest
> workers dirtying all 8 GB of RAM is a hard test. Even with auto-converge
> such guests may not successfully complete in < 5 minutes.
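[The memory-dirtying workload suggested above could be sketched roughly as follows. This is only an illustrative sketch to run inside the guest, not an existing tool; the 4 workers x 2 GB defaults mirror the email's example and can be scaled down.]

```python
# Guest-side memory-dirtying workload: each worker keeps rewriting every
# page of its own buffer so the migration's dirty-page rate stays high.
# Sketch only -- worker count and buffer size follow the email's example.
import multiprocessing
import time

PAGE = 4096  # typical guest page size in bytes


def dirty_worker(size_bytes, duration_s):
    """Touch one byte in every page of a buffer, in a loop, until the
    deadline; return how many full passes were completed."""
    buf = bytearray(size_bytes)
    deadline = time.monotonic() + duration_s
    passes = 0
    while time.monotonic() < deadline:
        for off in range(0, size_bytes, PAGE):
            buf[off] = (buf[off] + 1) & 0xFF  # dirty this page
        passes += 1
    return passes


def run_workload(n_workers=4, size_bytes=2 * 1024**3, duration_s=300):
    """Start n_workers processes, each dirtying its own buffer
    (defaults: 4 workers x 2 GB = 8 GB, as in the email)."""
    procs = [multiprocessing.Process(target=dirty_worker,
                                     args=(size_bytes, duration_s))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

[Running `run_workload()` in the guest while the migration is in flight should keep the dirty-page rate high enough that a plain pre-copy migration struggles to converge.]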
>
> - Measure the guest CPU performance, e.g. time to write to 1 GB of RAM.
> While auto-converge can ensure completion, it has a really high and
> prolonged impact on guest CPU performance, much worse than is seen
> with post-copy. For example, time to write to 1 GB can degrade from
> 400 ms/GB to as much as 7000 ms/GB while auto-converge is throttling,
> and this hit may last many minutes. With post-copy there will be small
> spikes at the start of each iteration of migration (400 ms/GB ->
> 1000 ms/GB) and a big spike at the switch-over (400 ms/GB ->
> 7000 ms/GB), but the spikes last less than a second, so post-copy is
> a clear winner over auto-converge, where the guest CPU performance
> hit lasts many minutes.
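[The "time to write 1 GB" probe could be implemented along these lines. This is an illustrative sketch, not Daniel's actual measurement framework; the buffer size is a parameter so it can be shrunk for a quick sanity check.]

```python
# Guest-side write-rate probe: time how long it takes to touch every page
# of a buffer, normalised to ms per GB. Repeated samples taken while a
# migration runs should show auto-converge throttling as a long plateau
# and the post-copy switch-over as a short spike.
import time

PAGE = 4096
GB = 1024**3


def write_ms_per_gb(size_bytes=GB):
    """Write one byte into every page; return elapsed time in ms per GB."""
    buf = bytearray(size_bytes)
    start = time.monotonic()
    for off in range(0, size_bytes, PAGE):
        buf[off] = 0xAA
    elapsed = time.monotonic() - start
    return elapsed * 1000.0 * (GB / size_bytes)


def sample(interval_s=1.0, count=60, size_bytes=GB):
    """Collect repeated write-rate samples, one every interval_s seconds."""
    out = []
    for _ in range(count):
        out.append(write_ms_per_gb(size_bytes))
        time.sleep(interval_s)
    return out
```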
>
> - Measure the overall CPU utilization of QEMU as a whole. This will
> show the impact of using compression, which is to burn massive
> amounts of CPU time in the QEMU migration thread.
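[On a Linux host, per-process QEMU CPU time can be sampled from /proc as sketched below. This is an assumption-laden sketch: finding the QEMU pid (e.g. via libvirt or pgrep) is left to the caller.]

```python
# Host-side CPU accounting via /proc (Linux only): sample utime+stime
# for the QEMU process and convert jiffy deltas to CPU seconds.
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # jiffies per second


def cpu_seconds(pid):
    """Total user+system CPU time consumed by a process, in seconds."""
    with open(f"/proc/{pid}/stat") as f:
        # Split after the ")" that closes the comm field, so spaces in the
        # process name cannot shift the field offsets.
        fields = f.read().rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 of stat
    return (utime + stime) / CLK_TCK


def utilisation(pid, interval_s=1.0):
    """Average CPU utilisation over the interval (1.0 == one full core)."""
    before = cpu_seconds(pid)
    time.sleep(interval_s)
    after = cpu_seconds(pid)
    return (after - before) / interval_s
```

[Sampling `utilisation()` once per second over the whole migration makes the extra CPU burned by compression in the migration thread directly visible.]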
>
> I've published my previous results here:
>
> https://www.berrange.com/posts/2016/05/12/analysis-of-techniques-for-ensuring-migration-completion-with-kvm/
>
> and the framework I used to collect all this data is now distributed
> in the QEMU git repo.
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
>