[openstack-dev] [nova] live migration in Mitaka

Koniszewski, Pawel pawel.koniszewski at intel.com
Mon Sep 21 09:43:58 UTC 2015


> -----Original Message-----
> From: Daniel P. Berrange [mailto:berrange at redhat.com]
> Sent: Friday, September 18, 2015 5:24 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] live migration in Mitaka
>
> On Fri, Sep 18, 2015 at 11:53:05AM +0000, Murray, Paul (HP Cloud) wrote:
> > Hi All,
> >
> > There are various efforts going on around live migration at the moment:
> > fixing up CI, bug fixes, additions to cover more corner cases,
> > proposals for new operations....
> >
> > Generally live migration could do with a little TLC (see: [1]), so I
> > am going to suggest we give some of that care in the next cycle.
> >
> > Please respond to this post if you have an interest in this and what
> > you would like to see done. Include anything you are already getting
> > on with so we get a clear picture. If there is enough interest I'll
> > put this together as a proposal for a work stream. Something along the
> > lines of "robustify live migration".
>
> We merged some robustness improvements for migration during Liberty.
> Specifically, with KVM we now track the progress of data transfer and if it
> is not making forward progress during a set window of time, we will abort
> the migration. This ensures you don't get a migration that never ends. We
> also now have code which dynamically increases the max permitted downtime
> during switchover, to try and make it more likely to succeed. We could do
> with getting feedback on how well the various tunable settings work in
> practice for real world deployments, to see if we need to change any
> defaults.
>
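
The no-progress abort and the stepped downtime limit described above can be
sketched roughly as follows. This is a minimal illustration, not the code that
merged in Liberty: the class, the function, and their parameters are
hypothetical names, and the linear step schedule is an assumption.

```python
class ProgressWatchdog:
    """Decide when a migration has stalled (hypothetical helper)."""

    def __init__(self, progress_timeout):
        self.progress_timeout = progress_timeout  # seconds allowed without progress
        self.best_remaining = None                # lowest "bytes remaining" seen so far
        self.last_progress_at = None              # time of the last new low-water mark

    def should_abort(self, now, remaining_bytes):
        # The first sample, or any new low-water mark, counts as forward progress.
        if self.best_remaining is None or remaining_bytes < self.best_remaining:
            self.best_remaining = remaining_bytes
            self.last_progress_at = now
            return False
        # Otherwise abort once the stall has lasted longer than the window.
        return (now - self.last_progress_at) > self.progress_timeout


def downtime_steps(max_downtime_ms, steps):
    """Yield increasing downtime limits, ending exactly at max_downtime_ms.

    A linear schedule is assumed here purely for illustration.
    """
    base = max_downtime_ms // steps
    for i in range(1, steps + 1):
        yield max_downtime_ms if i == steps else base * i
```

A caller would poll the job statistics periodically, feed the remaining byte
count into should_abort(), and raise the permitted downtime to the next step
whenever a step interval elapses.
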
> There was a proposal to nova to allow the 'pause' operation to be invoked
> while migration was happening. This would turn a live migration into a
> coma-migration, thereby ensuring it succeeds.
> I can't remember if this merged or not, as I can't find the review offhand,
> but it's important to have this ASAP IMHO, as when evacuating VMs from a
> host admins need a knob to use to force successful evacuation, even at the
> cost of pausing the guest temporarily.

There are two different proposals - cancel on-going live migration and pause
VM during live migration. Both are very important. Right now there is no way
to interact with on-going live migration through Nova.

Specification for 'cancel on-going live migration' is up for review [1].
'Pause VM during live migration' (it might be something like
force-live-migration) depends on this change so I'm holding off on its
specification until the 'cancel' spec is merged. I'll try to prepare it
before the summit so both specs can be discussed in Tokyo.

> In libvirt upstream we now have the ability to filter what disks are
> migrated during block migration. We need to leverage that new feature to
> fix the long standing problems of block migration when non-local images are
> attached - eg cinder volumes. We definitely want this in Mitaka.
>
> We should look at what we need to do to isolate the migration data network
> from the main management network. Currently we live migrate over
> whatever network is associated with the compute host's primary hostname /
> IP address. This is not necessarily the fastest NIC on the host. We ought
> to be able to record an alternative hostname / IP address against each
> compute host to indicate the desired migration interface.
>
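
As a rough sketch of the lookup this implies: prefer a per-host migration
address when one is recorded, and fall back to the primary address otherwise.
The function name and both mappings are hypothetical, and the addresses are
documentation-range placeholders.

```python
def migration_target(host, migration_addrs, primary_addrs):
    """Return the address to migrate to for a compute host.

    migration_addrs: optional per-host override for the migration network.
    primary_addrs:   the host's primary (management) address, always present.
    """
    # An empty or missing override falls back to the primary address.
    return migration_addrs.get(host) or primary_addrs[host]


primary = {"compute1": "192.0.2.11", "compute2": "192.0.2.12"}
dedicated = {"compute1": "198.51.100.11"}  # only compute1 has a migration NIC

fast = migration_target("compute1", dedicated, primary)
fallback = migration_target("compute2", dedicated, primary)
```
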
> Libvirt/KVM have the ability to turn on compression for migration which
> again improves the chances of convergence & thus success.
> We should look at leveraging that.

It is merged in QEMU (version 2.4), however, it isn't merged in Libvirt
yet [2] (patches 1-9 from ShaoHe Feng). The simplest solution shouldn't
require any work in Nova, it's just another live migration flag. To extend
this we will probably need to add an API call to nova, e.g., to change the
compression ratio or to change the number of compression threads.

However, this work is for O cycle (or even later) IMHO. The latest used QEMU
is 2.3 (in Ubuntu 15.10). Adoption of QEMU 2.4 and Libvirt with compression
will take some time, so we don't need to focus on it right now.
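
To illustrate why a flag-only approach would need no Nova changes, here is a
small sketch of turning a comma-separated flag option into a bitmask. The
numeric values are copied by hand from libvirt's virDomainMigrateFlags enum
and should be treated as an assumption; with the real bindings one would look
each name up on the libvirt module instead. Whether an existing flag such as
VIR_MIGRATE_COMPRESSED would cover the QEMU 2.4 compression is also an
assumption here.

```python
# Hand-copied subset of libvirt migration flags (assumed values; verify
# against the installed libvirt bindings before relying on them).
MIGRATE_FLAGS = {
    "VIR_MIGRATE_LIVE": 1,
    "VIR_MIGRATE_PEER2PEER": 2,
    "VIR_MIGRATE_TUNNELLED": 4,
    "VIR_MIGRATE_UNDEFINE_SOURCE": 16,
    "VIR_MIGRATE_COMPRESSED": 2048,
}


def parse_migration_flags(flag_string):
    """Combine a comma-separated list of flag names into one bitmask."""
    mask = 0
    for name in flag_string.split(","):
        name = name.strip()
        if not name:
            continue  # tolerate trailing commas and blank entries
        if name not in MIGRATE_FLAGS:
            raise ValueError("unknown migration flag: %s" % name)
        mask |= MIGRATE_FLAGS[name]
    return mask


base = parse_migration_flags("VIR_MIGRATE_LIVE, VIR_MIGRATE_PEER2PEER")
compressed = parse_migration_flags(
    "VIR_MIGRATE_LIVE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_COMPRESSED")
```

Enabling compression this way is only a configuration change for the operator;
tuning the ratio or thread count is what would need a new API call.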

> QEMU has a crude "auto-converge" flag you can turn on, which limits guest
> CPU execution time, in an attempt to slow down data dirtying rate to again
> improve chance of successful convergence.
>
> I'm working on enhancements to QEMU itself to support TLS encryption for
> migration. This will enable openstack to have secure migration datastream,
> without having to tunnel via libvirtd. This is useful as tunnelling via
> libvirtd doesn't work with block migration. It will also be much faster
> than tunnelling.
> This might be merged in QEMU before Mitaka cycle ends, but more
> likely it is Nxxx cycle

+++ Looking forward to seeing it!

> There is also work on post-copy migration in QEMU. Normally with live
> migration, the guest doesn't start executing on the target host until
> migration has transferred all data. There are many workloads where that
> doesn't work, as the guest is dirtying data too quickly. With post-copy you
> can start running the guest on the target at any time, and when it faults
> on a missing page that will be pulled from the source host. This is
> slightly more fragile as you risk losing the guest entirely if the source
> host dies before migration finally completes. It does guarantee that
> migration will succeed no matter what workload is in the guest. This is
> probably Nxxxx cycle material.
>
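
The "dirtying data too quickly" problem can be shown with a toy model of
pre-copy rounds. It is a simplification under stated assumptions: constant
dirty rate and bandwidth, and a hypothetical function name. It does not model
how QEMU actually schedules iterations.

```python
def precopy_rounds(ram_mb, dirty_mb_s, bandwidth_mb_s, max_downtime_s,
                   max_rounds=30):
    """Return how many copy rounds run before switchover, or None.

    None means the remaining data never fits in the downtime budget
    within max_rounds, i.e. the migration does not converge.
    """
    remaining = float(ram_mb)
    for completed in range(max_rounds):
        # Switch over once the rest can be sent while the guest is paused.
        if remaining / bandwidth_mb_s <= max_downtime_s:
            return completed
        # Copy what is left; meanwhile the guest dirties more memory.
        transfer_time = remaining / bandwidth_mb_s
        remaining = min(float(ram_mb), dirty_mb_s * transfer_time)
    return None


quiet_guest = precopy_rounds(4096, 50, 1000, 0.5)    # dirty rate << bandwidth
busy_guest = precopy_rounds(4096, 1200, 1000, 0.5)   # dirty rate > bandwidth
```

In this model the quiet guest converges after a single copy round, while the
busy guest never does, which is the case post-copy (or pausing the guest) is
meant to handle.
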
> Testing. Testing. Testing.

+++ We need functional tests for LM.

> Lots more I can't think of right now....
>

One more thing - there is a lot of effort around OpenStack upgradeability.
However, if nova-compute upgrade happens while there is live migration in
progress, it will leave things in a very messy state. We should consider,
e.g., a soft restart that will wait for the current live migration (or
probably any other long running action) to finish. A long-term solution would
be to implement some kind of live migration recovery/cleanup mechanism in
nova.
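
A soft restart of this kind could look roughly like the sketch below: refuse
new migrations once a shutdown is requested, then wait for in-flight ones to
finish. The class and method names are hypothetical and nothing like this
exists in nova today; it only illustrates the idea.

```python
import threading


class MigrationTracker:
    """Track in-flight migrations so a service can drain before restart."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = set()
        self._draining = False

    def start(self, migration_id):
        """Register a migration; returns False if a drain is in progress."""
        with self._cond:
            if self._draining:
                return False
            self._active.add(migration_id)
            return True

    def finish(self, migration_id):
        """Mark a migration as done and wake up any waiting drain()."""
        with self._cond:
            self._active.discard(migration_id)
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Stop accepting work and wait; True if everything finished."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: not self._active, timeout)
```

A service would call drain() from its shutdown handler and only exit once it
returns True, or fall back to cleanup/recovery if the timeout expires.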

[1] https://review.openstack.org/#/c/179149/
[2] https://libvirt.org/pending.html

Kind Regards,
Pawel Koniszewski