[openstack-dev] [nova] live migration in Mitaka
Daniel P. Berrange
berrange at redhat.com
Fri Sep 18 15:23:46 UTC 2015
On Fri, Sep 18, 2015 at 11:53:05AM +0000, Murray, Paul (HP Cloud) wrote:
> Hi All,
>
> There are various efforts going on around live migration at the moment:
> fixing up CI, bug fixes, additions to cover more corner cases, proposals
> for new operations....
>
> Generally live migration could do with a little TLC (see: [1]), so I am
> going to suggest we give some of that care in the next cycle.
>
> Please respond to this post if you have an interest in this and what you
> would like to see done. Include anything you are already getting on with
> so we get a clear picture. If there is enough interest I'll put this
> together as a proposal for a work stream. Something along the lines of
> "robustify live migration".
We merged some robustness improvements for migration during Liberty.
Specifically, with KVM we now track the progress of data transfer
and if it is not making forward progress during a set window of
time, we will abort the migration. This ensures you don't get a
migration that never ends. We also now have code which dynamically
increases the max permitted downtime during switchover, to try and
make it more likely to succeed. We could do with getting feedback
on how well the various tunable settings work in practice for real
world deployments, to see if we need to change any defaults.
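
As a rough illustration of the two behaviours described above (not the
code that merged in nova), the logic can be sketched as a watchdog that
remembers the lowest "data remaining" value seen so far, plus a helper
that produces a rising series of permitted downtime values. The names
ProgressWatchdog, progress_timeout and downtime_steps are hypothetical,
and the linear stepping is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass
class ProgressWatchdog:
    """Decide whether to abort a migration that has stopped making progress."""

    progress_timeout: float  # seconds allowed without forward progress (hypothetical tunable)
    lowest_remaining: float = float("inf")
    last_progress_at: float = 0.0

    def should_abort(self, now: float, data_remaining: float) -> bool:
        # Forward progress means a new low-water mark for the remaining data.
        if data_remaining < self.lowest_remaining:
            self.lowest_remaining = data_remaining
            self.last_progress_at = now
            return False
        # No new low-water mark: abort once the window has been exceeded.
        return (now - self.last_progress_at) > self.progress_timeout


def downtime_steps(max_downtime_ms: int, steps: int) -> list:
    """Evenly spaced permitted-downtime values, ending at max_downtime_ms.

    Linear spacing is an assumption made for this sketch.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [max_downtime_ms * i // steps for i in range(1, steps + 1)]
```

A monitoring loop would call should_abort() with each new sample of the
remaining data, and move to the next value from downtime_steps() after
some delay, so that a short downtime is tried first and longer pauses
are only permitted if the migration is still struggling.
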
There was a proposal to nova to allow the 'pause' operation to be
invoked while migration was happening. This would turn a live
migration into a coma-migration, thereby ensuring it succeeds.
I can't remember if this merged or not, as I can't find the review
offhand, but it's important to have this ASAP IMHO, as when
evacuating VMs from a host admins need a knob to use to force
successful evacuation, even at the cost of pausing the guest
temporarily.
In libvirt upstream we now have the ability to filter what disks are
migrated during block migration. We need to leverage that new feature
to fix the long standing problems of block migration when non-local
images are attached, e.g. Cinder volumes. We definitely want this
in Mitaka.
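
A minimal sketch of the selection step, assuming each attached disk is
already known to be either host-local or backed by shared/network
storage. The Disk class and its is_local field are hypothetical, and the
actual libvirt call that consumes the list is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    target: str     # guest device name, e.g. "vda"
    is_local: bool  # True if the image lives on the source host's local storage


def disks_to_migrate(disks):
    """Return the target names of disks whose contents must be copied.

    Non-local disks (e.g. Cinder volumes) are reachable from the
    destination host already, so they are filtered out.
    """
    return [d.target for d in disks if d.is_local]
```

For a guest with a local root disk "vda" and an attached volume "vdb",
only "vda" would be passed to the block migration.
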
We should look at what we need to do to isolate the migration data
network from the main management network. Currently we live
migrate over whatever network is associated with the compute host's
primary hostname / IP address. This is not necessarily the fastest
NIC on the host. We ought to be able to record an alternative
hostname / IP address against each compute host to indicate the
desired migration interface.
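
One way this could look, as a sketch only: prefer a per-host migration
address when one has been recorded, and fall back to the primary
hostname otherwise. The function name migration_uri and the
migration_address parameter are hypothetical, not existing nova
settings.

```python
import ipaddress
from typing import Optional


def migration_uri(primary_host: str, migration_address: Optional[str] = None) -> str:
    """Build a tcp:// URI, preferring the dedicated migration address."""
    host = migration_address or primary_host
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"  # IPv6 literals need brackets inside a URI
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    return f"tcp://{host}"
```

With no migration address recorded the behaviour is unchanged from
today, which keeps existing deployments working.
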
Libvirt/KVM have the ability to turn on compression for migration
which again improves the chances of convergence & thus success.
We should look at leveraging that.
QEMU has a crude "auto-converge" flag you can turn on, which limits
guest CPU execution time, in an attempt to slow down data dirtying
rate to again improve the chance of successful convergence.
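
To see why both compression and auto-converge help, here is a toy model
of pre-copy migration. It is a deliberate simplification, not how QEMU
computes anything: each round copies whatever is outstanding, and the
guest dirties memory at a constant rate while that copy runs. All
parameter names are made up for the example.

```python
def precopy_rounds(ram_mb, bandwidth_mb_s, dirty_rate_mb_s,
                   max_downtime_s, max_rounds=30):
    """Return the pre-copy round at which switchover is possible, or None.

    Switchover is possible once the outstanding data can be sent within
    max_downtime_s. None means the model did not converge in max_rounds.
    """
    remaining = float(ram_mb)
    for round_no in range(1, max_rounds + 1):
        transfer_time = remaining / bandwidth_mb_s
        if transfer_time <= max_downtime_s:
            return round_no  # what is left fits inside the permitted downtime
        # While this round was copying, the guest dirtied more memory
        # (it cannot dirty more than the whole of RAM).
        remaining = min(float(ram_mb), dirty_rate_mb_s * transfer_time)
    return None
```

In this model the outstanding data shrinks by the ratio of dirty rate to
bandwidth each round, so a guest that dirties memory faster than the
link can carry it never converges and the function returns None.
Compression acts like a higher effective bandwidth_mb_s, and
auto-converge acts like a lower dirty_rate_mb_s; either can push the
ratio below one.
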
I'm working on enhancements to QEMU itself to support TLS encryption
for migration. This will enable OpenStack to have a secure migration
datastream, without having to tunnel via libvirtd. This is useful
as tunnelling via libvirtd doesn't work with block migration. It will
also be much faster than tunnelling. This might be merged in QEMU
before the Mitaka cycle ends, but more likely it is Nxxx cycle material.
There is also work on post-copy migration in QEMU. Normally with
live migration, the guest doesn't start executing on the target
host until migration has transferred all data. There are many
workloads where that doesn't work, as the guest is dirtying data
too quickly. With post-copy you can start running the guest on the
target at any time, and when it faults on a missing page that will
be pulled from the source host. This is slightly more fragile as
you risk losing the guest entirely if the source host dies before
migration finally completes. It does guarantee that migration will
succeed no matter what workload is in the guest. This is probably
Nxxxx cycle material.
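
The demand-paging idea behind post-copy can be shown with a toy model.
This is purely illustrative and says nothing about QEMU's actual
implementation; the class PostCopyGuest and its attributes are invented
for the example.

```python
class PostCopyGuest:
    """Toy model: the guest runs on the target and pulls missing pages on demand."""

    def __init__(self, source_pages):
        self.source_pages = source_pages  # page number -> contents, still on the source host
        self.local_pages = {}             # pages already present on the target
        self.faults = 0                   # number of pages pulled from the source

    def read(self, page_no):
        if page_no not in self.local_pages:
            # "Page fault": fetch the page from the source host. If the
            # source has died, this lookup fails and the guest is lost.
            self.local_pages[page_no] = self.source_pages[page_no]
            self.faults += 1
        return self.local_pages[page_no]
```

Pages that have already been pulled stay usable if the source
disappears, but the first access to any page that was never transferred
raises KeyError in this model, which corresponds to the fragility
mentioned above.
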
Testing. Testing. Testing.
Lots more I can't think of right now....
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|