[openstack-dev] [nova] live migration in Mitaka
Chris Friesen
chris.friesen at windriver.com
Tue Sep 22 15:29:46 UTC 2015
Apologies for the indirect quote; some of the earlier posts got deleted before I
noticed the thread.
On 09/21/2015 03:43 AM, Koniszewski, Pawel wrote:
>> -----Original Message-----
>> From: Daniel P. Berrange [mailto:berrange at redhat.com]
>> There was a proposal to nova to allow the 'pause' operation to be invoked
>> while migration was happening. This would turn a live migration into a
>> coma-migration, thereby ensuring it succeeds. I can't remember if this
>> merged or not, as I can't find the review offhand, but it's important to
>> have this ASAP IMHO, as when evacuating VMs from a host admins need a knob
>> to use to force successful evacuation, even at the cost of pausing the
>> guest temporarily.
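The knob described above amounts to a simple policy: watch the migration's progress and, once pre-copy stops making headway, pause the guest so no new pages get dirtied and the transfer is guaranteed to finish. A minimal sketch of that decision logic in Python (the class and its sampling interface are illustrative, not nova's actual code):

```python
class MigrationWatchdog:
    """Decide when to pause a guest so a stalled live migration converges."""

    def __init__(self, stall_limit=3):
        self.stall_limit = stall_limit  # samples without progress we tolerate
        self.best_remaining = None      # smallest remaining-bytes seen so far
        self.stalled = 0

    def should_pause(self, remaining_bytes):
        """Feed one progress sample; return True once pre-copy has stalled.

        Pausing the guest stops memory from being dirtied, turning the
        live migration into the "coma-migration" that always completes.
        """
        if self.best_remaining is None or remaining_bytes < self.best_remaining:
            self.best_remaining = remaining_bytes
            self.stalled = 0
            return False
        self.stalled += 1
        return self.stalled >= self.stall_limit
```

In practice the samples would come from the hypervisor's migration statistics, and `True` would trigger the pause operation.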
It's not strictly "live" migration, but for the same reason of pushing VMs off a
host for maintenance it would be nice to have some way of migrating suspended
instances. (As brought up in
http://lists.openstack.org/pipermail/openstack-dev/2015-September/075042.html)
>> In libvirt upstream we now have the ability to filter what disks are
>> migrated during block migration. We need to leverage that new feature to
>> fix the long standing problems of block migration when non-local images are
>> attached - eg cinder volumes. We definitely want this in Mitaka.
Agreed, this would be a very useful addition.
>> We should look at what we need to do to isolate the migration data network
>> from the main management network. Currently we live migrate over whatever
>> network is associated with the compute host's primary hostname / IP address.
>> This is not necessarily the fastest NIC on the host. We ought to be able
>> to record an alternative hostname / IP address against each compute host to
>> indicate the desired migration interface.
Yes, this would be good to have upstream. We've added this sort of thing
locally (though with a hardcoded naming scheme) to allow migration over 10G
links with management over 1G links.
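One possible shape for this upstream would be a per-host config option naming an address on the fast network, from which the migration URI is built instead of the primary hostname. The option name below is illustrative only, not an existing nova setting:

```ini
[libvirt]
# Illustrative only: an address on the dedicated 10G migration network.
# Management traffic stays on the host's primary 1G interface.
live_migration_inbound_addr = 192.168.100.12
```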
>> There is also work on post-copy migration in QEMU. Normally with live
>> migration, the guest doesn't start executing on the target host until
>> migration has transferred all data. There are many workloads where that
>> doesn't work, as the guest is dirtying data too quickly. With post-copy you
>> can start running the guest on the target at any time, and when it faults
>> on a missing page that will be pulled from the source host. This is
>> slightly more fragile as you risk losing the guest entirely if the source
>> host dies before migration finally completes. It does guarantee that
>> migration will succeed no matter what workload is in the guest. This is
>> probably Nxxxx cycle material.
It seems to me that the ideal solution would be to start with pre-copy
migration, then, if it fails to converge within the specified downtime value,
have the option to cut over to the destination and do a post-copy migration of
the remaining data.
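That hybrid scheme can be sketched as a small state machine: stay in pre-copy while the link outpaces the guest's dirty rate, and switch to post-copy after enough rounds of non-convergence. The sampling interface and thresholds here are assumptions for illustration, not any existing nova or QEMU API:

```python
def run_hybrid_migration(samples, max_stalled_rounds=2):
    """Sketch of pre-copy with a fallback cut-over to post-copy.

    samples: successive (dirty_rate_mbps, bandwidth_mbps) measurements
    from a hypothetical monitor.  Pre-copy converges only while the link
    drains memory faster than the guest dirties it; after
    max_stalled_rounds rounds where it doesn't, cut over to post-copy,
    which completes regardless of the workload.
    """
    stalled = 0
    for dirty_rate, bandwidth in samples:
        if bandwidth > dirty_rate:
            stalled = 0              # converging; keep pre-copying
        else:
            stalled += 1
            if stalled >= max_stalled_rounds:
                return 'post-copy'   # run on target, fault pages from source
    return 'pre-copy'                # converged within the sample window
```

The trade-off noted above still applies: once post-copy starts, losing the source host loses the guest, so the switch should be deliberate rather than automatic-by-default.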
Chris