[openstack-dev] [nova] live migration in Mitaka

Daniel P. Berrange berrange at redhat.com
Wed Sep 23 13:11:28 UTC 2015


On Wed, Sep 23, 2015 at 01:48:17PM +0100, Paul Carlton wrote:
> 
> 
> On 22/09/15 16:44, Daniel P. Berrange wrote:
> >On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:
> >>>>There is also work on post-copy migration in QEMU. Normally with live
> >>>>migration, the guest doesn't start executing on the target host until
> >>>>migration has transferred all data. There are many workloads where that
> >>>>doesn't work, as the guest is dirtying data too quickly. With post-copy you
> >>>>can start running the guest on the target at any time, and when it faults
> >>>>on a missing page that will be pulled from the source host. This is
> >>>>slightly more fragile as you risk losing the guest entirely if the source
> >>>>host dies before migration finally completes. It does guarantee that
> >>>>migration will succeed no matter what workload is in the guest. This is
> >>>>probably Nxxxx cycle material.
> >>It seems to me that the ideal solution would be to start doing pre-copy
> >>migration, then if that doesn't converge with the specified downtime value
> >>then maybe have the option to just cut over to the destination and do a
> >>post-copy migration of the remaining data.
> >Yes, that is precisely what the QEMU developers working on this
> >feature suggest we should do. The lazy page faulting on the target
> >host has a performance hit on the guest, so you definitely need
> >to give a little time for pre-copy to start off with, and then
> >switch to post-copy once some threshold is reached, or if progress
> >info shows the transfer is not making progress.
> >
> >Regards,
> >Daniel
> I'd be a bit concerned about automatically switching to the post-copy
> mode.  As Daniel commented previously, if something goes wrong on the
> source node the customer's instance could be lost.  Many cloud operators
> will want to control the use of this mode.  As per my previous message,
> this could be something that is on or off by default, with a PUT
> operation on os-migration to update the setting for a specific migration.
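
To make the switch-over idea in the quoted text concrete, here is a rough
sketch of how it might look with the libvirt python bindings, assuming a
libvirt new enough to expose the post-copy migration flags. The function
name, threshold and monitoring loop are purely illustrative, not Nova's
actual implementation:

  import threading
  import time

  import libvirt


  def migrate_with_postcopy_switch(dom, dest_uri, precopy_timeout=60):
      # Start a normal pre-copy migration, but tell libvirt up front that
      # a later switch to post-copy is permitted (the POSTCOPY flag on its
      # own does not begin in post-copy mode).
      flags = (libvirt.VIR_MIGRATE_LIVE |
               libvirt.VIR_MIGRATE_PEER2PEER |
               libvirt.VIR_MIGRATE_POSTCOPY)

      def monitor():
          deadline = time.time() + precopy_timeout
          while True:
              time.sleep(2)
              try:
                  stats = dom.jobStats()
              except libvirt.libvirtError:
                  return  # migration already finished or failed
              if stats.get('type') == libvirt.VIR_DOMAIN_JOB_NONE:
                  return  # no migration job running any more
              # Pre-copy has had its chance; if memory is still outstanding
              # after the deadline, assume it is not converging and flip
              # the running migration over to post-copy.
              if (time.time() > deadline and
                      stats.get('memory_remaining', 0) > 0):
                  dom.migrateStartPostCopy()
                  return

      t = threading.Thread(target=monitor)
      t.start()
      dom.migrateToURI3(dest_uri, {}, flags)  # blocks until migration ends
      t.join()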

NB, if you are concerned about the source host going down while
migration is still taking place, you will lose the VM with pre-copy
mode too, since the VM will of course still be running on the
source.

The new failure scenario is essentially about the network
connection between the source & target hosts - if the network
layer fails while post-copy is running, then you lose the
VM.

In some sense post-copy will reduce the window of failure,
because it should ensure that the VM migration completes
in a faster & finite amount of time. I think this is
particularly important for host evacuation, so the admin
can guarantee to get all the VMs off a host in a
reasonable amount of time.

As such I don't think you need to expose post-copy as a concept in the
API, but I could see a nova.conf value to say whether use of post-copy
is acceptable, so those who want stronger resilience against
network failure can turn off post-copy.
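
For instance (the option name here is made up for illustration, not an
existing Nova setting), such a knob could be registered alongside the
other libvirt live migration options with oslo.config, defaulting to the
more conservative behaviour:

  from oslo_config import cfg

  CONF = cfg.CONF

  # Hypothetical option name, for illustration only.
  CONF.register_opts([
      cfg.BoolOpt('live_migration_permit_post_copy',
                  default=False,
                  help='Allow a live migration that is failing to '
                       'converge to be switched over to post-copy mode. '
                       'Operators who want stronger resilience against '
                       'network failure between the hosts should leave '
                       'this off.'),
  ], group='libvirt')

The corresponding nova.conf setting would then be something like
[libvirt]/live_migration_permit_post_copy = true on the compute hosts.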

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


