[openstack-dev] [nova] live migration in Mitaka

Paul Carlton paul.carlton2 at hpe.com
Wed Sep 23 13:27:11 UTC 2015



On 23/09/15 14:11, Daniel P. Berrange wrote:
> On Wed, Sep 23, 2015 at 01:48:17PM +0100, Paul Carlton wrote:
>>
>> On 22/09/15 16:44, Daniel P. Berrange wrote:
>>> On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:
>>>>>> There is also work on post-copy migration in QEMU. Normally with live
>>>>>> migration, the guest doesn't start executing on the target host until
>>>>>> migration has transferred all data. There are many workloads where that
>>>>>> doesn't work, as the guest is dirtying data too quickly. With post-copy you
>>>>>> can start running the guest on the target at any time, and when it faults
>>>>>> on a missing page that will be pulled from the source host. This is
>>>>>> slightly more fragile as you risk losing the guest entirely if the source
>>>>>> host dies before migration finally completes. It does guarantee that
>>>>>> migration will succeed no matter what workload is in the guest. This is
>>>>>> probably Nxxxx cycle material.
>>>> It seems to me that the ideal solution would be to start doing pre-copy
>>>> migration, then if that doesn't converge with the specified downtime value
>>>> then maybe have the option to just cut over to the destination and do a
>>>> post-copy migration of the remaining data.
>>> Yes, that is precisely what the QEMU developers working on this
>>> feature suggest we should do. The lazy page faulting on the target
>>> host has a performance hit on the guest, so you definitely need
>>> to give a little time for pre-copy to start off with, and then
>>> switch to post-copy once some threshold is reached, or if progress
>>> info shows the transfer is not making progress.
>>>
>>> Regards,
>>> Daniel
>> I'd be a bit concerned about automatically switching to the post-copy
>> mode.  As Daniel commented previously, if something goes wrong on the
>> source node the customer's instance could be lost.  Many cloud operators
>> will want to control the use of this mode.  As per my previous message,
>> this could be something that is on or off by default, with a PUT
>> operation on os-migrations to update the setting for a specific
>> migration.
> NB, if you are concerned about the source host going down while
> migration is still taking place, you will lose the VM even with
> pre-copy mode too, since the VM will of course still be running
> on the source.
>
> The new failure scenario is essentially about the network
> connection between the source & target hosts - if the network
> layer fails while post-copy is running, then you lose the
> VM.
>
> In some sense post-copy will reduce the window of failure,
> because it should ensure that the VM migration completes
> in a faster & finite amount of time. I think this is
> probably particularly important for host evacuation so
> the admin can guarantee to get all the VMs off a host in
> a reasonable amount of time.
>
> As such I don't think you need to expose post-copy as a concept in the
> API, but I could see a nova.conf value to say whether use of post-copy
> was acceptable, so those who want to have stronger resilience against
> network failure can turn off post-copy.
>
> Regards,
> Daniel
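
For what it's worth, at the libvirt level the switch Daniel describes
might look something like the sketch below.  This is purely illustrative:
it assumes a libvirt/python binding new enough to expose the post-copy
calls (VIR_MIGRATE_POSTCOPY, migrateStartPostCopy), and the stall
heuristic (give up after a few polls with no drop in memory_remaining)
is invented for the example.

    # Sketch only: pre-copy live migration that flips to post-copy when
    # the remaining-memory figure stops shrinking.  The 2s poll interval
    # and 3-poll stall limit are arbitrary choices for illustration.
    import threading
    import time

    import libvirt

    def migrate_with_postcopy_fallback(dom, dest_uri, stall_limit=3):
        flags = (libvirt.VIR_MIGRATE_LIVE |
                 libvirt.VIR_MIGRATE_PEER2PEER |
                 libvirt.VIR_MIGRATE_POSTCOPY)  # permitted, not yet active

        # Run the blocking migration call in a worker so we can watch it.
        worker = threading.Thread(
            target=dom.migrateToURI3, args=(dest_uri, {}, flags))
        worker.start()

        best = None
        stalled = 0
        while worker.is_alive():
            time.sleep(2)
            try:
                stats = dom.jobStats()
            except libvirt.libvirtError:
                break                        # job finished or went away
            remaining = stats.get('memory_remaining')
            if remaining is None:
                continue
            if best is None or remaining < best:
                best, stalled = remaining, 0
            else:
                stalled += 1                 # pre-copy is not converging
            if stalled >= stall_limit:
                dom.migrateStartPostCopy(0)  # start executing on the target
                break
        worker.join()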

If the source node fails during a pre-copy migration, then when that node
is restored the instance is usually recovered intact.  With the post-copy
approach the risk is that the instance will be corrupted, which many
cloud operators would consider an unacceptable risk.

However, let's start by exposing it as a nova.conf setting and see how
that goes.
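
Something along the following lines (the option name, group and default
are purely illustrative, not an agreed design) would let operators who
cannot tolerate the post-copy failure window keep pure pre-copy
behaviour:

    # Illustrative oslo.config definition of the proposed nova.conf knob.
    from oslo_config import cfg

    post_copy_opts = [
        cfg.BoolOpt('live_migration_permit_post_copy',
                    default=False,
                    help='If True, a live migration that fails to converge '
                         'may be switched to post-copy mode, at the cost of '
                         'losing the instance if connectivity between the '
                         'source and destination hosts drops mid-copy.'),
    ]

    cfg.CONF.register_opts(post_copy_opts, group='libvirt')

The per-migration override I suggested could then be a PUT along these
lines (again purely illustrative, no such operation exists today):

    PUT /v2.1/{tenant_id}/os-migrations/{migration_id}
    {"migration": {"permit_post_copy": true}}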

-- 
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ

Mobile:    +44 (0)7768 994283
Email:    paul.carlton2 at hpe.com

