[openstack-dev] [nova] Migration progress

Paul Carlton paul.carlton2 at hpe.com
Mon Nov 23 11:41:58 UTC 2015



On 23/11/15 11:02, John Garbutt wrote:
> On 23 November 2015 at 08:36, Paul Carlton <paul.carlton2 at hpe.com> wrote:
>> John
>>
>> At the live migration sub team meeting I undertook to look at the issue
>> of progress reporting.
>>
>> The use cases I'm envisaging are...
>>
>> As a user I want to know how much longer my instance will be migrating
>> for.
>>
>> As an operator I want to identify any migrations that are making slow
>> progress so I can expedite their progress or abort them.
> +1
>
> Agreed with this need.
>
> Proposals to add pause and cancel clearly make this need more acute.
>
>> The current implementation reports on the instance's migration with
>> respect to memory transfer, using the total memory and memory remaining
>> fields from libvirt to report the percentage of memory still to be
>> transferred.  Due to the instance writing to pages already transferred
>> this percentage can go up as well as down.  Daniel has done a good job
>> of generating regular log records to report progress and highlight lack
>> of progress but from the API all a user/operator can see is the current
>> percentage complete.  By observing this periodically they can identify
>> instance migrations that are struggling to migrate memory pages fast
>> enough to keep pace with the instance's memory updates.
>>
>> The problem is that at present we have only one field, the instance
>> progress, to record progress.  With a live migration there are two
>> measures of progress: how much of the ephemeral disks (not needed for
>> shared disk setups) has been copied and how much of the memory has been
>> copied.  Both can go up and down as the instance writes to pages already
>> copied, causing those pages to need to be copied again.  As Daniel says
>> in his comments in the code, the disk size could dwarf the memory, so
>> reporting both in a single percentage number is problematic.
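
To illustrate the point about a single percentage, here is a minimal
sketch. It assumes byte counts for total and remaining are available for
both memory and disk; the function names and the example sizes are my own
and are not taken from the nova code.

```python
def percent_complete(total, remaining):
    """Return the percentage of `total` already transferred."""
    if total <= 0:
        return 0.0
    return 100.0 * (total - remaining) / total


def combined_percent(mem_total, mem_remaining, disk_total, disk_remaining):
    """Byte-weighted single figure covering both memory and disk."""
    return percent_complete(mem_total + disk_total,
                            mem_remaining + disk_remaining)


GiB = 1024 ** 3

# Hypothetical instance: 4 GiB of memory not yet transferred at all,
# 400 GiB of ephemeral disk already fully copied.
mem = percent_complete(4 * GiB, 4 * GiB)
disk = percent_complete(400 * GiB, 0)
both = combined_percent(4 * GiB, 4 * GiB, 400 * GiB, 0)

print("memory %.1f%%, disk %.1f%%, combined %.1f%%" % (mem, disk, both))
```

With these made-up sizes the combined figure is about 99% even though no
memory has been transferred, which is why the two values are better
reported separately.
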
>>
>> We could add an additional progress item to the instance object, i.e.
>> disk progress and memory progress, but it seems odd to have an
>> additional progress field only for this operation, so this is probably
>> a non-starter!
>>
>> For operations staff with access to log files we could report disk
>> progress as well as memory in the log file.  However, that does not
>> address the needs of users, and whilst log files are the right place
>> for support staff to look when investigating issues, operational
>> tooling is much better served by notification messages.
>>
>> Thus I'd suggest generating periodic notifications during a migration
>> to report both memory and disk progress.  Cloud
>> operators are likely to manage their instance migration activity using
>> some orchestration tooling which could consume these notifications and
>> deduce what challenges the instance migration is encountering and thus
>> determine how to address any issues.
> To be clear, our notifications are not designed to be consumed by end users.
Yep, I see this as something cloud operations tooling could consume.
It does not address end users' needs.
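
As a rough sketch of what such a periodic notification might carry, the
following builds a payload with memory and disk reported separately. The
event type string and all field names are hypothetical, chosen only for
illustration; they are not an existing nova notification format.

```python
import time


def build_progress_payload(instance_uuid, mem_total, mem_remaining,
                           disk_total, disk_remaining, now=None):
    """Build a notification payload reporting memory and disk separately.

    The event type and key names are assumptions made for this sketch.
    """
    return {
        "event_type": "live_migration.progress",  # hypothetical name
        "instance_uuid": instance_uuid,
        "timestamp": time.time() if now is None else now,
        "memory": {"total_bytes": mem_total,
                   "remaining_bytes": mem_remaining},
        "disk": {"total_bytes": disk_total,
                 "remaining_bytes": disk_remaining},
    }


# Example with made-up values.
payload = build_progress_payload("example-uuid", 10, 4, 100, 50, now=0.0)
```

Reporting raw totals and remaining amounts, rather than a precomputed
percentage, would let the consuming tooling decide how to weight memory
against disk.
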
>
>> The use cases are only partially addressed by the current
>> implementation: users can repeatedly get the server details and look at
>> the progress percentage to see how quickly (or even if) it is
>> increasing and determine how long the instance is likely to be
>> migrating for.  However, for an instance that has a large disk and/or
>> is doing a high rate of disk I/O, they may see the percentage complete
>> (i.e. memory) repeatedly showing 90%+ but the instance migration does
>> not complete.
> Agreed, reporting progress, particularly with live-migrate, is awful right now.
>
> Long term, I have my eye on this work:
> https://etherpad.openstack.org/p/liberty-cross-project-user-notifications
>
> But we should work on getting a good conceptual model for the progress
> that can be exposed using the above system.
>
>> The nova spec https://review.openstack.org/#/c/248472/ suggests making
>> detailed information available via the os-migrations object.  This is
>> not a bad idea but I have some issues with the implementation that I
>> will share on that spec.
> We do also need something that works across all hypervisor types.
>
> Let's talk more on that spec review.
>
> Thanks,
> johnthetubaguy
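
For what it's worth, the polling approach mentioned above (repeatedly
reading the progress percentage and judging how fast it is moving) could
be sketched roughly as below. This is only an illustration: the function
name is mine, and the samples are assumed to be (timestamp in seconds,
percent complete) pairs collected by whatever is doing the polling.

```python
def estimate_seconds_left(samples):
    """Estimate seconds until 100% from (timestamp, percent) samples.

    `samples` is ordered oldest first.  Returns None when there are too
    few samples or the percentage is not increasing, i.e. the migration
    appears stalled or is going backwards.
    """
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    if t1 <= t0 or p1 <= p0:
        return None
    rate = (p1 - p0) / (t1 - t0)  # percentage points per second
    return (100.0 - p1) / rate


# Made-up samples: steady progress, then a migration stuck around 90%.
steady = estimate_seconds_left([(0, 10.0), (60, 40.0)])
stuck = estimate_seconds_left([(0, 92.0), (60, 91.0)])
```

A None result is the interesting case for an operator, since it flags a
migration that is not converging, though with only the memory percentage
available it says nothing about disk.
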

-- 
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ

Mobile:    +44 (0)7768 994283
Email:    mailto:paul.carlton2 at hpe.com