Open Stack

Tue Nov 24 10:32:21 UTC 2015

Hi Paul,
Comments inline:

2015-11-23 16:36 GMT+08:00 Paul Carlton <paul.carlton2 at hpe.com>:

> John
>
> At the live migration sub team meeting I undertook to look at the issue
> of progress reporting.
>
> The use cases I'm envisaging are...
>
> As a user I want to know how much longer my instance will be migrating
> for.
>
> As an operator I want to identify any migrations that are making slow
> progress so I can expedite their progress or abort them.
>
> The current implementation reports on the instance's migration with
> respect to memory transfer, using the total memory and memory remaining
> fields from libvirt to report the percentage of memory still to be
> transferred.  Due to the instance writing to pages already transferred
> this percentage can go up as well as down.  Daniel has done a good job
> of generating regular log records to report progress and highlight lack
> of progress but from the API all a user/operator can see is the current
> percentage complete.  By observing this periodically they can identify
> instance migrations that are struggling to migrate memory pages fast
> enough to keep pace with the instance's memory updates.
>
> The problem is that at present we have only one field, the instance
> progress, to record progress.  With a live migration there are measures
>

[Shaohe]:

From this link, the OpenStack API reference:
http://developer.openstack.org/api-ref-compute-v2.1.html#listDetailServers
It describes the instance progress as: "A percentage value of the build progress."
But for the libvirt driver it is actually the migration progress, while
for other drivers it is the build progress.
There is also a spec proposing some changes:
https://review.openstack.org/#/c/249086/
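
As a rough illustration of the memory-based progress figure Paul describes above, here is a minimal sketch. It is not the nova code; the function name and the sample numbers are made up, and it only assumes the driver has a "total" and a "remaining" memory counter. It shows why the percentage can go down as well as up.

```python
def memory_progress(memory_total: int, memory_remaining: int) -> int:
    """Return the percentage of guest memory already transferred (0-100).

    Hypothetical helper: both arguments are assumed to be in the same
    unit (for example MiB), as reported by the hypervisor driver.
    """
    if memory_total <= 0:
        return 0
    # Clamp so a bad sample cannot produce a value outside 0-100.
    remaining = min(max(memory_remaining, 0), memory_total)
    return 100 * (memory_total - remaining) // memory_total


# Made-up samples of (total, remaining). The third sample has MORE memory
# remaining than the second because the guest dirtied pages that had
# already been copied, so the reported progress drops.
samples = [(4096, 2048), (4096, 512), (4096, 1024)]
history = [memory_progress(total, remaining) for total, remaining in samples]
print(history)
```

A client polling the server details would see exactly this kind of non-monotonic sequence for a busy guest.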

> of progress, how much of the ephemeral disks (not needed for shared
> disk setups) have been copied and how much of the memory has been
> copied. Both can go up and down as the instance writes to pages already
> copied causing those pages to need to be copied again.  As Daniel says
> in his comments in the code, the disk size could dwarf the memory so
> reporting both in single percentage number is problematic.
>
> We could add an additional progress item to the instance object, i.e.
> disk progress and memory progress but that seems odd to have an
> additional progress field only for this operation so this is probably
> a non starter!
>
> For operations staff with access to log files we could report disk
> progress as well as memory in the log file, however that does not
> address the needs of users and whilst log files are the right place for
> support staff to look when investigating issues operational tooling
> is much better served by notification messages.
>
> Thus I'd suggest that generating periodic notifications during a migration
> to report both memory and disk progress would be useful.  Cloud
> operators are likely to manage their instance migration activity using
> some orchestration tooling which could consume these notifications and
> deduce what challenges the instance migration is encountering and thus
> determine how to address any issues.
>
> The use cases are only partially addressed by the current
> implementation, they can repeatedly get the server details and look at
> the progress percentage to see how quickly (or even if) it is
> increasing and determine how long the instance is likely to be
> migrating for.  However for an instance that has a large disk and/or
> is doing a high rate of disk i/o they may see the percentage complete
> (i.e. memory) repeatedly showing 90%+ but the instance migration does
> not complete.
>
> The nova spec https://review.openstack.org/#/c/248472/ suggests making
> detailed information available via the os-migrations object.  This is
> not a bad idea but I have some issues with the implementation that I
> will share on that spec.
>

[Shaohe]:

About this spec, Daniel has given some comments on it, and we have updated it.
Maybe we can work together on it to make it better.

I have worked on multi-thread compression for live migration in libvirt, and
have looked into some live migration performance optimizations.

This led to some ideas:
1. Let nova expose more live migration details, such as the RAM statistics,
   the xbzrle-cache status and, in future, the multi-thread compression
   information.
2. Nova can enable auto-converge, and tune the xbzrle-cache and multi-thread
   compression dynamically.
3. Other projects can then build a good strategy for tuning the live
   migration based on the migration details.

For example:
The cache size is key to xbzrle performance. Ideally the cache size is the
same as the guest's total RAM, but that much memory may not always be
available on the host.
For multi-thread compression, a higher compression level is better, but it
consumes more CPU.
Auto-converge slows down the guest's CPUs.
So things are not always as good as I had expected.
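
To make these trade-offs concrete, here is a small sketch of the kind of strategy an external tool could apply. Everything in it is hypothetical: the class names, the "keep half the free RAM in reserve" rule, and the mapping from idle CPU to a compression level in the range 0-9 are assumptions for illustration, not nova or libvirt behaviour.

```python
from dataclasses import dataclass


@dataclass
class HostBudget:
    """Spare resources on the source host (hypothetical inputs)."""
    free_ram_mb: int
    idle_cpu_percent: int


@dataclass
class Tuning:
    """Migration knobs chosen by the strategy (hypothetical outputs)."""
    xbzrle_cache_mb: int
    compress_level: int
    auto_converge: bool


def choose_tuning(guest_ram_mb: int, host: HostBudget, stalled: bool) -> Tuning:
    # Ideal xbzrle cache equals the guest RAM, but cap it at half of the
    # host's free RAM (assumed safety margin).
    cache_mb = min(guest_ram_mb, host.free_ram_mb // 2)
    # A higher compression level costs more CPU, so scale it with the idle
    # CPU on the host; the 0-9 range is an assumption.
    level = max(0, min(9, host.idle_cpu_percent // 10))
    # Auto-converge slows the guest down, so only use it once the
    # migration is known to be stalled.
    return Tuning(cache_mb, level, auto_converge=stalled)


tuning = choose_tuning(8192, HostBudget(free_ram_mb=4096, idle_cpu_percent=35),
                       stalled=False)
print(tuning)
```

The point is only that each knob is bounded by a different host resource, so the "best" setting for one guest depends on what the host can spare at that moment.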

We also submitted a topic about this idea to the summit, but it was not accepted.
Topic: <Towards Robust Live Migration in Dynamic Environments>
Link: https://www.openstack.org/summit/tokyo-2015/vote-for-speakers/presentation/4971

We looked into other hypervisors, and they do not expose so many details.
And Daniel is right:
we should not expose such low-level, QEMU-specific implementation details.
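
A hypervisor-neutral report could be limited to separate memory and disk percentages plus a "stalled" hint, which would cover the user and operator use cases without QEMU-specific fields. The sketch below is only an illustration under that assumption; the field names, function names and the window size are hypothetical and do not come from nova or from either spec.

```python
from typing import Dict, List, Optional


def percent_done(total: int, remaining: int) -> Optional[int]:
    """Percentage copied, or None when there is nothing to copy
    (for example the disk on a shared-storage setup)."""
    if total <= 0:
        return None
    remaining = min(max(remaining, 0), total)
    return 100 * (total - remaining) // total


def build_progress_payload(memory_total: int, memory_remaining: int,
                           disk_total: int, disk_remaining: int
                           ) -> Dict[str, Optional[int]]:
    # Memory and disk are reported separately because the disk size can
    # dwarf the memory size, which makes a single combined figure misleading.
    return {
        "memory_percent": percent_done(memory_total, memory_remaining),
        "disk_percent": percent_done(disk_total, disk_remaining),
    }


def is_stalled(remaining_samples: List[int], window: int = 3) -> bool:
    """True when the last `window` samples never got below the lowest
    'remaining' value seen before them (samples are oldest first)."""
    if len(remaining_samples) <= window:
        return False
    best_before = min(remaining_samples[:-window])
    return min(remaining_samples[-window:]) >= best_before


payload = build_progress_payload(4096, 1024, 0, 0)
stalled = is_stalled([1000, 600, 400, 450, 420, 410])
print(payload, stalled)
```

An orchestration tool consuming such notifications could then decide whether to wait, tune, or abort, without knowing which hypervisor is underneath.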

>
> -- Paul Carlton Software Engineer Cloud Services
> Hewlett Packard Enterprise
> BUK03:T242
> Longdown Avenue
> Stoke Gifford
> Bristol BS34 8QZ
> Mobile: +44 (0)7768 994283
> Email: mailto:paul.carlton2 at hpe.com
> Hewlett-Packard Enterprise Limited
> registered Office: Cain Road, Bracknell, Berks RG12 1HN Registered No:
> 690597 England.
> The contents of this message and any attachments to it are confidential
> and may be legally privileged.
> If you have received this message in error, you should delete it from your
> system immediately and advise the sender.
> To any recipient of this message within HP, unless otherwise stated you
> should consider this message and attachments as "HP CONFIDENTIAL".
>

[openstack-dev] [nova] Migration progress
