[Openstack-operators] [openstack-dev] [nova][libvirt] RFC: ensuring live migration ends

David Medberry openstack at medberry.net
Sun Feb 1 22:03:45 UTC 2015


I'll second much of what Rob said:
An API that indicated how many live-migrations (l-m) are in flight would
be good.
An API that told you what progress a given l-m had made (and when it
started) would be great.
An API to cancel a given l-m would also be great. I think this is
preferable to an auto timeout (though it would give us the tools we need
to implement an auto timeout). A rough sketch of what those three calls
might look like is below.
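
Purely as a hypothetical sketch: none of these endpoints or fields exist
in Nova today, and the paths, field names, and token handling here are
invented for illustration:

    # Hypothetical admin API for live-migration visibility and control.
    # Endpoint paths and response fields are invented for illustration.
    import requests

    NOVA = "http://nova-api:8774/v2.1"
    HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}

    # 1. How many live migrations are in flight?
    resp = requests.get(NOVA + "/os-live-migrations", headers=HEADERS)
    migrations = resp.json()["migrations"]
    print("in-flight live migrations: %d" % len(migrations))

    # 2. Progress and start time of each one.
    for m in migrations:
        print(m["id"], m["instance_uuid"], m["started_at"],
              "%d%% of memory copied" % m["memory_percent"])

    # 3. Cancel a given live migration.
    if migrations:
        requests.delete(NOVA + "/os-live-migrations/%s" % migrations[0]["id"],
                        headers=HEADERS)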

I like the idea of trying auto-convergence (and agree it should be a
flavor feature and likely not the default). I suspect this one needs some
testing. It may be fine to turn it on automatically if, in practice, it
doesn't actually throttle the VM in some 90-99% of migrations.
(Presumably such a mechanism could also increase the max downtime allowed
at cutover as well as throttling the VM.)
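
For reference, both knobs already exist at the libvirt level. A minimal
sketch using libvirt-python; the connection URI, domain name, and tuning
values are examples, not recommendations:

    # The two libvirt-level knobs under discussion: max cutover downtime
    # and auto-converge vCPU throttling.
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")

    # Allow a longer pause (in milliseconds) at cutover so the final
    # memory copy can complete on a busy guest.
    dom.migrateSetMaxDowntime(500, 0)

    # Ask QEMU to progressively throttle the guest's vCPUs when dirty
    # pages are produced faster than they can be transferred.
    flags = (libvirt.VIR_MIGRATE_LIVE
             | libvirt.VIR_MIGRATE_PEER2PEER
             | libvirt.VIR_MIGRATE_AUTO_CONVERGE)
    dom.migrateToURI("qemu+tcp://dest-host/system", flags, None, 0)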

Thanks Daniel/Rob,
-dave

FYI: I'm an operator/developer on the Time Warner Cable OpenStack cloud.

On Sun, Feb 1, 2015 at 12:24 PM, Robert Collins <robertc at robertcollins.net>
wrote:

> On 31 January 2015 at 05:47, Daniel P. Berrange <berrange at redhat.com>
> wrote:
> > In working on a recent Nova migration bug
> >
> >   https://bugs.launchpad.net/nova/+bug/1414065
> >
> > I had cause to refactor the way the nova libvirt driver monitors live
> > migration completion/failure/progress. This refactor has opened the
> > door for doing more intelligent active management of the live migration
> > process.
> ...
> > What kind of things would be the biggest win from Operators' or tenants'
> > POV ?
>
> Awesome. Couple of thoughts from my perspective. Firstly, there's a
> bunch of situation-dependent tuning. One thing Crowbar does really
> nicely is that you specify the host layout in broad abstract terms,
> e.g. 'first 10G network link' and so on: some of your settings above,
> like whether to compress pages, are going to be heavily dependent on
> the bandwidth available (I doubt that compression is a win on a 100G
> link, for instance, and I'd be suspicious of it even at 10G). So it
> would be nice if there were a single dial or two to set, with Nova
> auto-calculating good defaults from them (and appropriate overrides
> being available).
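
A rough sketch of that single-dial idea, deriving several defaults from
one declared link bandwidth (the thresholds here are invented for
illustration, not measured recommendations):

    # Derive migration tuning defaults from a single declared link
    # bandwidth. All thresholds are invented for illustration.
    def migration_defaults(link_gbit):
        return {
            # Page compression costs CPU; it only pays off when the
            # network, not the CPU, is the bottleneck.
            "compress_pages": link_gbit < 10,
            # Faster links can tolerate more parallel migrations
            # before an I/O storm becomes likely.
            "max_concurrent": max(1, int(link_gbit // 10)),
            # Slower links need a more generous cutover downtime (ms).
            "max_downtime_ms": 200 if link_gbit >= 10 else 500,
        }

    print(migration_defaults(1))    # e.g. a 1G management network
    print(migration_defaults(100))  # compression off on a 100G link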
>
> Operationally, avoiding trouble is better than being able to fix it, so
> I quite like the idea of defaulting the auto-converge option to on, or
> perhaps making it controllable via flavours, so that operators can
> offer (and identify!) those particularly performance-sensitive
> workloads rather than having to guess which instances are special and
> which aren't.
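
One possible shape for that flavour control, assuming a hypothetical
extra-spec key (nothing like it exists in Nova today):

    # Gate auto-converge on a hypothetical flavour extra spec so that
    # performance-sensitive flavours are never vCPU-throttled.
    import libvirt

    def migration_flags(flavor_extra_specs):
        flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
        # Invented key name, for illustration only.
        value = flavor_extra_specs.get("hw:live_migration_auto_converge",
                                       "true")
        if value.lower() == "true":
            flags |= libvirt.VIR_MIGRATE_AUTO_CONVERGE
        return flags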
>
> Being able to cancel the migration would be good. Relatedly, being
> able to restart nova-compute while a migration is going on would be
> good (or, put differently, a migration happening shouldn't prevent a
> deploy of Nova code: interlocks like that make continuous deployment
> much harder).
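
The cancel primitive already exists at the libvirt layer: a live
migration runs as a domain "job", and aborting it leaves the guest
running on the source host. A minimal sketch (domain name is an example):

    # Cancel an in-flight live migration via libvirt.
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")
    dom.abortJob()  # cancels the active migration job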
>
> If we can't already, I'd like, as a user, to be able to see that a
> migration is happening (that allows diagnosis of transient issues
> during the migration). Some ops folks may want to hide that, of course.
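
The raw numbers for that are already tracked per domain job in libvirt;
a sketch of what could be surfaced (domain name is an example):

    # Report live-migration progress from libvirt's per-domain job
    # statistics.
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")
    stats = dom.jobStats()
    done = stats.get("memory_processed", 0)
    left = stats.get("memory_remaining", 0)
    if done + left:
        pct = 100.0 * done / (done + left)
        print("migration: %.1f%% of memory copied" % pct)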
>
> I'm not sure that automatically rolling back after N minutes makes
> sense: if the impact on the cluster is significant, then 1 minute vs
> 10 doesn't intrinsically matter; what matters more is preventing too
> many concurrent migrations. That would be another feature I don't
> think we have yet: don't allow more than some N inbound and M
> outbound live migrations per compute host at any time, to prevent I/O
> storms. We may also want to emit notifications for migrations that
> are still progressing but appear to be having trouble completing. And
> of course an admin API to query all migrations in progress would
> allow API-driven health checks by monitoring tools, which gives
> admins the power to manage things without us having to write a
> probably-too-simple config interface.
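
A sketch of that inbound/outbound cap, using counting semaphores per
compute host (the limits here are invented for illustration):

    # Bound concurrent live migrations per compute host so that a
    # burst of requests can't trigger an I/O storm. Limits invented.
    import threading

    MAX_INBOUND, MAX_OUTBOUND = 2, 2
    _inbound = threading.BoundedSemaphore(MAX_INBOUND)
    _outbound = threading.BoundedSemaphore(MAX_OUTBOUND)

    def try_start_outbound(migrate):
        # Refuse, rather than queue, when the host is at its cap.
        # (The inbound side would be symmetrical, using _inbound.)
        if not _outbound.acquire(blocking=False):
            raise RuntimeError("too many outbound live migrations")
        try:
            migrate()
        finally:
            _outbound.release()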
>
> HTH,
> Rob
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud