[openstack-dev] [nova][libvirt] RFC: ensuring live migration ends

Vladik Romanovsky vladik.romanovsky at enovance.com
Mon Feb 2 14:45:37 UTC 2015



----- Original Message -----
> From: "Daniel P. Berrange" <berrange at redhat.com>
> To: "Robert Collins" <robertc at robertcollins.net>
> Cc: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>,
> openstack-operators at lists.openstack.org
> Sent: Monday, 2 February, 2015 5:56:56 AM
> Subject: Re: [openstack-dev] [nova][libvirt] RFC: ensuring live migration ends
> 
> On Mon, Feb 02, 2015 at 08:24:20AM +1300, Robert Collins wrote:
> > On 31 January 2015 at 05:47, Daniel P. Berrange <berrange at redhat.com>
> > wrote:
> > > In working on a recent Nova migration bug
> > >
> > >   https://bugs.launchpad.net/nova/+bug/1414065
> > >
> > > I had cause to refactor the way the nova libvirt driver monitors live
> > > migration completion/failure/progress. This refactor has opened the
> > > door for doing more intelligent active management of the live migration
> > > process.
> > ...
> > > What kind of things would be the biggest win from Operators' or tenants'
> > > POV ?
> > 
> > Awesome. Couple thoughts from my perspective. Firstly, there's a bunch
> > of situation dependent tuning. One thing Crowbar does really nicely is
> > that you specify the host layout in broad abstract terms - e.g. 'first
> > 10G network link' and so on : some of your settings above like whether
> > to compress pages are going to be heavily dependent on the bandwidth
> > available (I doubt that compression is a win on a 100G link for
> > instance, and would be suspect at 10G even). So it would be nice if
> > there was a single dial or two to set and Nova would auto-calculate
> > good defaults from that (with appropriate overrides being available).
> 
> I wonder how such an idea would fit into Nova, since it doesn't really
> have that kind of knowledge about the network deployment characteristics.
> 
> > Operationally avoiding trouble is better than being able to fix it, so
> > I quite like the idea of defaulting the auto-converge option on, or
> > perhaps making it controllable via flavours, so that operators can
> > offer (and identify!) those particularly performance sensitive
> > workloads rather than having to guess which instances are special and
> > which aren't.
> 
> I'll investigate the auto-converge further to find out what the
> potential downsides of it are. If we can unconditionally enable
> it, it would be simpler than adding yet more tunables.
> 
> > Being able to cancel the migration would be good. Relatedly being able
> > to restart nova-compute while a migration is going on would be good
> > (or put differently, a migration happening shouldn't prevent a deploy
> > of Nova code: interlocks like that make continuous deployment much
> > harder).
> > 
> > If we can't already, I'd like as a user to be able to see that the
> > migration is happening (allows diagnosis of transient issues during
> > the migration). Some ops folk may want to hide that of course.
> > 
> > I'm not sure that automatically rolling back after N minutes makes
> > sense : if the impact on the cluster is significant then 1 minute vs
> > 10 doesn't intrinsically matter: what matters more is preventing too
> > many concurrent migrations, so that would be another feature that I
> > don't think we have yet: don't allow more than some N inbound and M
> > outbound live migrations to a compute host at any time, to prevent IO
> > storms. We may want to log with NOTIFICATION migrations that are still
> > progressing but appear to be having trouble completing. And of course
> > an admin API to query all migrations in progress to allow API driven
> > health checks by monitoring tools - which gives the power to manage
> > things to admins without us having to write a probably-too-simple
> > config interface.
> 
> Interesting, the point about concurrent migrations hadn't occurred to
> me before, but it does of course make sense since migration is
> primarily network bandwidth limited, though disk bandwidth is relevant
> too if doing block migration.

Indeed, a lot of time was spent investigating this topic (in oVirt again),
and eventually it was decided to expose a config option and allow 3 concurrent
migrations by default.

https://github.com/oVirt/vdsm/blob/master/lib/vdsm/config.py.in#L126
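
To make the idea concrete, here is a minimal Python sketch of such a cap.
It is not the VDSM or Nova implementation: MigrationLimiter, slot() and
MAX_CONCURRENT_MIGRATIONS are hypothetical names made up for illustration,
and only the default of 3 comes from the oVirt setting mentioned above.

```python
import threading
from contextlib import contextmanager

# Hypothetical config value; 3 mirrors the oVirt default discussed above.
MAX_CONCURRENT_MIGRATIONS = 3


class MigrationLimiter:
    """Caps how many live migrations may run at once on one host."""

    def __init__(self, limit=MAX_CONCURRENT_MIGRATIONS):
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def slot(self, blocking=True):
        # Take a free migration slot. With blocking=False, fail right away
        # instead of queueing when all slots are in use.
        if not self._slots.acquire(blocking=blocking):
            raise RuntimeError("too many concurrent migrations")
        try:
            yield
        finally:
            # Always give the slot back, even if the migration failed.
            self._slots.release()


limiter = MigrationLimiter()

with limiter.slot():
    # A real driver would start and monitor the migration here.
    pass
```

Separate limiter instances for inbound and outbound migrations would give
the "N inbound and M outbound" behaviour Robert describes. Whether callers
should queue (blocking=True) or be rejected is a policy choice I am leaving
open here.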

> 
> Regards,
> Daniel


