[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options

Daniel P. Berrange berrange at redhat.com
Fri Jan 8 14:11:56 UTC 2016


On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote:
> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui wrote:
> > On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin wrote:
> > > Hi
> > > 
> > > commit 8ecf93e[1] got me thinking - the live_migration_flag config
> > > option unnecessarily allows operators choose arbitrary behavior of the
> > > migrateToURI() libvirt call, to the extent that we allow the operator
> > > to configure a behavior that can result in data loss[1].
> > > 
> > > I see that danpb recently said something similar:
> > > 
> > >   https://review.openstack.org/171098
> > > 
> > >   "Honestly, I wish we'd just kill off  'live_migration_flag' and
> > >   'block_migration_flag' as config options. We really should not be
> > >   exposing low level libvirt API flags as admin tunable settings.
> > > 
> > >   Nova should really be in charge of picking the correct set of flags
> > >   for the current libvirt version, and the operation it needs to
> > >   perform. We might need to add other more sensible config options in
> > >   their place [..]"
> > 
> > Nova should really handle internal flags and this serie is running in
> > the right way.
> > 
> > > ...
> > 
> > >   4) Add a new config option for tunneled versus native:
> > > 
> > >        [libvirt]
> > >        live_migration_tunneled = true
> > > 
> > >      This enables the use of the VIR_MIGRATE_TUNNELLED flag. We have 
> > >      historically defaulted to tunneled mode because it requires the 
> > >      least configuration and is currently the only way to have a 
> > >      secure migration channel.
> > > 
> > >      danpb's quote above continues with:
> > > 
> > >        "perhaps a "live_migration_secure_channel" to indicate that 
> > >         migration must use encryption, which would imply use of 
> > >         TUNNELLED flag"
> > > 
> > >      So we need to discuss whether the config option should express the
> > >      choice of tunneled vs native, or whether it should express another
> > >      choice which implies tunneled vs native.
> > > 
> > >        https://review.openstack.org/263434
> > 
> > We probably have to consider that operator does not know much about
> > internal libvirt flags, so options we are exposing for him should
> > reflect benefice of using them. I commented on your review we should
> > at least explain benefice of using this option whatever the name is.
> 
> As predicted, plenty of discussion on this point in the review :)
> 
> You're right that we don't give the operator any guidance in the help
> message about how to choose true or false for this:
> 
>   Whether to use tunneled migration, where migration data is 
>   transported over the libvirtd connection. If True,
>   we use the VIR_MIGRATE_TUNNELLED migration flag
> 
> libvirt's own docs on this are here:
> 
>   https://libvirt.org/migration.html#transport
> 
> which emphasizes:
> 
>   - the data copies involved in tunneling
>   - the extra configuration steps required for native
>   - the encryption support you get when tunneling
> 
> The discussions I've seen on this topic wrt Nova have revolved around:
> 
>   - that tunneling allows for an encrypted transport[1]
>   - that qemu's NBD based drive-mirror block migration isn't supported
>     using tunneled mode, and that danpb is working on fixing this
>     limitation in libvirt
>   - "selective" block migration[2] won't work with the fallback qemu
>     block migration support, and so won't currently work in tunneled
>     mode

I'm not working on fixing it, but IIRC some other dev had proposed
patches.

> 
> So, the advise to operators would be:
> 
>   - You may want to choose tunneled=False for improved block migration 
>     capabilities, but this limitation will go away in future.
>   - You may want to choose tunneled=False if you wish to trade and
>     encrypted transport for a (potentially negligible) performance
>     improvement.
> 
> Does that make sense?
> 
> As for how to name the option, and as I said in the review, I think it
> makes sense to be straightforward here and make it clearly about
> choosing to disable libvirt's tunneled transport.
> 
> If we name it any other way, I think our explanation for operators will
> immediately jump to explaining (a) that it influences the TUNNELLED
> flag, and (b) the differences between the tunneled and native
> transports. So, if we're going to have to talk about tunneled versus
> native, why obscure that detail?

Ultimately we need to recognise that libvirt's tunnelled mode was
added as a hack, to work around fact that QEMU lacked any kind of
native security capabilities & didn't appear likely to ever get
them at that time.  As well as not working with modern NBD based
block device encryption, it really sucks for performance because
it introduces many extra data copies. So it is going to be quite
poor for large VMs with heavy rate of data dirtying.

The only long term relative "benefit" of tunnelled mode is that
it avoids the need to open extra firewall ports.

IMHO, the long term future is to *never* use tunnelled mode for
QEMU. This will be viable when my support for native TLS support
in QEMU migration + NBD protocols is merged. I'm hopeful this
wil be for QEMU 2.6

> But, Pawel strongly disagrees.
> 
> One last point I'd make is this isn't about adding a *new*
> configuration capability for operators. As we deprecate and remove
> these configuration options, we need to be careful not to remove a
> capability that operators are currently depending on for arguably
> reasonable reasons.

My view is that "live_migration_tunneled" is a reasonable parameter
to add, because there is a genuine need to let admins choose this
behaviour. We should make sure it is correctly done as a tri-state
flag though, when it is 'None', Nova should pick what it things is
the best approach based on QEMU version. Probably to use QEMU
native when it supports TLS, otherwise use tunnelled if possible
to get security.

> 
> [1] - https://review.openstack.org/#/c/171098/
> [2] - https://review.openstack.org/#/c/227278
> 
> 
> > >   5) Add a new config option for additional migration flags:
> > > 
> > >        [libvirt]
> > >        live_migration_extra_flags = VIR_MIGRATE_COMPRESSED
> > > 
> > >      This allows operators to continue to experiment with libvirt behaviors
> > >      in safe ways without each use case having to be accounted for.
> > > 
> > >        https://review.openstack.org/263435
> > > 
> > >      We would disallow setting the following flags via this option:
> > > 
> > >        VIR_MIGRATE_LIVE
> > >        VIR_MIGRATE_PEER2PEER
> > >        VIR_MIGRATE_TUNNELLED
> > >        VIR_MIGRATE_PERSIST_DEST
> > >        VIR_MIGRATE_UNDEFINE_SOURCE
> > >        VIR_MIGRATE_NON_SHARED_INC
> > >        VIR_MIGRATE_NON_SHARED_DISK
> > > 
> > >     which would allow the following currently available flags to be set:
> > 
> > >        VIR_MIGRATE_PAUSED
> > >        VIR_MIGRATE_CHANGE_PROTECTION
> > >        VIR_MIGRATE_UNSAFE
> > >        VIR_MIGRATE_OFFLINE
> > >        VIR_MIGRATE_COMPRESSED
> > >        VIR_MIGRATE_ABORT_ON_ERROR
> > >        VIR_MIGRATE_AUTO_CONVERGE
> > >        VIR_MIGRATE_RDMA_PIN_ALL
> > 
> > We can probably consider to provide VIR_MIGRATE_PAUSED and
> > VIR_MIGRATE_COMPRESSED as dedicated options too ?
> 
> Yes, I think any options we see regularly added to extra_flags by
> operators, and as we understand the use cases better, then we can add
> dedicated options for them.

I really don't see a case for letting the admin set VIR_MIGRATE_PAUSED
at a host level. If we want the ability to force a running VM to end
up paused after migration, this is a feature to be added to the Nova
migration API.

The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a flag,
there are other associated runtime tunables that need setting. There
was a spec discussing this which was not approved as a suitable
strategy for using it could not be agreed.

> In the review, Pawel is making a case for not allowing the operator to
> enable COMPRESSED or AUTO_CONVERGE.

I agree really. As per my comments, I in fact struggle to see a credible
case for allowing any of the remaining flags to be enabled. They are all
cases that Nova should be made todo the right thing, possibly in relation
to API parameters or other deployment choices.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



More information about the OpenStack-dev mailing list