[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options

Koniszewski, Pawel pawel.koniszewski at intel.com
Tue Jun 7 12:54:23 UTC 2016


There is another fix, proposed by Eli Qiao a long time ago:

https://review.openstack.org/#/c/310707

Basically, it blocks block live migration with BDMs over a tunnelled connection during the pre-checks.
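For illustration, the shape of such a pre-check might look roughly like this (a sketch only; the helper name and error message are mine, not the exact patch):

    import libvirt

    from nova import exception

    def _check_tunnelled_block_migration(flags, block_device_info):
        # Sketch: reject block live migration over a tunnelled
        # connection when the instance has mapped volumes, since
        # selective disk migration is not implemented for tunnelling.
        bdms = (block_device_info or {}).get('block_device_mapping') or []
        if (bdms and
                flags & libvirt.VIR_MIGRATE_TUNNELLED and
                flags & libvirt.VIR_MIGRATE_NON_SHARED_INC):
            raise exception.MigrationPreCheckError(
                reason='Cannot block migrate instances with mapped '
                       'volumes over a tunnelled connection')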

From: Timofei Durakov [mailto:tdurakov at mirantis.com]
Sent: Tuesday, June 7, 2016 9:04 AM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options

I've submitted one more patch with potential fix:

https://review.openstack.org/#/c/326258/

Timofey

On Mon, Jun 6, 2016 at 11:58 PM, Timofei Durakov <tdurakov at mirantis.com> wrote:
On Mon, Jun 6, 2016 at 11:26 PM, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:
On 6/6/2016 12:15 PM, Matt Riedemann wrote:
On 1/8/2016 12:28 PM, Mark McLoughlin wrote:
On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote:
On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote:
On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui
wrote:
On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin
wrote:
Hi

commit 8ecf93e[1] got me thinking - the live_migration_flag
config option unnecessarily allows operators to choose arbitrary
behavior of the migrateToURI() libvirt call, to the extent
that we allow the operator to configure a behavior that can
result in data loss[1].

I see that danpb recently said something similar:

https://review.openstack.org/171098

"Honestly, I wish we'd just kill off  'live_migration_flag'
and 'block_migration_flag' as config options. We really
should not be exposing low level libvirt API flags as admin
tunable settings.

Nova should really be in charge of picking the correct set of
flags for the current libvirt version, and the operation it
needs to perform. We might need to add other more sensible
config options in their place [..]"

Nova should really handle internal flags itself, and this series
is going in the right direction.
...

4) Add a new config option for tunneled versus native:

[libvirt] live_migration_tunneled = true

This enables the use of the VIR_MIGRATE_TUNNELLED flag. We
have historically defaulted to tunneled mode because it
requires the least configuration and is currently the only
way to have a secure migration channel.
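Roughly, once Nova owns flag selection, this option would just toggle
one bit (a sketch; the helper name is mine):

    import libvirt

    def _live_migration_flags(tunnelled):
        # Flags Nova always wants for a live migration (sketch).
        flags = (libvirt.VIR_MIGRATE_LIVE |
                 libvirt.VIR_MIGRATE_PEER2PEER |
                 libvirt.VIR_MIGRATE_UNDEFINE_SOURCE |
                 libvirt.VIR_MIGRATE_PERSIST_DEST)
        if tunnelled:
            flags |= libvirt.VIR_MIGRATE_TUNNELLED
        return flags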

danpb's quote above continues with:

"perhaps a "live_migration_secure_channel" to indicate that
migration must use encryption, which would imply use of
TUNNELLED flag"

So we need to discuss whether the config option should
express the choice of tunneled vs native, or whether it
should express another choice which implies tunneled vs
native.

https://review.openstack.org/263434

We probably have to consider that operators do not know much
about the internal libvirt flags, so the options we expose to
them should reflect the benefit of using them. I commented on your
review that we should at least explain the benefit of using this
option, whatever the name is.

As predicted, plenty of discussion on this point in the review
:)

You're right that we don't give the operator any guidance in the
help message about how to choose true or false for this:

Whether to use tunneled migration, where migration data is
transported over the libvirtd connection. If True, we use the
VIR_MIGRATE_TUNNELLED migration flag

libvirt's own docs on this are here:

https://libvirt.org/migration.html#transport

which emphasizes:

- the data copies involved in tunneling
- the extra configuration steps required for native
- the encryption support you get when tunneling
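For concreteness, here are the two transports in libvirt-python terms
(a sketch; the URIs and domain handling are illustrative):

    import libvirt

    def _migrate(dom, tunnelled):
        # Illustrative contrast of the two transports.
        flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
        if tunnelled:
            # Migration data is carried over the libvirtd connection:
            # encrypted if that connection is, at the cost of extra
            # data copies on both ends.
            flags |= libvirt.VIR_MIGRATE_TUNNELLED
            dom.migrateToURI('qemu+tls://dest-host/system', flags, None, 0)
        else:
            # Native: QEMU streams directly to the destination QEMU over
            # a separate, historically unencrypted channel, which needs
            # extra firewall/config setup but avoids the copies.
            dom.migrateToURI('qemu+tcp://dest-host/system', flags, None, 0)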

The discussions I've seen on this topic wrt Nova have revolved
around:

- that tunneling allows for an encrypted transport[1]
- that qemu's NBD based drive-mirror block migration isn't supported
  using tunneled mode, and that danpb is working on fixing this
  limitation in libvirt
- that "selective" block migration[2] won't work with the fallback
  qemu block migration support, and so won't currently work in
  tunneled mode

I'm not working on fixing it, but IIRC some other dev had proposed
patches.

So, the advice to operators would be:

- You may want to choose tunneled=False for improved block migration
  capabilities, but this limitation will go away in the future.
- You may want to choose tunneled=False if you wish to trade an
  encrypted transport for a (potentially negligible) performance
  improvement.

Does that make sense?

As for how to name the option, and as I said in the review, I
think it makes sense to be straightforward here and make it
clearly about choosing to disable libvirt's tunneled transport.

If we name it any other way, I think our explanation for
operators will immediately jump to explaining (a) that it
influences the TUNNELLED flag, and (b) the differences between
the tunneled and native transports. So, if we're going to have to
talk about tunneled versus native, why obscure that detail?

Ultimately we need to recognise that libvirt's tunnelled mode was
added as a hack, to work around the fact that QEMU lacked any kind of
native security capabilities and didn't appear likely to ever get
them at that time. As well as not working with modern NBD based
block device migration, it really sucks for performance because it
introduces many extra data copies. So it is going to be quite poor
for large VMs with a heavy rate of data dirtying.

The only long term relative "benefit" of tunnelled mode is that it
avoids the need to open extra firewall ports.

IMHO, the long term future is to *never* use tunnelled mode for
QEMU. This will be viable when my work on native TLS support in the
QEMU migration + NBD protocols is merged. I'm hopeful this will be
for QEMU 2.6.
But, Pawel strongly disagrees.

One last point I'd make is this isn't about adding a *new*
configuration capability for operators. As we deprecate and
remove these configuration options, we need to be careful not to
remove a capability that operators are currently depending on for
arguably reasonable reasons.

My view is that "live_migration_tunneled" is a reasonable
parameter to add, because there is a genuine need to let admins
choose this behaviour. We should make sure it is correctly done as
a tri-state flag though: when it is 'None', Nova should pick what
it thinks is the best approach based on the QEMU version - probably
use QEMU native when it supports TLS, otherwise use tunnelled if
possible to get security.
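A minimal sketch of that tri-state resolution (the capability probe
argument is hypothetical):

    def _should_tunnel(tunnelled_opt, qemu_supports_native_tls):
        # tunnelled_opt is the operator setting: True, False, or None.
        if tunnelled_opt is not None:
            # Explicit operator choice wins.
            return tunnelled_opt
        # No explicit choice: prefer the QEMU-native transport once it
        # can do TLS, otherwise tunnel to keep the channel encrypted.
        return not qemu_supports_native_tls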

Great feedback. I buy that.
[1] - https://review.openstack.org/#/c/171098/
[2] - https://review.openstack.org/#/c/227278

5) Add a new config option for additional migration flags:

[libvirt] live_migration_extra_flags =
VIR_MIGRATE_COMPRESSED

This allows operators to continue to experiment with libvirt
behaviors in safe ways without each use case having to be
accounted for.

https://review.openstack.org/263435

We would disallow setting the following flags via this
option:

VIR_MIGRATE_LIVE VIR_MIGRATE_PEER2PEER VIR_MIGRATE_TUNNELLED
VIR_MIGRATE_PERSIST_DEST VIR_MIGRATE_UNDEFINE_SOURCE
VIR_MIGRATE_NON_SHARED_INC VIR_MIGRATE_NON_SHARED_DISK

which would allow the following currently available flags to
be set:

VIR_MIGRATE_PAUSED VIR_MIGRATE_CHANGE_PROTECTION
VIR_MIGRATE_UNSAFE VIR_MIGRATE_OFFLINE
VIR_MIGRATE_COMPRESSED VIR_MIGRATE_ABORT_ON_ERROR
VIR_MIGRATE_AUTO_CONVERGE VIR_MIGRATE_RDMA_PIN_ALL
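As a sketch, the proposed validation would amount to a blacklist
filter when parsing the option (the helper name is mine):

    import libvirt

    # Flags Nova must keep control of itself (per the list above).
    _BLOCKED = (libvirt.VIR_MIGRATE_LIVE |
                libvirt.VIR_MIGRATE_PEER2PEER |
                libvirt.VIR_MIGRATE_TUNNELLED |
                libvirt.VIR_MIGRATE_PERSIST_DEST |
                libvirt.VIR_MIGRATE_UNDEFINE_SOURCE |
                libvirt.VIR_MIGRATE_NON_SHARED_INC |
                libvirt.VIR_MIGRATE_NON_SHARED_DISK)

    def _parse_extra_flags(value):
        # Turn "VIR_MIGRATE_COMPRESSED, ..." into a bitmask, dropping
        # unknown names and anything on the blacklist.
        flags = 0
        for name in value.split(','):
            name = name.strip()
            flag = getattr(libvirt, name, 0) if name else 0
            if flag and not flag & _BLOCKED:
                flags |= flag
        return flags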

We could probably consider providing VIR_MIGRATE_PAUSED and
VIR_MIGRATE_COMPRESSED as dedicated options too?

Yes - for any options we see regularly added to extra_flags by
operators, and as we understand the use cases better, we can add
dedicated options for them.

I really don't see a case for letting the admin set
VIR_MIGRATE_PAUSED at a host level. If we want the ability to force
a running VM to end up paused after migration, this is a feature to
be added to the Nova migration API.

VIR_MIGRATE_COMPRESSED is not as simple as just enabling a flag;
there are other associated runtime tunables that need setting. There
was a spec discussing this, which was not approved because a suitable
strategy for using it could not be agreed.
In the review, Pawel is making a case for not allowing the
operator to enable COMPRESSED or AUTO_CONVERGE.

I agree really. As per my comments, I in fact struggle to see a
credible case for allowing any of the remaining flags to be enabled.
They are all cases where Nova should be made to do the right thing,
possibly in relation to API parameters or other deployment choices.

Fair enough. I figured it would be a necessary safety valve against
operators who value the flexibility of the current configuration
options, but you make a good case.

I'll drop the extra_flags option and make tunneled a tri-state.

Thanks, Mark.


I'm going to be a jackass and skip this entire thread with a question.

The live migration CI job we have in the experimental queue is blocked
on this failure right now [1].

"Live Migration failure: Operation not supported: Selecting disks to
migrate is not implemented for tunnelled migration"

That's running with ubuntu 16.04 and libvirt 1.2.17.

I was looking at the deprecated live_migration_flag option, which
tells you to use the live_migration_tunnelled option instead. That
option defaults to None and is a no-op, even though its help text
says that if it's omitted we'll do our best to figure out whether you
want tunneling based on the hypervisor's encryption support. That
doesn't actually happen by default [2].

So what should happen? It seems some of this series was stalled or
incomplete?

We could change the configuration of our live migration CI job, but I'd
like to see the job work with the defaults that we ship.

I'm not entirely familiar with the thing that's causing the failure,
except that I believe we're calling migrateToURI3 with the tunnelled
flag (because it's in the default live_migration_flag option) and a
list of devices [3][4].

Would it be better to use migrateToURI2 if migrate_disks is in params
but the VIR_MIGRATE_TUNNELLED flag is set?

[1] https://bugs.launchpad.net/nova/+bug/1589591
[2]
https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559

[3]
https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751

[4]
https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523
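To make that question concrete, the fallback being floated would look
something like this (a sketch, not a settled fix; the 'migrate_disks'
and 'destination_xml' param keys follow libvirt-python's typed-params
naming):

    import libvirt

    def _migrate(dom, dest_uri, params, flags):
        if (flags & libvirt.VIR_MIGRATE_TUNNELLED and
                'migrate_disks' in params):
            # migrateToURI2 has no migrate_disks parameter, so libvirt
            # falls back to copying all non-shared disks rather than
            # erroring out on disk selection.
            dom.migrateToURI2(dest_uri,
                              dxml=params.get('destination_xml'),
                              flags=flags,
                              bandwidth=params.get('bandwidth', 0))
        else:
            dom.migrateToURI3(dest_uri, params, flags)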


Timofey is working on a fix here:

https://review.openstack.org/#/c/326111/

I've left some comments inline, but I'm mostly wondering about this now:

https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5360

The code there says that if there are BDMs and libvirt<1.2.17 and you're doing block migration, it's unsupported and fails. The comment also says that it can't be done in tunnelled mode either - which is the bug we're hitting.

But should we add the block_migration_flag check in check_can_live_migrate_source? Or does it work to live migrate in tunneled mode as long as we don't pass 'migrate_disks'?
So we have two options:

  *  raise an exception in case of a tunnelled block migration with
     the migrate_disks param

     *  as migration flags are supposed to be Nova internals, this is
        a bad option

  *  get rid of one of the two:

     *  don't use VIR_MIGRATE_TUNNELLED - looks most preferable for
        now (see the sketch below)
     *  don't pass the migrate_disks param (this forces block live
        migration to fail for instances with attached volumes, and
        for volume-backed instances with a vfat config drive)
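A sketch of that first sub-option, clearing the tunnelling bit only
when disk selection is actually needed (the helper name is mine):

    import libvirt

    def _flags_for_block_migration(flags, migrate_disks):
        if migrate_disks and flags & libvirt.VIR_MIGRATE_TUNNELLED:
            # Selective disk migration is unimplemented for tunnelled
            # mode, so drop the tunnel rather than fail the migration.
            flags &= ~libvirt.VIR_MIGRATE_TUNNELLED
        return flags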


--

Thanks,

Matt Riedemann


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

