[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options
Mark McLoughlin
markmc at redhat.com
Mon Jan 4 21:12:06 UTC 2016
Hi
commit 8ecf93e[1] got me thinking - the live_migration_flag config
option unnecessarily allows operators to choose arbitrary behavior of the
migrateToURI() libvirt call, to the extent that we allow the operator
to configure a behavior that can result in data loss[2].
I see that danpb recently said something similar:
https://review.openstack.org/171098
"Honestly, I wish we'd just kill off 'live_migration_flag' and
'block_migration_flag' as config options. We really should not be
exposing low level libvirt API flags as admin tunable settings.
Nova should really be in charge of picking the correct set of flags
for the current libvirt version, and the operation it needs to
perform. We might need to add other more sensible config options in
their place [..]"
I've just proposed a series of patches, which boils down to the
following steps:
1) Modify the approach taken in commit 8ecf93e so that instead of
just warning about unsafe use of NON_SHARED_INC, we fix up the
config option to a safe value.
https://review.openstack.org/263431
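A minimal sketch of what "fix up the config option to a safe value" could
look like, assuming the option is a comma-separated string of flag names
and that the unsafe case is NON_SHARED_INC appearing in live_migration_flag.
The function name is hypothetical and is not taken from the patch:

```python
UNSAFE_FOR_LIVE = "VIR_MIGRATE_NON_SHARED_INC"


def fix_up_live_migration_flag(flag_string):
    """Return (safe_flag_string, removed) for a comma-separated flag list.

    Hypothetical helper: drops the unsafe flag instead of only warning.
    """
    names = [n.strip() for n in flag_string.split(",") if n.strip()]
    safe = [n for n in names if n != UNSAFE_FOR_LIVE]
    # 'removed' tells the caller whether a warning should still be logged.
    return ", ".join(safe), len(safe) != len(names)
```

The caller would still log a warning when the second return value is true,
so operators learn that their setting was overridden.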
2) Hard-code the P2P flag for live and block migrations as
appropriate for the libvirt driver being used.
For the qemu driver, we should always use VIR_MIGRATE_PEER2PEER for
both live and block migrations. Without this option, you get:
Live Migration failure: Requested operation is not valid: direct migration is not supported by the connection driver
OTOH, the Xen driver does not support P2P, and only supports
"unmanaged direct connection".
https://review.openstack.org/263432
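As a rough illustration of the hard-coding described in this step, the
sketch below picks the P2P flag from the virt type. It assumes flags are
handled as a set of names, that "kvm" and "qemu" both map to the qemu
driver, and that other virt types are left untouched. The function name is
hypothetical:

```python
P2P = "VIR_MIGRATE_PEER2PEER"


def apply_p2p(flags, virt_type):
    """Return a new set of flag names with P2P forced on or off.

    Assumption: 'kvm' and 'qemu' use the qemu driver; 'xen' cannot do P2P.
    """
    result = set(flags)
    if virt_type in ("kvm", "qemu"):
        result.add(P2P)
    elif virt_type == "xen":
        result.discard(P2P)
    # Any other virt type is returned unchanged in this sketch.
    return result
```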
3) Require the use of the UNDEFINE_SOURCE flag, and the non-use of
the PERSIST_DEST flag.
Nova itself persists the domain configuration on the destination
host, but it assumes the libvirt migration call removes it from
the source host. So it makes no sense to allow operators to
configure these flags.
https://review.openstack.org/263433
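In code, this requirement amounts to forcing one flag on and the other
off regardless of what the operator configured. A hedged sketch, with a
hypothetical function name and flags again modelled as a set of names:

```python
def enforce_source_dest_flags(flags):
    """Return a new set with UNDEFINE_SOURCE present and PERSIST_DEST absent."""
    result = set(flags)
    # Nova relies on libvirt removing the domain from the source host.
    result.add("VIR_MIGRATE_UNDEFINE_SOURCE")
    # Nova persists the domain on the destination itself.
    result.discard("VIR_MIGRATE_PERSIST_DEST")
    return result
```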
4) Add a new config option for tunneled versus native:
[libvirt]
live_migration_tunneled = true
This enables the use of the VIR_MIGRATE_TUNNELLED flag. We have
historically defaulted to tunneled mode because it requires the
least configuration and is currently the only way to have a
secure migration channel.
danpb's quote above continues with:
"perhaps a "live_migration_secure_channel" to indicate that
migration must use encryption, which would imply use of
TUNNELLED flag"
So we need to discuss whether the config option should express the
choice of tunneled vs native, or whether it should express another
choice which implies tunneled vs native.
https://review.openstack.org/263434
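If the option stays a plain boolean, mapping it onto the libvirt flag
could be as simple as the sketch below. The function name is hypothetical,
and the parameter simply mirrors the proposed live_migration_tunneled
option:

```python
TUNNELLED = "VIR_MIGRATE_TUNNELLED"


def apply_tunneled(flags, live_migration_tunneled):
    """Return a new set of flag names reflecting the boolean option."""
    result = set(flags)
    if live_migration_tunneled:
        result.add(TUNNELLED)
    else:
        # Native transport: make sure the flag is not left over.
        result.discard(TUNNELLED)
    return result
```

A "secure channel" style option, as danpb suggests, would sit one level
above this and decide the boolean rather than expose it directly.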
5) Add a new config option for additional migration flags:
[libvirt]
live_migration_extra_flags = VIR_MIGRATE_COMPRESSED
This allows operators to continue to experiment with libvirt behaviors
in safe ways without each use case having to be accounted for.
https://review.openstack.org/263435
We would disallow setting the following flags via this option:
VIR_MIGRATE_LIVE
VIR_MIGRATE_PEER2PEER
VIR_MIGRATE_TUNNELLED
VIR_MIGRATE_PERSIST_DEST
VIR_MIGRATE_UNDEFINE_SOURCE
VIR_MIGRATE_NON_SHARED_INC
VIR_MIGRATE_NON_SHARED_DISK
which would allow the following currently available flags to be set:
VIR_MIGRATE_PAUSED
VIR_MIGRATE_CHANGE_PROTECTION
VIR_MIGRATE_UNSAFE
VIR_MIGRATE_OFFLINE
VIR_MIGRATE_COMPRESSED
VIR_MIGRATE_ABORT_ON_ERROR
VIR_MIGRATE_AUTO_CONVERGE
VIR_MIGRATE_RDMA_PIN_ALL
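The disallow list above could be enforced with a check along these lines.
This is only a sketch: it assumes the option value is a comma-separated
string of flag names, and the function name and error type are
hypothetical:

```python
DISALLOWED_EXTRA_FLAGS = frozenset([
    "VIR_MIGRATE_LIVE",
    "VIR_MIGRATE_PEER2PEER",
    "VIR_MIGRATE_TUNNELLED",
    "VIR_MIGRATE_PERSIST_DEST",
    "VIR_MIGRATE_UNDEFINE_SOURCE",
    "VIR_MIGRATE_NON_SHARED_INC",
    "VIR_MIGRATE_NON_SHARED_DISK",
])


def parse_extra_flags(value):
    """Split the option value into flag names, rejecting disallowed ones."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    rejected = [n for n in names if n in DISALLOWED_EXTRA_FLAGS]
    if rejected:
        raise ValueError(
            "not allowed in live_migration_extra_flags: " + ", ".join(rejected))
    return names
```

Using a deny list rather than an allow list means flags added by future
libvirt releases remain available without a Nova change.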
6) Deprecate the existing live_migration_flag and block_migration_flag
config options. Operators would be expected to migrate to using the
live_migration_tunneled or live_migration_extra_flags config options.
During the deprecation period we would invite feedback as to whether
additional config options are needed to cover unanticipated use cases.
https://review.openstack.org/263436
Thanks in advance for any feedback.
I'm going to guess that one piece of feedback will be that some subset
of this needs a blueprint (and maybe a spec), and that the blueprint
freeze was a month ago, so that subset needs to be punted until after
Mitaka? I'd love to be wrong about that, though :)
Thanks,
Mark.
[1] - https://review.openstack.org/228853
[2] - Data loss can occur when you have disk images on shared storage and
you specify the VIR_MIGRATE_NON_SHARED_INC or
VIR_MIGRATE_NON_SHARED_DISK flag, because during the block migration
the disk is copied back over itself while it is in use from another
node.