[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options

Matt Riedemann mriedem at linux.vnet.ibm.com
Mon Jun 6 17:15:38 UTC 2016


On 1/8/2016 12:28 PM, Mark McLoughlin wrote:
> On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote:
>> On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote:
>>> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui
>>> wrote:
>>>> On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin
>>>> wrote:
>>>>> Hi
>>>>>
>>>>> commit 8ecf93e[1] got me thinking - the live_migration_flag
>>>>> config option unnecessarily allows operators to choose arbitrary
>>>>> behavior of the migrateToURI() libvirt call, to the extent
>>>>> that we allow the operator to configure a behavior that can
>>>>> result in data loss[1].
>>>>>
>>>>> I see that danpb recently said something similar:
>>>>>
>>>>> https://review.openstack.org/171098
>>>>>
>>>>> "Honestly, I wish we'd just kill off  'live_migration_flag'
>>>>> and 'block_migration_flag' as config options. We really
>>>>> should not be exposing low level libvirt API flags as admin
>>>>> tunable settings.
>>>>>
>>>>> Nova should really be in charge of picking the correct set of
>>>>> flags for the current libvirt version, and the operation it
>>>>> needs to perform. We might need to add other more sensible
>>>>> config options in their place [..]"
>>>>
>>>> Nova should really handle internal flags and this series is
>>>> heading in the right direction.
>>>>
>>>>> ...
>>>>
>>>>> 4) Add a new config option for tunneled versus native:
>>>>>
>>>>> [libvirt]
>>>>> live_migration_tunneled = true
>>>>>
>>>>> This enables the use of the VIR_MIGRATE_TUNNELLED flag. We
>>>>> have historically defaulted to tunneled mode because it
>>>>> requires the least configuration and is currently the only
>>>>> way to have a secure migration channel.
>>>>>
>>>>> danpb's quote above continues with:
>>>>>
>>>>> "perhaps a "live_migration_secure_channel" to indicate that
>>>>> migration must use encryption, which would imply use of
>>>>> TUNNELLED flag"
>>>>>
>>>>> So we need to discuss whether the config option should
>>>>> express the choice of tunneled vs native, or whether it
>>>>> should express another choice which implies tunneled vs
>>>>> native.
>>>>>
>>>>> https://review.openstack.org/263434
>>>>
>>>> We probably have to consider that operators do not know much
>>>> about internal libvirt flags, so the options we expose to them
>>>> should reflect the benefit of using them. I commented on your
>>>> review that we should at least explain the benefit of using this
>>>> option, whatever the name is.
>>>
>>> As predicted, plenty of discussion on this point in the review
>>> :)
>>>
>>> You're right that we don't give the operator any guidance in the
>>> help message about how to choose true or false for this:
>>>
>>> Whether to use tunneled migration, where migration data is
>>> transported over the libvirtd connection. If True, we use the
>>> VIR_MIGRATE_TUNNELLED migration flag
>>>
>>> libvirt's own docs on this are here:
>>>
>>> https://libvirt.org/migration.html#transport
>>>
>>> which emphasizes:
>>>
>>> - the data copies involved in tunneling
>>> - the extra configuration steps required for native
>>> - the encryption support you get when tunneling
>>>
>>> The discussions I've seen on this topic wrt Nova have revolved
>>> around:
>>>
>>> - that tunneling allows for an encrypted transport[1]
>>> - that qemu's NBD based drive-mirror block migration isn't
>>>   supported using tunneled mode, and that danpb is working on
>>>   fixing this limitation in libvirt
>>> - "selective" block migration[2] won't work with the fallback
>>>   qemu block migration support, and so won't currently work in
>>>   tunneled mode
>>
>> I'm not working on fixing it, but IIRC some other dev had proposed
>> patches.
>>
>>>
>>> So, the advice to operators would be:
>>>
>>> - You may want to choose tunneled=False for improved block
>>>   migration capabilities, but this limitation will go away in
>>>   future.
>>> - You may want to choose tunneled=False if you wish to trade an
>>>   encrypted transport for a (potentially negligible) performance
>>>   improvement.
>>>
>>> Does that make sense?
>>>
>>> As for how to name the option, and as I said in the review, I
>>> think it makes sense to be straightforward here and make it
>>> clearly about choosing to disable libvirt's tunneled transport.
>>>
>>> If we name it any other way, I think our explanation for
>>> operators will immediately jump to explaining (a) that it
>>> influences the TUNNELLED flag, and (b) the differences between
>>> the tunneled and native transports. So, if we're going to have to
>>> talk about tunneled versus native, why obscure that detail?
>>
>> Ultimately we need to recognise that libvirt's tunnelled mode was
>> added as a hack, to work around the fact that QEMU lacked any kind of
>> native security capabilities & didn't appear likely to ever get
>> them at that time.  As well as not working with modern NBD based
>> block device encryption, it really sucks for performance because it
>> introduces many extra data copies. So it is going to be quite poor
>> for large VMs with heavy rate of data dirtying.
>>
>> The only long term relative "benefit" of tunnelled mode is that it
>> avoids the need to open extra firewall ports.
>>
>> IMHO, the long term future is to *never* use tunnelled mode for
>> QEMU. This will be viable when my support for native TLS support in
>> QEMU migration + NBD protocols is merged. I'm hopeful this will be
>> for QEMU 2.6
>>
>>> But, Pawel strongly disagrees.
>>>
>>> One last point I'd make is this isn't about adding a *new*
>>> configuration capability for operators. As we deprecate and
>>> remove these configuration options, we need to be careful not to
>>> remove a capability that operators are currently depending on for
>>> arguably reasonable reasons.
>>
>> My view is that "live_migration_tunneled" is a reasonable
>> parameter to add, because there is a genuine need to let admins
>> choose this behaviour. We should make sure it is correctly done as
>> a tri-state flag though: when it is 'None', Nova should pick what
>> it thinks is the best approach based on QEMU version. Probably to
>> use QEMU native when it supports TLS, otherwise use tunnelled if
>> possible to get security.
>
> Great feedback. I buy that.
>
>>> [1] - https://review.openstack.org/#/c/171098/
>>> [2] - https://review.openstack.org/#/c/227278
>>>
>>>
>>>>> 5) Add a new config option for additional migration flags:
>>>>>
>>>>> [libvirt]
>>>>> live_migration_extra_flags = VIR_MIGRATE_COMPRESSED
>>>>>
>>>>> This allows operators to continue to experiment with libvirt
>>>>> behaviors in safe ways without each use case having to be
>>>>> accounted for.
>>>>>
>>>>> https://review.openstack.org/263435
>>>>>
>>>>> We would disallow setting the following flags via this
>>>>> option:
>>>>>
>>>>> VIR_MIGRATE_LIVE
>>>>> VIR_MIGRATE_PEER2PEER
>>>>> VIR_MIGRATE_TUNNELLED
>>>>> VIR_MIGRATE_PERSIST_DEST
>>>>> VIR_MIGRATE_UNDEFINE_SOURCE
>>>>> VIR_MIGRATE_NON_SHARED_INC
>>>>> VIR_MIGRATE_NON_SHARED_DISK
>>>>>
>>>>> which would allow the following currently available flags to
>>>>> be set:
>>>>
>>>>> VIR_MIGRATE_PAUSED
>>>>> VIR_MIGRATE_CHANGE_PROTECTION
>>>>> VIR_MIGRATE_UNSAFE
>>>>> VIR_MIGRATE_OFFLINE
>>>>> VIR_MIGRATE_COMPRESSED
>>>>> VIR_MIGRATE_ABORT_ON_ERROR
>>>>> VIR_MIGRATE_AUTO_CONVERGE
>>>>> VIR_MIGRATE_RDMA_PIN_ALL
>>>>
>>>> We could probably consider providing VIR_MIGRATE_PAUSED and
>>>> VIR_MIGRATE_COMPRESSED as dedicated options too?
>>>
>>> Yes, I think that for any flags we see operators regularly adding
>>> to extra_flags, and as we understand the use cases better, we can
>>> add dedicated options for them.
>>
>> I really don't see a case for letting the admin set
>> VIR_MIGRATE_PAUSED at a host level. If we want the ability to force
>> a running VM to end up paused after migration, this is a feature to
>> be added to the Nova migration API.
>>
>> The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a
>> flag, there are other associated runtime tunables that need
>> setting. There was a spec discussing this which was not approved as
>> a suitable strategy for using it could not be agreed.
>>
>>> In the review, Pawel is making a case for not allowing the
>>> operator to enable COMPRESSED or AUTO_CONVERGE.
>>
>> I agree really. As per my comments, I in fact struggle to see a
>> credible case for allowing any of the remaining flags to be
>> enabled. They are all cases that Nova should be made to do the right
>> thing, possibly in relation to API parameters or other deployment
>> choices.
>
> Fair enough. I figured it would be a necessary safety valve against
> operators who value the flexibility of the current configuration
> options, but you make a good case.
>
> I'll drop the extra_flags option and make tunneled a tri-state.
>
> Thanks, Mark.
>

I'm going to be a jackass and skip this entire thread with a question.

The live migration CI job we have in the experimental queue is blocked
on this failure right now [1].

"Live Migration failure: Operation not supported: Selecting disks to
migrate is not implemented for tunnelled migration"

That's running with ubuntu 16.04 and libvirt 1.2.17.

I was looking at the deprecated live_migration_flag option, which tells
you to use the live_migration_tunnelled option instead. That option
defaults to None, which is a no-op, even though its help text says that
if it's omitted we'll do our best to figure out whether to use tunneling
based on the hypervisor's encryption support. That doesn't actually
happen by default [2].
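
For reference, here is a rough sketch of the tri-state behavior the help
text describes, as I read it. This is not what the driver does today.
The helper names and the native_tls_supported argument are made up for
illustration, and the flag value is hard-coded from libvirt's
virDomainMigrateFlags enum so the snippet runs without the libvirt
bindings.

```python
from typing import Optional

# Value of VIR_MIGRATE_TUNNELLED (1 << 2) in libvirt's
# virDomainMigrateFlags enum, hard-coded to avoid importing libvirt.
VIR_MIGRATE_TUNNELLED = 4


def resolve_tunnelled(configured: Optional[bool],
                      native_tls_supported: bool) -> bool:
    """Hypothetical helper: decide whether to tunnel the migration.

    configured is the live_migration_tunnelled value (True, False or
    None). native_tls_supported is an assumed capability check; how
    Nova would detect it is not covered here.
    """
    if configured is not None:
        # The operator made an explicit choice, so honor it.
        return configured
    # Unset: prefer native transport when it can be encrypted,
    # otherwise fall back to tunnelling to keep the channel secure.
    return not native_tls_supported


def apply_tunnelled(flags: int, tunnelled: bool) -> int:
    """Hypothetical helper: set or clear the TUNNELLED bit in flags."""
    if tunnelled:
        return flags | VIR_MIGRATE_TUNNELLED
    return flags & ~VIR_MIGRATE_TUNNELLED
```

With something like this, leaving the option unset would give tunnelled
mode only while the hypervisor lacks native encryption, which is what
the help text promises.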

So what should happen? It seems some of this series was stalled or
incomplete?

We could change the configuration of our live migration CI job, but I'd
like to see the job work with the defaults that we ship.

I'm not entirely familiar with the thing that's causing the failure
except I believe we're calling migrateToURI3 with the tunnelled flag
(because it's in the default live_migration_flag option) and a list of
devices [3][4].

Would it be better to use migrateToURI2 if migrate_disks is in params 
but the VIR_MIGRATE_TUNNELLED flag is set?
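
To make the question concrete, this is roughly the selection I have in
mind. It is only a sketch: pick_migrate_call is a made-up name, it just
returns the name of the libvirt method rather than calling it, and it
assumes that falling back to migrateToURI2 means giving up the disk
selection, since that call has no way to pass the list of devices.

```python
# Value of VIR_MIGRATE_TUNNELLED (1 << 2) in libvirt's
# virDomainMigrateFlags enum, hard-coded to avoid importing libvirt.
VIR_MIGRATE_TUNNELLED = 4


def pick_migrate_call(flags: int, params: dict) -> str:
    """Hypothetical helper: name the libvirt call to use.

    flags is the migration flag bitmask and params is the dict that
    would otherwise be handed to migrateToURI3.
    """
    tunnelled = bool(flags & VIR_MIGRATE_TUNNELLED)
    if params.get("migrate_disks") and tunnelled:
        # libvirt rejects disk selection for tunnelled migration, so
        # avoid the call that carries the device list.
        return "migrateToURI2"
    return "migrateToURI3"
```

Whether the fallback then does the right thing for the disks we wanted
to skip is the part I'm not sure about.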

[1] https://bugs.launchpad.net/nova/+bug/1589591
[2]
https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559
[3]
https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751
[4]
https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523

-- 

Thanks,

Matt Riedemann



