[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options

Matt Riedemann mriedem at linux.vnet.ibm.com
Mon Jun 6 20:26:29 UTC 2016


On 6/6/2016 12:15 PM, Matt Riedemann wrote:
> On 1/8/2016 12:28 PM, Mark McLoughlin wrote:
>> On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote:
>>> On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote:
>>>> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui
>>>> wrote:
>>>>> On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin
>>>>> wrote:
>>>>>> Hi
>>>>>>
>>>>>> commit 8ecf93e[1] got me thinking - the live_migration_flag
>>>>>> config option unnecessarily allows operators choose arbitrary
>>>>>> behavior of the migrateToURI() libvirt call, to the extent
>>>>>> that we allow the operator to configure a behavior that can
>>>>>> result in data loss[1].
>>>>>>
>>>>>> I see that danpb recently said something similar:
>>>>>>
>>>>>> https://review.openstack.org/171098
>>>>>>
>>>>>> "Honestly, I wish we'd just kill off  'live_migration_flag'
>>>>>> and 'block_migration_flag' as config options. We really
>>>>>> should not be exposing low level libvirt API flags as admin
>>>>>> tunable settings.
>>>>>>
>>>>>> Nova should really be in charge of picking the correct set of
>>>>>> flags for the current libvirt version, and the operation it
>>>>>> needs to perform. We might need to add other more sensible
>>>>>> config options in their place [..]"
>>>>>
>>>>> Nova should really handle internal flags and this serie is
>>>>> running in the right way.
>>>>>
>>>>>> ...
>>>>>
>>>>>> 4) Add a new config option for tunneled versus native:
>>>>>>
>>>>>> [libvirt] live_migration_tunneled = true
>>>>>>
>>>>>> This enables the use of the VIR_MIGRATE_TUNNELLED flag. We
>>>>>> have historically defaulted to tunneled mode because it
>>>>>> requires the least configuration and is currently the only
>>>>>> way to have a secure migration channel.
>>>>>>
>>>>>> danpb's quote above continues with:
>>>>>>
>>>>>> "perhaps a "live_migration_secure_channel" to indicate that
>>>>>> migration must use encryption, which would imply use of
>>>>>> TUNNELLED flag"
>>>>>>
>>>>>> So we need to discuss whether the config option should
>>>>>> express the choice of tunneled vs native, or whether it
>>>>>> should express another choice which implies tunneled vs
>>>>>> native.
>>>>>>
>>>>>> https://review.openstack.org/263434
>>>>>
>>>>> We probably have to consider that operator does not know much
>>>>> about internal libvirt flags, so options we are exposing for
>>>>> him should reflect benefice of using them. I commented on your
>>>>> review we should at least explain benefice of using this option
>>>>> whatever the name is.
>>>>
>>>> As predicted, plenty of discussion on this point in the review
>>>> :)
>>>>
>>>> You're right that we don't give the operator any guidance in the
>>>> help message about how to choose true or false for this:
>>>>
>>>> Whether to use tunneled migration, where migration data is
>>>> transported over the libvirtd connection. If True, we use the
>>>> VIR_MIGRATE_TUNNELLED migration flag
>>>>
>>>> libvirt's own docs on this are here:
>>>>
>>>> https://libvirt.org/migration.html#transport
>>>>
>>>> which emphasizes:
>>>>
>>>> - the data copies involved in tunneling - the extra configuration
>>>> steps required for native - the encryption support you get when
>>>> tunneling
>>>>
>>>> The discussions I've seen on this topic wrt Nova have revolved
>>>> around:
>>>>
>>>> - that tunneling allows for an encrypted transport[1] - that
>>>> qemu's NBD based drive-mirror block migration isn't supported
>>>> using tunneled mode, and that danpb is working on fixing this
>>>> limitation in libvirt - "selective" block migration[2] won't work
>>>> with the fallback qemu block migration support, and so won't
>>>> currently work in tunneled mode
>>>
>>> I'm not working on fixing it, but IIRC some other dev had proposed
>>> patches.
>>>
>>>>
>>>> So, the advise to operators would be:
>>>>
>>>> - You may want to choose tunneled=False for improved block
>>>> migration capabilities, but this limitation will go away in
>>>> future. - You may want to choose tunneled=False if you wish to
>>>> trade and encrypted transport for a (potentially negligible)
>>>> performance improvement.
>>>>
>>>> Does that make sense?
>>>>
>>>> As for how to name the option, and as I said in the review, I
>>>> think it makes sense to be straightforward here and make it
>>>> clearly about choosing to disable libvirt's tunneled transport.
>>>>
>>>> If we name it any other way, I think our explanation for
>>>> operators will immediately jump to explaining (a) that it
>>>> influences the TUNNELLED flag, and (b) the differences between
>>>> the tunneled and native transports. So, if we're going to have to
>>>> talk about tunneled versus native, why obscure that detail?
>>>
>>> Ultimately we need to recognise that libvirt's tunnelled mode was
>>> added as a hack, to work around fact that QEMU lacked any kind of
>>> native security capabilities & didn't appear likely to ever get
>>> them at that time.  As well as not working with modern NBD based
>>> block device encryption, it really sucks for performance because it
>>> introduces many extra data copies. So it is going to be quite poor
>>> for large VMs with heavy rate of data dirtying.
>>>
>>> The only long term relative "benefit" of tunnelled mode is that it
>>> avoids the need to open extra firewall ports.
>>>
>>> IMHO, the long term future is to *never* use tunnelled mode for
>>> QEMU. This will be viable when my support for native TLS support in
>>> QEMU migration + NBD protocols is merged. I'm hopeful this wil be
>>> for QEMU 2.6
>>>
>>>> But, Pawel strongly disagrees.
>>>>
>>>> One last point I'd make is this isn't about adding a *new*
>>>> configuration capability for operators. As we deprecate and
>>>> remove these configuration options, we need to be careful not to
>>>> remove a capability that operators are currently depending on for
>>>> arguably reasonable reasons.
>>>
>>> My view is that "live_migration_tunneled" is a reasonable
>>> parameter to add, because there is a genuine need to let admins
>>> choose this behaviour. We should make sure it is correctly done as
>>> a tri-state flag though, when it is 'None', Nova should pick what
>>> it things is the best approach based on QEMU version. Probably to
>>> use QEMU native when it supports TLS, otherwise use tunnelled if
>>> possible to get security.
>>
>> Great feedback. I buy that.
>>
>>>> [1] - https://review.openstack.org/#/c/171098/ [2] -
>>>> https://review.openstack.org/#/c/227278
>>>>
>>>>
>>>>>> 5) Add a new config option for additional migration flags:
>>>>>>
>>>>>> [libvirt] live_migration_extra_flags =
>>>>>> VIR_MIGRATE_COMPRESSED
>>>>>>
>>>>>> This allows operators to continue to experiment with libvirt
>>>>>> behaviors in safe ways without each use case having to be
>>>>>> accounted for.
>>>>>>
>>>>>> https://review.openstack.org/263435
>>>>>>
>>>>>> We would disallow setting the following flags via this
>>>>>> option:
>>>>>>
>>>>>> VIR_MIGRATE_LIVE VIR_MIGRATE_PEER2PEER VIR_MIGRATE_TUNNELLED
>>>>>> VIR_MIGRATE_PERSIST_DEST VIR_MIGRATE_UNDEFINE_SOURCE
>>>>>> VIR_MIGRATE_NON_SHARED_INC VIR_MIGRATE_NON_SHARED_DISK
>>>>>>
>>>>>> which would allow the following currently available flags to
>>>>>> be set:
>>>>>
>>>>>> VIR_MIGRATE_PAUSED VIR_MIGRATE_CHANGE_PROTECTION
>>>>>> VIR_MIGRATE_UNSAFE VIR_MIGRATE_OFFLINE
>>>>>> VIR_MIGRATE_COMPRESSED VIR_MIGRATE_ABORT_ON_ERROR
>>>>>> VIR_MIGRATE_AUTO_CONVERGE VIR_MIGRATE_RDMA_PIN_ALL
>>>>>
>>>>> We can probably consider to provide VIR_MIGRATE_PAUSED and
>>>>> VIR_MIGRATE_COMPRESSED as dedicated options too ?
>>>>
>>>> Yes, I think any options we see regularly added to extra_flags
>>>> by operators, and as we understand the use cases better, then we
>>>> can add dedicated options for them.
>>>
>>> I really don't see a case for letting the admin set
>>> VIR_MIGRATE_PAUSED at a host level. If we want the ability to force
>>> a running VM to end up paused after migration, this is a feature to
>>> be added to the Nova migration API.
>>>
>>> The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a
>>> flag, there are other associated runtime tunables that need
>>> setting. There was a spec discussing this which was not approved as
>>> a suitable strategy for using it could not be agreed.
>>>
>>>> In the review, Pawel is making a case for not allowing the
>>>> operator to enable COMPRESSED or AUTO_CONVERGE.
>>>
>>> I agree really. As per my comments, I in fact struggle to see a
>>> credible case for allowing any of the remaining flags to be
>>> enabled. They are all cases that Nova should be made todo the right
>>> thing, possibly in relation to API parameters or other deployment
>>> choices.
>>
>> Fair enough. I figured it would be a necessary safety valve against
>> operators who value the flexibility of the current configuration
>> options, but you make a good case.
>>
>> I'll drop the extra_flags option and make tunneled a tri-state.
>>
>> Thanks, Mark.
>>
>> __________________________________________________________________________
>>
>>
>>
> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> I'm going to be a jackass and skip this entire thread with a question.
>
> The live migration CI job we have in the experimental queue is blocked
> on this failure right now [1].
>
> "Live Migration failure: Operation not supported: Selecting disks to
> migrate is not implemented for tunnelled migration"
>
> That's running with ubuntu 16.04 and libvirt 1.2.17.
>
> I was looking at the deprecated live_migration_flag which tells you to
> use the live_migration_tunnelled flag, which defaults to None and is a
> no-op, even though the help text on the option says if it's omitted
> we'll do our best to figure out if you want to use tunneling based on
> the hypervisor's encryption support.  That doesn't actually happen by
> default [2].
>
> So what should happen? It seems some of this series was stalled or
> incomplete?
>
> We could change the configuration of our live migration CI job, but I'd
> like to see the job work with the defaults that we ship.
>
> I'm not entirely familiar with the thing that's causing the failure
> except I believe we're calling migrateToURI3 with the tunnelled flag
> (because it's in the default live_migration_flag option) and a list of
> devices [3][4].
>
> Would it be better to use migrateToURI2 if migrate_disks is in params
> but the VIR_MIGRATE_TUNNELLED flag is set?
>
> [1] https://bugs.launchpad.net/nova/+bug/1589591
> [2]
> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559
>
> [3]
> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751
>
> [4]
> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523
>
>

Timofey is working on a fix here:

https://review.openstack.org/#/c/326111/

I've left some comments inline, but I'm mostly wondering about this now:

https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5360

The code there says that if there are BDMs and libvirt<1.2.17 and you're 
doing block migration, it's unsupported and fails. The comment also says 
that it can't do that in tunneled method either - which is the bug we're 
hitting.

But should we add the block_migration_flag check in 
check_can_live_migrate_source? Or does it work to live migrate in 
tunneled mode as long as we don't pass 'migrate_disks'?

-- 

Thanks,

Matt Riedemann




More information about the OpenStack-dev mailing list