[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options
Timofei Durakov
tdurakov at mirantis.com
Tue Jun 7 07:04:28 UTC 2016
I've submitted one more patch with potential fix:
https://review.openstack.org/#/c/326258/
Timofey
On Mon, Jun 6, 2016 at 11:58 PM, Timofei Durakov <tdurakov at mirantis.com>
wrote:
> On Mon, Jun 6, 2016 at 11:26 PM, Matt Riedemann <
> mriedem at linux.vnet.ibm.com> wrote:
>
>> On 6/6/2016 12:15 PM, Matt Riedemann wrote:
>>
>>> On 1/8/2016 12:28 PM, Mark McLoughlin wrote:
>>>
>>>> On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote:
>>>>
>>>>> On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote:
>>>>>
>>>>>> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> commit 8ecf93e[1] got me thinking - the live_migration_flag
>>>>>>>> config option unnecessarily allows operators choose arbitrary
>>>>>>>> behavior of the migrateToURI() libvirt call, to the extent
>>>>>>>> that we allow the operator to configure a behavior that can
>>>>>>>> result in data loss[1].
>>>>>>>>
>>>>>>>> I see that danpb recently said something similar:
>>>>>>>>
>>>>>>>> https://review.openstack.org/171098
>>>>>>>>
>>>>>>>> "Honestly, I wish we'd just kill off 'live_migration_flag'
>>>>>>>> and 'block_migration_flag' as config options. We really
>>>>>>>> should not be exposing low level libvirt API flags as admin
>>>>>>>> tunable settings.
>>>>>>>>
>>>>>>>> Nova should really be in charge of picking the correct set of
>>>>>>>> flags for the current libvirt version, and the operation it
>>>>>>>> needs to perform. We might need to add other more sensible
>>>>>>>> config options in their place [..]"
>>>>>>>>
>>>>>>>
>>>>>>> Nova should really handle internal flags and this serie is
>>>>>>> running in the right way.
>>>>>>>
>>>>>>> ...
>>>>>>>>
>>>>>>>
>>>>>>> 4) Add a new config option for tunneled versus native:
>>>>>>>>
>>>>>>>> [libvirt] live_migration_tunneled = true
>>>>>>>>
>>>>>>>> This enables the use of the VIR_MIGRATE_TUNNELLED flag. We
>>>>>>>> have historically defaulted to tunneled mode because it
>>>>>>>> requires the least configuration and is currently the only
>>>>>>>> way to have a secure migration channel.
>>>>>>>>
>>>>>>>> danpb's quote above continues with:
>>>>>>>>
>>>>>>>> "perhaps a "live_migration_secure_channel" to indicate that
>>>>>>>> migration must use encryption, which would imply use of
>>>>>>>> TUNNELLED flag"
>>>>>>>>
>>>>>>>> So we need to discuss whether the config option should
>>>>>>>> express the choice of tunneled vs native, or whether it
>>>>>>>> should express another choice which implies tunneled vs
>>>>>>>> native.
>>>>>>>>
>>>>>>>> https://review.openstack.org/263434
>>>>>>>>
>>>>>>>
>>>>>>> We probably have to consider that operator does not know much
>>>>>>> about internal libvirt flags, so options we are exposing for
>>>>>>> him should reflect benefice of using them. I commented on your
>>>>>>> review we should at least explain benefice of using this option
>>>>>>> whatever the name is.
>>>>>>>
>>>>>>
>>>>>> As predicted, plenty of discussion on this point in the review
>>>>>> :)
>>>>>>
>>>>>> You're right that we don't give the operator any guidance in the
>>>>>> help message about how to choose true or false for this:
>>>>>>
>>>>>> Whether to use tunneled migration, where migration data is
>>>>>> transported over the libvirtd connection. If True, we use the
>>>>>> VIR_MIGRATE_TUNNELLED migration flag
>>>>>>
>>>>>> libvirt's own docs on this are here:
>>>>>>
>>>>>> https://libvirt.org/migration.html#transport
>>>>>>
>>>>>> which emphasizes:
>>>>>>
>>>>>> - the data copies involved in tunneling - the extra configuration
>>>>>> steps required for native - the encryption support you get when
>>>>>> tunneling
>>>>>>
>>>>>> The discussions I've seen on this topic wrt Nova have revolved
>>>>>> around:
>>>>>>
>>>>>> - that tunneling allows for an encrypted transport[1] - that
>>>>>> qemu's NBD based drive-mirror block migration isn't supported
>>>>>> using tunneled mode, and that danpb is working on fixing this
>>>>>> limitation in libvirt - "selective" block migration[2] won't work
>>>>>> with the fallback qemu block migration support, and so won't
>>>>>> currently work in tunneled mode
>>>>>>
>>>>>
>>>>> I'm not working on fixing it, but IIRC some other dev had proposed
>>>>> patches.
>>>>>
>>>>>
>>>>>> So, the advise to operators would be:
>>>>>>
>>>>>> - You may want to choose tunneled=False for improved block
>>>>>> migration capabilities, but this limitation will go away in
>>>>>> future. - You may want to choose tunneled=False if you wish to
>>>>>> trade and encrypted transport for a (potentially negligible)
>>>>>> performance improvement.
>>>>>>
>>>>>> Does that make sense?
>>>>>>
>>>>>> As for how to name the option, and as I said in the review, I
>>>>>> think it makes sense to be straightforward here and make it
>>>>>> clearly about choosing to disable libvirt's tunneled transport.
>>>>>>
>>>>>> If we name it any other way, I think our explanation for
>>>>>> operators will immediately jump to explaining (a) that it
>>>>>> influences the TUNNELLED flag, and (b) the differences between
>>>>>> the tunneled and native transports. So, if we're going to have to
>>>>>> talk about tunneled versus native, why obscure that detail?
>>>>>>
>>>>>
>>>>> Ultimately we need to recognise that libvirt's tunnelled mode was
>>>>> added as a hack, to work around fact that QEMU lacked any kind of
>>>>> native security capabilities & didn't appear likely to ever get
>>>>> them at that time. As well as not working with modern NBD based
>>>>> block device encryption, it really sucks for performance because it
>>>>> introduces many extra data copies. So it is going to be quite poor
>>>>> for large VMs with heavy rate of data dirtying.
>>>>>
>>>>> The only long term relative "benefit" of tunnelled mode is that it
>>>>> avoids the need to open extra firewall ports.
>>>>>
>>>>> IMHO, the long term future is to *never* use tunnelled mode for
>>>>> QEMU. This will be viable when my support for native TLS support in
>>>>> QEMU migration + NBD protocols is merged. I'm hopeful this wil be
>>>>> for QEMU 2.6
>>>>>
>>>>> But, Pawel strongly disagrees.
>>>>>>
>>>>>> One last point I'd make is this isn't about adding a *new*
>>>>>> configuration capability for operators. As we deprecate and
>>>>>> remove these configuration options, we need to be careful not to
>>>>>> remove a capability that operators are currently depending on for
>>>>>> arguably reasonable reasons.
>>>>>>
>>>>>
>>>>> My view is that "live_migration_tunneled" is a reasonable
>>>>> parameter to add, because there is a genuine need to let admins
>>>>> choose this behaviour. We should make sure it is correctly done as
>>>>> a tri-state flag though, when it is 'None', Nova should pick what
>>>>> it things is the best approach based on QEMU version. Probably to
>>>>> use QEMU native when it supports TLS, otherwise use tunnelled if
>>>>> possible to get security.
>>>>>
>>>>
>>>> Great feedback. I buy that.
>>>>
>>>> [1] - https://review.openstack.org/#/c/171098/ [2] -
>>>>>> https://review.openstack.org/#/c/227278
>>>>>>
>>>>>>
>>>>>> 5) Add a new config option for additional migration flags:
>>>>>>>>
>>>>>>>> [libvirt] live_migration_extra_flags =
>>>>>>>> VIR_MIGRATE_COMPRESSED
>>>>>>>>
>>>>>>>> This allows operators to continue to experiment with libvirt
>>>>>>>> behaviors in safe ways without each use case having to be
>>>>>>>> accounted for.
>>>>>>>>
>>>>>>>> https://review.openstack.org/263435
>>>>>>>>
>>>>>>>> We would disallow setting the following flags via this
>>>>>>>> option:
>>>>>>>>
>>>>>>>> VIR_MIGRATE_LIVE VIR_MIGRATE_PEER2PEER VIR_MIGRATE_TUNNELLED
>>>>>>>> VIR_MIGRATE_PERSIST_DEST VIR_MIGRATE_UNDEFINE_SOURCE
>>>>>>>> VIR_MIGRATE_NON_SHARED_INC VIR_MIGRATE_NON_SHARED_DISK
>>>>>>>>
>>>>>>>> which would allow the following currently available flags to
>>>>>>>> be set:
>>>>>>>>
>>>>>>>
>>>>>>> VIR_MIGRATE_PAUSED VIR_MIGRATE_CHANGE_PROTECTION
>>>>>>>> VIR_MIGRATE_UNSAFE VIR_MIGRATE_OFFLINE
>>>>>>>> VIR_MIGRATE_COMPRESSED VIR_MIGRATE_ABORT_ON_ERROR
>>>>>>>> VIR_MIGRATE_AUTO_CONVERGE VIR_MIGRATE_RDMA_PIN_ALL
>>>>>>>>
>>>>>>>
>>>>>>> We can probably consider to provide VIR_MIGRATE_PAUSED and
>>>>>>> VIR_MIGRATE_COMPRESSED as dedicated options too ?
>>>>>>>
>>>>>>
>>>>>> Yes, I think any options we see regularly added to extra_flags
>>>>>> by operators, and as we understand the use cases better, then we
>>>>>> can add dedicated options for them.
>>>>>>
>>>>>
>>>>> I really don't see a case for letting the admin set
>>>>> VIR_MIGRATE_PAUSED at a host level. If we want the ability to force
>>>>> a running VM to end up paused after migration, this is a feature to
>>>>> be added to the Nova migration API.
>>>>>
>>>>> The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a
>>>>> flag, there are other associated runtime tunables that need
>>>>> setting. There was a spec discussing this which was not approved as
>>>>> a suitable strategy for using it could not be agreed.
>>>>>
>>>>> In the review, Pawel is making a case for not allowing the
>>>>>> operator to enable COMPRESSED or AUTO_CONVERGE.
>>>>>>
>>>>>
>>>>> I agree really. As per my comments, I in fact struggle to see a
>>>>> credible case for allowing any of the remaining flags to be
>>>>> enabled. They are all cases that Nova should be made todo the right
>>>>> thing, possibly in relation to API parameters or other deployment
>>>>> choices.
>>>>>
>>>>
>>>> Fair enough. I figured it would be a necessary safety valve against
>>>> operators who value the flexibility of the current configuration
>>>> options, but you make a good case.
>>>>
>>>> I'll drop the extra_flags option and make tunneled a tri-state.
>>>>
>>>> Thanks, Mark.
>>>>
>>>>
>>>> __________________________________________________________________________
>>>>
>>>>
>>>>
>>>> OpenStack Development Mailing List (not for usage questions)
>>>
>>>> Unsubscribe:
>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>>
>>> I'm going to be a jackass and skip this entire thread with a question.
>>>
>>> The live migration CI job we have in the experimental queue is blocked
>>> on this failure right now [1].
>>>
>>> "Live Migration failure: Operation not supported: Selecting disks to
>>> migrate is not implemented for tunnelled migration"
>>>
>>> That's running with ubuntu 16.04 and libvirt 1.2.17.
>>>
>>> I was looking at the deprecated live_migration_flag which tells you to
>>> use the live_migration_tunnelled flag, which defaults to None and is a
>>> no-op, even though the help text on the option says if it's omitted
>>> we'll do our best to figure out if you want to use tunneling based on
>>> the hypervisor's encryption support. That doesn't actually happen by
>>> default [2].
>>>
>>> So what should happen? It seems some of this series was stalled or
>>> incomplete?
>>>
>>> We could change the configuration of our live migration CI job, but I'd
>>> like to see the job work with the defaults that we ship.
>>>
>>> I'm not entirely familiar with the thing that's causing the failure
>>> except I believe we're calling migrateToURI3 with the tunnelled flag
>>> (because it's in the default live_migration_flag option) and a list of
>>> devices [3][4].
>>>
>>> Would it be better to use migrateToURI2 if migrate_disks is in params
>>> but the VIR_MIGRATE_TUNNELLED flag is set?
>>>
>>> [1] https://bugs.launchpad.net/nova/+bug/1589591
>>> [2]
>>>
>>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559
>>>
>>> [3]
>>>
>>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751
>>>
>>> [4]
>>>
>>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523
>>>
>>>
>>>
>> Timofey is working on a fix here:
>>
>> https://review.openstack.org/#/c/326111/
>>
>> I've left some comments inline, but I'm mostly wondering about this now:
>>
>>
>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5360
>>
>> The code there says that if there are BDMs and libvirt<1.2.17 and you're
>> doing block migration, it's unsupported and fails. The comment also says
>> that it can't do that in tunneled method either - which is the bug we're
>> hitting.
>>
>> But should we add the block_migration_flag check in
>> check_can_live_migrate_source? Or does it work to live migrate in tunneled
>> mode as long as we don't pass 'migrate_disks'?
>
> So we got 2 options:
>
> - raise exception in case of tunneled block migration with
> migrate_disks param
> - as migrate flags are supposed to be nova internals this is bad
> option
> - get rid of one of these option:
> - don't use VIR_MIGRATE_TUNNELLED - looks most preferable for now
> - don't pass migrate disks param (it will force block live
> migration to fail for instances with attached volumes, and volume-backed
> instances with config-drive(vfat))
>
>
>
>
>> --
>>
>> Thanks,
>>
>> Matt Riedemann
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20160607/58fd352b/attachment.html>
More information about the OpenStack-dev
mailing list