[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options

Timofei Durakov tdurakov at mirantis.com
Mon Jun 6 20:58:14 UTC 2016


On Mon, Jun 6, 2016 at 11:26 PM, Matt Riedemann <mriedem at linux.vnet.ibm.com>
wrote:

> On 6/6/2016 12:15 PM, Matt Riedemann wrote:
>
>> On 1/8/2016 12:28 PM, Mark McLoughlin wrote:
>>
>>> On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote:
>>>
>>>> On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote:
>>>>
>>>>> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui
>>>>> wrote:
>>>>>
>>>>>> On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> commit 8ecf93e[1] got me thinking - the live_migration_flag
>>>>>>> config option unnecessarily allows operators choose arbitrary
>>>>>>> behavior of the migrateToURI() libvirt call, to the extent
>>>>>>> that we allow the operator to configure a behavior that can
>>>>>>> result in data loss[1].
>>>>>>>
>>>>>>> I see that danpb recently said something similar:
>>>>>>>
>>>>>>> https://review.openstack.org/171098
>>>>>>>
>>>>>>> "Honestly, I wish we'd just kill off  'live_migration_flag'
>>>>>>> and 'block_migration_flag' as config options. We really
>>>>>>> should not be exposing low level libvirt API flags as admin
>>>>>>> tunable settings.
>>>>>>>
>>>>>>> Nova should really be in charge of picking the correct set of
>>>>>>> flags for the current libvirt version, and the operation it
>>>>>>> needs to perform. We might need to add other more sensible
>>>>>>> config options in their place [..]"
>>>>>>>
>>>>>>
>>>>>> Nova should really handle internal flags and this serie is
>>>>>> running in the right way.
>>>>>>
>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> 4) Add a new config option for tunneled versus native:
>>>>>>>
>>>>>>> [libvirt] live_migration_tunneled = true
>>>>>>>
>>>>>>> This enables the use of the VIR_MIGRATE_TUNNELLED flag. We
>>>>>>> have historically defaulted to tunneled mode because it
>>>>>>> requires the least configuration and is currently the only
>>>>>>> way to have a secure migration channel.
>>>>>>>
>>>>>>> danpb's quote above continues with:
>>>>>>>
>>>>>>> "perhaps a "live_migration_secure_channel" to indicate that
>>>>>>> migration must use encryption, which would imply use of
>>>>>>> TUNNELLED flag"
>>>>>>>
>>>>>>> So we need to discuss whether the config option should
>>>>>>> express the choice of tunneled vs native, or whether it
>>>>>>> should express another choice which implies tunneled vs
>>>>>>> native.
>>>>>>>
>>>>>>> https://review.openstack.org/263434
>>>>>>>
>>>>>>
>>>>>> We probably have to consider that operator does not know much
>>>>>> about internal libvirt flags, so options we are exposing for
>>>>>> him should reflect benefice of using them. I commented on your
>>>>>> review we should at least explain benefice of using this option
>>>>>> whatever the name is.
>>>>>>
>>>>>
>>>>> As predicted, plenty of discussion on this point in the review
>>>>> :)
>>>>>
>>>>> You're right that we don't give the operator any guidance in the
>>>>> help message about how to choose true or false for this:
>>>>>
>>>>> Whether to use tunneled migration, where migration data is
>>>>> transported over the libvirtd connection. If True, we use the
>>>>> VIR_MIGRATE_TUNNELLED migration flag
>>>>>
>>>>> libvirt's own docs on this are here:
>>>>>
>>>>> https://libvirt.org/migration.html#transport
>>>>>
>>>>> which emphasizes:
>>>>>
>>>>> - the data copies involved in tunneling - the extra configuration
>>>>> steps required for native - the encryption support you get when
>>>>> tunneling
>>>>>
>>>>> The discussions I've seen on this topic wrt Nova have revolved
>>>>> around:
>>>>>
>>>>> - that tunneling allows for an encrypted transport[1] - that
>>>>> qemu's NBD based drive-mirror block migration isn't supported
>>>>> using tunneled mode, and that danpb is working on fixing this
>>>>> limitation in libvirt - "selective" block migration[2] won't work
>>>>> with the fallback qemu block migration support, and so won't
>>>>> currently work in tunneled mode
>>>>>
>>>>
>>>> I'm not working on fixing it, but IIRC some other dev had proposed
>>>> patches.
>>>>
>>>>
>>>>> So, the advise to operators would be:
>>>>>
>>>>> - You may want to choose tunneled=False for improved block
>>>>> migration capabilities, but this limitation will go away in
>>>>> future. - You may want to choose tunneled=False if you wish to
>>>>> trade and encrypted transport for a (potentially negligible)
>>>>> performance improvement.
>>>>>
>>>>> Does that make sense?
>>>>>
>>>>> As for how to name the option, and as I said in the review, I
>>>>> think it makes sense to be straightforward here and make it
>>>>> clearly about choosing to disable libvirt's tunneled transport.
>>>>>
>>>>> If we name it any other way, I think our explanation for
>>>>> operators will immediately jump to explaining (a) that it
>>>>> influences the TUNNELLED flag, and (b) the differences between
>>>>> the tunneled and native transports. So, if we're going to have to
>>>>> talk about tunneled versus native, why obscure that detail?
>>>>>
>>>>
>>>> Ultimately we need to recognise that libvirt's tunnelled mode was
>>>> added as a hack, to work around fact that QEMU lacked any kind of
>>>> native security capabilities & didn't appear likely to ever get
>>>> them at that time.  As well as not working with modern NBD based
>>>> block device encryption, it really sucks for performance because it
>>>> introduces many extra data copies. So it is going to be quite poor
>>>> for large VMs with heavy rate of data dirtying.
>>>>
>>>> The only long term relative "benefit" of tunnelled mode is that it
>>>> avoids the need to open extra firewall ports.
>>>>
>>>> IMHO, the long term future is to *never* use tunnelled mode for
>>>> QEMU. This will be viable when my support for native TLS support in
>>>> QEMU migration + NBD protocols is merged. I'm hopeful this wil be
>>>> for QEMU 2.6
>>>>
>>>> But, Pawel strongly disagrees.
>>>>>
>>>>> One last point I'd make is this isn't about adding a *new*
>>>>> configuration capability for operators. As we deprecate and
>>>>> remove these configuration options, we need to be careful not to
>>>>> remove a capability that operators are currently depending on for
>>>>> arguably reasonable reasons.
>>>>>
>>>>
>>>> My view is that "live_migration_tunneled" is a reasonable
>>>> parameter to add, because there is a genuine need to let admins
>>>> choose this behaviour. We should make sure it is correctly done as
>>>> a tri-state flag though, when it is 'None', Nova should pick what
>>>> it things is the best approach based on QEMU version. Probably to
>>>> use QEMU native when it supports TLS, otherwise use tunnelled if
>>>> possible to get security.
>>>>
>>>
>>> Great feedback. I buy that.
>>>
>>> [1] - https://review.openstack.org/#/c/171098/ [2] -
>>>>> https://review.openstack.org/#/c/227278
>>>>>
>>>>>
>>>>> 5) Add a new config option for additional migration flags:
>>>>>>>
>>>>>>> [libvirt] live_migration_extra_flags =
>>>>>>> VIR_MIGRATE_COMPRESSED
>>>>>>>
>>>>>>> This allows operators to continue to experiment with libvirt
>>>>>>> behaviors in safe ways without each use case having to be
>>>>>>> accounted for.
>>>>>>>
>>>>>>> https://review.openstack.org/263435
>>>>>>>
>>>>>>> We would disallow setting the following flags via this
>>>>>>> option:
>>>>>>>
>>>>>>> VIR_MIGRATE_LIVE VIR_MIGRATE_PEER2PEER VIR_MIGRATE_TUNNELLED
>>>>>>> VIR_MIGRATE_PERSIST_DEST VIR_MIGRATE_UNDEFINE_SOURCE
>>>>>>> VIR_MIGRATE_NON_SHARED_INC VIR_MIGRATE_NON_SHARED_DISK
>>>>>>>
>>>>>>> which would allow the following currently available flags to
>>>>>>> be set:
>>>>>>>
>>>>>>
>>>>>> VIR_MIGRATE_PAUSED VIR_MIGRATE_CHANGE_PROTECTION
>>>>>>> VIR_MIGRATE_UNSAFE VIR_MIGRATE_OFFLINE
>>>>>>> VIR_MIGRATE_COMPRESSED VIR_MIGRATE_ABORT_ON_ERROR
>>>>>>> VIR_MIGRATE_AUTO_CONVERGE VIR_MIGRATE_RDMA_PIN_ALL
>>>>>>>
>>>>>>
>>>>>> We can probably consider to provide VIR_MIGRATE_PAUSED and
>>>>>> VIR_MIGRATE_COMPRESSED as dedicated options too ?
>>>>>>
>>>>>
>>>>> Yes, I think any options we see regularly added to extra_flags
>>>>> by operators, and as we understand the use cases better, then we
>>>>> can add dedicated options for them.
>>>>>
>>>>
>>>> I really don't see a case for letting the admin set
>>>> VIR_MIGRATE_PAUSED at a host level. If we want the ability to force
>>>> a running VM to end up paused after migration, this is a feature to
>>>> be added to the Nova migration API.
>>>>
>>>> The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a
>>>> flag, there are other associated runtime tunables that need
>>>> setting. There was a spec discussing this which was not approved as
>>>> a suitable strategy for using it could not be agreed.
>>>>
>>>> In the review, Pawel is making a case for not allowing the
>>>>> operator to enable COMPRESSED or AUTO_CONVERGE.
>>>>>
>>>>
>>>> I agree really. As per my comments, I in fact struggle to see a
>>>> credible case for allowing any of the remaining flags to be
>>>> enabled. They are all cases that Nova should be made todo the right
>>>> thing, possibly in relation to API parameters or other deployment
>>>> choices.
>>>>
>>>
>>> Fair enough. I figured it would be a necessary safety valve against
>>> operators who value the flexibility of the current configuration
>>> options, but you make a good case.
>>>
>>> I'll drop the extra_flags option and make tunneled a tri-state.
>>>
>>> Thanks, Mark.
>>>
>>>
>>> __________________________________________________________________________
>>>
>>>
>>>
>>> OpenStack Development Mailing List (not for usage questions)
>>
>>> Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>> I'm going to be a jackass and skip this entire thread with a question.
>>
>> The live migration CI job we have in the experimental queue is blocked
>> on this failure right now [1].
>>
>> "Live Migration failure: Operation not supported: Selecting disks to
>> migrate is not implemented for tunnelled migration"
>>
>> That's running with ubuntu 16.04 and libvirt 1.2.17.
>>
>> I was looking at the deprecated live_migration_flag which tells you to
>> use the live_migration_tunnelled flag, which defaults to None and is a
>> no-op, even though the help text on the option says if it's omitted
>> we'll do our best to figure out if you want to use tunneling based on
>> the hypervisor's encryption support.  That doesn't actually happen by
>> default [2].
>>
>> So what should happen? It seems some of this series was stalled or
>> incomplete?
>>
>> We could change the configuration of our live migration CI job, but I'd
>> like to see the job work with the defaults that we ship.
>>
>> I'm not entirely familiar with the thing that's causing the failure
>> except I believe we're calling migrateToURI3 with the tunnelled flag
>> (because it's in the default live_migration_flag option) and a list of
>> devices [3][4].
>>
>> Would it be better to use migrateToURI2 if migrate_disks is in params
>> but the VIR_MIGRATE_TUNNELLED flag is set?
>>
>> [1] https://bugs.launchpad.net/nova/+bug/1589591
>> [2]
>>
>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559
>>
>> [3]
>>
>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751
>>
>> [4]
>>
>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523
>>
>>
>>
> Timofey is working on a fix here:
>
> https://review.openstack.org/#/c/326111/
>
> I've left some comments inline, but I'm mostly wondering about this now:
>
>
> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5360
>
> The code there says that if there are BDMs and libvirt<1.2.17 and you're
> doing block migration, it's unsupported and fails. The comment also says
> that it can't do that in tunneled method either - which is the bug we're
> hitting.
>
> But should we add the block_migration_flag check in
> check_can_live_migrate_source? Or does it work to live migrate in tunneled
> mode as long as we don't pass 'migrate_disks'?

So we got 2 options:

   -  raise exception in case of tunneled block migration with
   migrate_disks param
      -  as migrate flags are supposed to be nova internals this is bad
      option
   -  get rid of one of these option:
   - don't use VIR_MIGRATE_TUNNELLED - looks most preferable for now
      - don't pass migrate disks param (it will force block live migration
      to fail for instances with attached volumes, and volume-backed instances
      with config-drive(vfat))




> --
>
> Thanks,
>
> Matt Riedemann
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20160606/7f0d2907/attachment.html>


More information about the OpenStack-dev mailing list