[openstack-dev] [Nova] Update on live migration priority

Kashyap Chamarthy kchamart at redhat.com
Mon Feb 15 10:24:49 UTC 2016

On Fri, Feb 12, 2016 at 04:21:27PM +0000, Murray, Paul (HP Cloud) wrote:
> This time with a tag in case anyone is filtering...

Yep, I was filtering, and would've missed it without your tag. :-)

> From: Murray, Paul (HP Cloud)
> Sent: 12 February 2016 16:16
> To: openstack-dev at lists.openstack.org
> Subject: [openstack-dev] Update on live migration priority
> The objective for the live migration priority is to improve the
> stability of migrations based on operator experience. The high level
> approach is to do the following:
> 1. Improve CI
> 2. Improve documentation
> 3. Improve manageability of migrations
> 4. Fix bugs
> In this cycle we targeted a few immediately implementable features
> that would help, specifically giving operators commands to allow them
> to manage migrations (inspect progress, force completion, and cancel)
> and improve security (split-networks and remove ssh-based
> resize/migration; aka storage pools).
> Most of these are on track to be completed in this cycle with the
> exception of storage pools work which is being deferred. Further
> details follow.
>
> Expand CI coverage - in progress
> There is a job in the experimental queue called:
> gate-tempest-dsvm-multinode-live-migration. This will become the
> job that performs live migration tests; any live migration tests in
> other jobs will be removed. At present the job has been configured to
> cover different storage configurations including cinder, NFS, ceph.
> Tests are now being added to the job. Patches are currently up for
> live migration of instances with swap and instances with ephemeral
> disks.
> Please trigger the experimental queue if your patches touch migrations
> in some way so we can check the stability of the jobs. Once stable and
> with sufficient tests we will promote the job from the experimental
> queue so that it always runs.
> See: https://review.openstack.org/#/q/topic:lm_test
>
> Improve API docs - done
> Some changes were made to the API guide for moving servers, including
> better descriptions for the server actions migrate, live migrate,
> shelve, resize and evacuate (
> http://developer.openstack.org/api-guide/compute/server_concepts.html#server-actions
> ) and a section that describes reasons for moving VMs with common use
> cases outlined (
> http://developer.openstack.org/api-guide/compute/server_concepts.html#moving-servers
> )
>
> Block live migration with attached volumes - done
> The selective block device migration API in libvirt 1.2.17 is used to
> allow block migration when volumes are attached. A follow on patch to
> allow readonly drives to be copied in block migration has not been
> completed. This patch is required to allow iso9660 format config
> drives to be migrated. Without it only vfat config drives can be
> migrated. There is still some thought going into that - see:
> https://review.openstack.org/#/c/234659
>
> Force complete - requires python-novaclient change
> Force-complete forces a live migration to complete by pausing the VM
> and restarting it when it has completed migration. This is intended as
> a brute force way to make a VM complete its migration when it is
> taking too long. In the future auto-converge and post-copy will be
> looked at. These became available in qemu 2.5.
> Force complete is done in nova but still requires a change to
> python-novaclient to implement the CLI.
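
To illustrate why pausing the guest makes a stuck migration finish,
here is a toy pre-copy model. It is only a sketch under my own
assumptions (fixed bandwidth, fixed dirty rate, one-second steps); the
function name and the numbers are hypothetical and this is not how
nova, libvirt or QEMU compute anything.

```python
def seconds_to_converge(memory_mb, bandwidth_mb_per_s, dirty_mb_per_s,
                        max_seconds=3600):
    """Toy model: seconds until no guest memory is left to copy.

    Returns None if the copy has not finished within max_seconds.
    """
    remaining = float(memory_mb)
    for second in range(1, max_seconds + 1):
        # Memory copied to the destination during this second.
        remaining -= bandwidth_mb_per_s
        if remaining <= 0:
            return second
        # A running guest re-dirties pages that must be sent again;
        # the backlog can never exceed the guest's total memory.
        remaining = min(float(memory_mb), remaining + dirty_mb_per_s)
    return None


# Guest dirtying memory faster than the link can copy it: never finishes.
running = seconds_to_converge(4096, 100, 120)
# Paused guest (dirty rate 0): the copy finishes in bounded time.
paused = seconds_to_converge(4096, 100, 0)
```

With the guest paused the dirty rate drops to zero, so the remaining
memory only shrinks; the cost is guest downtime for the rest of the
copy.
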
>
> Cancel - in progress
> Cancel stops a live migration, leaving it on the source host with the
> migration status left as "cancelled". This is in progress and follows
> the pattern of force-complete. Unfortunately this needs to be bundled
> up into one patch to avoid multiple API bumps.
> Patches for review:
> https://review.openstack.org/#/q/status:open+topic:bp/abort-live-migration
>
> Progress reporting - in progress (no pun intended)
> Progress reporting introduces migrations as a sub-resource of servers
> and adds progress data to the migration record. There was some debate
> at the mid cycle and on the mailing list about how to record this
> transient data. It is a waste to keep writing it to the database, but
> as it is generated at the compute manager but examined at the API it
> was felt that writing it to the database is necessary to fit the
> existing architecture. The conclusion was that writing to the
> database every 5 seconds would not cause a significant overhead.
> Alternatives could be pursued later if necessary. For discussion see
> this ML thread:
> http://lists.openstack.org/pipermail/openstack-dev/2016-February/085662.html
> and the IRC meeting transcript here:
> http://eavesdrop.openstack.org/meetings/nova_live_migration/2016/nova_live_migration.2016-02-09-14.01.log.html
> Patches for review:
> https://review.openstack.org/#/q/status:open+topic:bp/live-migration-progress-report
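
The "write at most every 5 seconds" idea can be sketched as a small
throttle around whatever persists the migration record. This is a
minimal sketch, not the code under review: the class name, the `save`
callback and the injectable clock are all hypothetical.

```python
import time


class ThrottledProgressWriter:
    """Persist progress updates no more often than `interval` seconds."""

    def __init__(self, save, interval=5.0, clock=time.monotonic):
        self._save = save          # hypothetical callback that writes to the DB
        self._interval = interval  # minimum seconds between two writes
        self._clock = clock        # injectable so the sketch is easy to test
        self._last_write = None

    def report(self, progress):
        """Write `progress` if the interval has elapsed; return True if written."""
        now = self._clock()
        if self._last_write is not None and now - self._last_write < self._interval:
            return False  # too soon, drop this transient sample
        self._save(progress)
        self._last_write = now
        return True


# Usage with a fake clock: samples arrive at t=0, 2, 5 and 6 seconds.
saved = []
ticks = iter([0.0, 2.0, 5.0, 6.0])
writer = ThrottledProgressWriter(saved.append, clock=lambda: next(ticks))
results = [writer.report(p) for p in (10, 20, 30, 40)]
```

Dropped samples are simply lost, which seems acceptable for transient
progress data since the next write carries the latest value.
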
>
> Split networking - done
> Split networking adds a configuration parameter to specify
> live_migration_inbound_addr as the ip address or host name to be used
> as the target for migration traffic. This allows migration traffic to
> be isolated on a separate network to other management traffic,
> providing an opportunity to isolate service levels for the two networks
> and improve security by moving unencrypted migration traffic to an
> isolated network.
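
For anyone wondering what this looks like in practice, below is a
small sketch of the setting and of the "use the configured address,
else fall back to the host name" choice. Assumptions on my part: that
the option lives in the [libvirt] group of nova.conf, the example
address (a documentation range), and the helper name, which is purely
illustrative and not nova code.

```python
import configparser

# Hypothetical nova.conf fragment for the destination compute host.
NOVA_CONF_FRAGMENT = """
[libvirt]
live_migration_inbound_addr = 192.0.2.10
"""


def migration_target(conf_text, fallback_hostname):
    """Return the address migration traffic should be sent to.

    Uses live_migration_inbound_addr when it is set, otherwise falls
    back to the host name on the management network.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.get("libvirt", "live_migration_inbound_addr",
                      fallback=fallback_hostname)


# With the option set, traffic goes to the dedicated migration network.
dedicated = migration_target(NOVA_CONF_FRAGMENT, "compute-2")
# Without it, the management host name is used as before.
default = migration_target("[DEFAULT]\n", "compute-2")
```
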
>
> Resize/cold migrate using storage pools - deferred
> The objective here was to change the libvirt implementation of migrate
> and resize to use libvirt storage pools instead of scp/rsync over ssh
> with passwordless keys. Storage pools are supported in all versions of
> libvirt supported by nova, so it was thought that by changing the
> implementation it would be possible to drop the ssh based code.
> However two flaws in this approach arose: the recently added ploop
> storage device does not work with storage pools in libvirt and the
> libvirt data copy implementation is very inefficient and so slower
> than scp or rsync.
> The guys at Parallels kindly agreed to implement storage pools support
> for ploop in libvirt and this work is already making progress. Work
> was also started in libvirt to improve the copy performance. These
> features will be available in a future release, so we will need to
> maintain old ssh-based migration for libvirt as well as refactor and
> implement the storage pools based alternative.
> Work has started on refactoring the libvirt driver code but the
> following blueprints will be deferred beyond mitaka:
> http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/use-libvirt-storage-pools.html
> http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/migrate-libvirt-volumes.html
>
> Deprecate migration flags - done
> There are a lot of migration flags used with libvirt that are either
> redundant or can be inferred from the deployed configuration. These
> are being deprecated and will be removed in the next cycle.
> See:
> https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:deprecate-migration-flags-config

This is a nice cleanup, as now I can stop triaging countless bugs or
comments on IRC about what flags one ought to set.
Thanks for the overall summary/update!

PS: If it's possible to make your email client wrap long sentences,
please do so, it's a little hard to read.  </me-stops-being-a-pest>

