[openstack-dev] [Nova] Live Migration: Austin summit update

Murray, Paul (HP Cloud) pmurray at hpe.com
Sat Apr 30 16:43:47 UTC 2016


Thanks Matt, I meant to cover CI but clearly omitted it. 


> On 30 Apr 2016, at 02:35, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:
> 
>> On 4/29/2016 5:32 PM, Murray, Paul (HP Cloud) wrote:
>> The following summarizes status of the main topics relating to live
>> migration after the Newton design summit. Please feel free to correct
>> any inaccuracies or add additional information.
>> 
>> 
>> 
>> Paul
>> 
>> 
>> 
>> -------------------------------------------------------------
>> 
>> 
>> 
>> Libvirt storage pools
>> 
>> 
>> 
>> The storage pools work has been selected as one of the project review
>> priorities for Newton.
>> 
>> (see https://etherpad.openstack.org/p/newton-nova-summit-priorities )
>> 
>> 
>> 
>> Continuation of the libvirt storage pools work was discussed in the live
>> migration session. The proposal has grown to include a refactor of the
>> existing libvirt driver instance storage code. Justification for this is
>> based on three factors:
>> 
>> 1.       The code needs to be refactored to use storage pools
>> 
>> 2.       The code is complicated and relies on inspection, which is
>> poor practice
>> 
>> 3.       During the investigation Matt Booth discovered two CVEs in the
>> code – suggesting further work is justified
>> 
>> 
>> 
>> So the proposal is now to follow three stages:
>> 
>> 1.       Refactor the instance storage code
>> 
>> 2.       Adapt to use storage pools for the instance storage
>> 
>> 3.       Use storage pools to drive resize/migration
> 
> We also talked about the need for some additional test coverage for the refactor work:
> 
> 1. A job that uses LVM on the experimental queue.
> 
> 2. ploop should be covered by the Virtuozzo Compute third party CI, but we'll need to double-check the test coverage there (that is, whether it runs the tests that hit the code paths being refactored). Note that they have their own blueprint for implementing resize for ploop:
> 
> https://blueprints.launchpad.net/nova/+spec/virtuozzo-instance-resize-support
> 
> 3. Ceph testing - we already have a single-node job for Ceph that will test the resize paths. We should also be testing Ceph-backed live migration in the special live-migration job that Timofey has been working on.
> 
> 4. NFS testing - this also falls into the special live migration CI job that will test live migration in different storage configurations within a single run.
> 
>> 
>> 
>> 
>> Matt has code already starting the refactor and will continue with help
>> from Paul Carlton + Paul Murray. We will look for additional
>> contributors to help as we plan out the patches.
>> 
>> 
>> 
>> https://review.openstack.org/#/c/302117 : Persist libvirt instance
>> storage metadata
>> 
>> https://review.openstack.org/#/c/310505 : Use libvirt storage pools
>> 
>> https://review.openstack.org/#/c/310538 : Migrate libvirt volumes
>> 
>> 
>> 
>> Post copy
>> 
>> 
>> 
>> The spec to add post copy migration support in the libvirt driver was
>> discussed in the live migration session. Post copy guarantees completion
>> of a migration in linear time without needing to pause the VM. This can
>> be used as an alternative to pausing in live-migration-force-complete.
>> Pause or complete could also be invoked automatically under some
>> circumstances. What is slowing these specs is deciding which method
>> to use: the two give a different user experience, but we don’t want
>> to expose virt-specific features in the API. Two additional specs
>> listed below suggest possible generic ways to address the issue.
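To make the "linear time" point concrete, here is a deliberately
simplified model (not taken from the spec). It assumes a constant dirty
rate and constant bandwidth, and it ignores page-fault latency on the
destination. The function names and parameters are made up for
illustration. Pre-copy re-sends whatever was dirtied during the previous
round, so it only converges if the guest dirties memory more slowly than
the link can carry it. Post-copy sends each page once.

```python
from typing import Optional


def precopy_rounds(memory_mb: float, bandwidth_mb_s: float,
                   dirty_rate_mb_s: float, threshold_mb: float = 1.0,
                   max_rounds: int = 30) -> Optional[int]:
    """Rounds until the remaining dirty memory is small enough to switch
    over, or None if that does not happen within max_rounds."""
    remaining = memory_mb
    for rounds in range(1, max_rounds + 1):
        seconds = remaining / bandwidth_mb_s
        # Memory dirtied while this round was being transferred,
        # capped at the size of guest memory.
        remaining = min(memory_mb, dirty_rate_mb_s * seconds)
        if remaining <= threshold_mb:
            return rounds
    return None


def postcopy_seconds(memory_mb: float, bandwidth_mb_s: float) -> float:
    """Each page is transferred exactly once, so time is linear in size."""
    return memory_mb / bandwidth_mb_s


if __name__ == "__main__":
    print(precopy_rounds(4096, 1000, 100))    # quiet guest
    print(precopy_rounds(4096, 1000, 1200))   # busy guest, never converges
    print(postcopy_seconds(4096, 1000))
```

In this model a guest that dirties memory faster than the link can move
it never finishes under pre-copy, which is the case where pausing or
post-copy is needed.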
>> 
>> 
>> 
>> No conclusions were reached in the session, so the debate will
>> continue on the specs. The first one below is the main spec for the
>> feature.
>> 
>> 
>> 
>> https://review.openstack.org/#/c/301509 : Adds post-copy live migration
>> support to Nova
>> 
>> https://review.openstack.org/#/c/305425 : Define instance availability
>> profiles
>> 
>> https://review.openstack.org/#/c/306561 : Automatic Live Migration
>> Completion
>> 
>> 
>> 
>> Live Migration orchestrated via conductor
>> 
>> 
>> 
>> The proposal to move orchestration of live migration to conductor was
>> discussed in the working session on Friday, presented by Andrew Laski on
>> behalf of Timofey Durakov. This one threw up a lot of debate both for
>> and against the general idea, but there was no support for the patches
>> that have been submitted along with the spec so far. The general
>> feeling was that we need to attack this, but should take some simple
>> cleanup steps first to get a better idea of the problem. Dan Smith
>> proposed moving the stateless pre-migration steps to a sequence of
>> calls from conductor (as opposed to going back and forth between
>> computes) as the first step.
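A rough sketch of the shape of that first step, as I understand it. This
is not the code in the review, and every name in it (run_pre_migration,
check_source, check_destination) is hypothetical. The point is only that
the conductor makes each call itself and passes results forward, so no
compute has to call another compute.

```python
from typing import Callable, Dict, List, Tuple

# (step name, host the conductor calls, function performing the step)
Step = Tuple[str, str, Callable[[Dict[str, str]], Dict[str, str]]]


def run_pre_migration(steps: List[Step], request: Dict[str, str]):
    """Run each stateless step in order from the conductor.

    Each step receives the results gathered so far and returns new data.
    No step contacts another host on its own.
    """
    results = dict(request)
    call_log = []
    for name, host, step in steps:
        results.update(step(results))
        call_log.append(f"conductor -> {host}: {name}")
    return results, call_log


# Stand-ins for what would really be RPC calls to the compute hosts.
def check_source(data: Dict[str, str]) -> Dict[str, str]:
    return {"source_ok": "yes"}


def check_destination(data: Dict[str, str]) -> Dict[str, str]:
    # Uses the source result handed over by the conductor.
    return {"dest_ok": "yes" if data.get("source_ok") == "yes" else "no"}


steps = [
    ("check_source", "source", check_source),
    ("check_destination", "dest", check_destination),
]

if __name__ == "__main__":
    results, call_log = run_pre_migration(steps, {"instance": "vm-1"})
    print(results)
    print(call_log)
```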
>> 
>> 
>> 
>> https://review.openstack.org/#/c/292271 : Remove compute-compute
>> communication in live-migration
>> 
>> 
>> 
>> Cold and Live Migration Scheduling
>> 
>> 
>> 
>> When this patch merges, all migrations will use the request spec for
>> scheduling: https://review.openstack.org/#/c/284974
>> 
>> Work is still ongoing on check destinations (allowing the scheduler to
>> check a destination chosen by the admin). When that is complete, there
>> will be three ways to place a migration:
>> 
>> 1.       Destination chosen by scheduler
>> 
>> 2.       Destination chosen by admin but checked by scheduler
>> 
>> 3.       Destination forced by admin
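The three placement modes could be expressed roughly as below. This is
an illustration only, not the Nova scheduler API: Placement,
pick_destination, the force flag, and the local NoValidHost exception
are all names assumed for the sketch, and valid_hosts stands in for
whatever the scheduler returns for the request spec.

```python
from enum import Enum
from typing import Callable, List, Optional


class Placement(Enum):
    SCHEDULER_CHOSEN = "scheduler"
    ADMIN_CHECKED = "admin-checked"
    ADMIN_FORCED = "admin-forced"


class NoValidHost(Exception):
    """Local stand-in exception for this sketch."""


def placement_mode(requested_host: Optional[str], force: bool) -> Placement:
    if requested_host is None:
        return Placement.SCHEDULER_CHOSEN
    return Placement.ADMIN_FORCED if force else Placement.ADMIN_CHECKED


def pick_destination(valid_hosts: Callable[[], List[str]],
                     requested_host: Optional[str] = None,
                     force: bool = False) -> str:
    mode = placement_mode(requested_host, force)
    if mode is Placement.ADMIN_FORCED:
        # The scheduler is bypassed entirely.
        return requested_host
    hosts = valid_hosts()
    if mode is Placement.ADMIN_CHECKED:
        # The admin's choice is used only if the scheduler accepts it.
        if requested_host not in hosts:
            raise NoValidHost(requested_host)
        return requested_host
    if not hosts:
        raise NoValidHost("no candidates")
    return hosts[0]
```

For example, with valid_hosts returning ["node1", "node2"], asking for
"node9" without force raises NoValidHost, while the same request with
force=True returns "node9" unchecked.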
>> 
>> 
>> 
>> https://review.openstack.org/#/c/296408 : Re-Proposes to check
>> destination on migrations
>> 
>> 
>> 
>> PCI + NUMA claims
>> 
>> 
>> 
>> Moshe and Jay are making great progress refactoring Nicola’s patches to
>> fix PCI and NUMA handling in migrations. The patch series should be
>> completed soon.
> 
> The patch series for that is here (dependent on some cleanups from Jay and the top patch needs to be rebased):
> 
> https://review.openstack.org/#/c/307124/
> 
> It would be great if we could test this with some NFV CI but from the notes in the session it sounds like we need a multi-node job for this?
> 

There were also comments in the Cinder-Nova session requesting that live migration tests be used to test Cinder back ends in external CI. We need to make sure we have something suitable.


>> 
>> 
>> 
>> 
>> 
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> Thanks for the great write-up Paul, you've saved me some time. :) And thanks to the whole sub-team working on this for keeping up the focus.
> 
> -- 
> 
> Thanks,
> 
> Matt Riedemann
> 
> 


