[openstack-dev] [nova] CI for reliable live-migration

Kraminsky, Arkadiy arkadiy.kraminsky at hp.com
Thu Aug 27 21:14:05 UTC 2015


Hello,

I'm a new developer on the OpenStack project and am in the process of creating live migration CI for HP's 3PAR and LeftHand backends. I noticed you are looking for someone to pick up Joe Gordon's change for volume-backed live migration tests, and we could certainly use something like this. I can take a look at the change and see what I can do. :)

Thanks,

Arkadiy Kraminsky
________________________________
From: Joe Gordon [joe.gordon0 at gmail.com]
Sent: Wednesday, August 26, 2015 9:26 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] CI for reliable live-migration



On Wed, Aug 26, 2015 at 8:18 AM, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:


On 8/26/2015 3:21 AM, Timofei Durakov wrote:
Hello,

Here is the situation: nova has a live-migration feature but doesn't have
a CI job to cover it with functional tests, only
gate-tempest-dsvm-multinode-full (non-voting, btw), which covers
block-migration only.
The problem here is that live-migration can behave differently depending
on how the instance was booted (volume-backed/ephemeral) and how the
environment is configured (whether a shared instance directory (NFS, for
example) or RBD is used to store the ephemeral disk), or, for example, the
user doesn't have either and is going to use the --block-migrate flag. To
claim that we have reliable live-migration in nova, we should check it at
least on envs with rbd or nfs, as these are more popular than envs without
shared storage at all.
Here are the steps for that:

 1. make gate-tempest-dsvm-multinode-full voting, as it looks OK for
    block-migration testing purposes;

When we are ready to make multinode voting we should remove the equivalent single node job.


If it's been stable for a while then I'd be OK with making it voting on nova changes. I agree it's important to have at least *something* that gates on multi-node testing for nova, since we seem to break this a few times per release.

Last I checked it isn't as stable as single node yet: http://jogo.github.io/gate/multinode [0].  The data going into graphite is a bit noisy so this may be a red herring, but at the very least it needs to be investigated. When I was last looking into this there were at least two known bugs:

https://bugs.launchpad.net/nova/+bug/1445569
https://bugs.launchpad.net/nova/+bug/1462305


[0] http://graphite.openstack.org/graph/?from=-36hours&height=500&until=now&width=800&bgcolor=ffffff&fgcolor=000000&yMax=100&yMin=0&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.gate-tempest-dsvm-full.FAILURE,sum(stats.zuul.pipeline.check.job.gate-tempest-dsvm-full.{SUCCESS,FAILURE})),%275hours%27),%20%27gate-tempest-dsvm-full%27),%27orange%27)&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.gate-tempest-dsvm-multinode-full.FAILURE,sum(stats.zuul.pipeline.check.job.gate-tempest-dsvm-multinode-full.{SUCCESS,FAILURE})),%275hours%27),%20%27gate-tempest-dsvm-multinode-full%27),%27brown%27)&title=Check%20Failure%20Rates%20(36%20hours)&_t=0.48646087432280183
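
As far as it can be read from the URL, the graphite query in [0] takes each job's FAILURE count as a percentage of SUCCESS plus FAILURE and smooths it with a moving average. A minimal Python sketch of that calculation follows. The function names and sample counts are hypothetical, and the trailing window is a simplification rather than graphite's exact movingAverage behavior.

```python
def failure_rates(failures, successes):
    """Per-bucket failure percentage; None where a bucket has no runs."""
    rates = []
    for f, s in zip(failures, successes):
        total = f + s
        rates.append(100.0 * f / total if total else None)
    return rates


def trailing_average(values, window):
    """Average of the last `window` points, ignoring empty (None) buckets."""
    smoothed = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1]
                 if v is not None]
        smoothed.append(sum(chunk) / len(chunk) if chunk else None)
    return smoothed


# Hypothetical per-hour counts for one job (not real gate data).
sample_failures = [1, 0, 2]
sample_successes = [3, 0, 2]
sample_rates = failure_rates(sample_failures, sample_successes)
sample_smoothed = trailing_average(sample_rates, window=2)
```

Comparing the smoothed series for the single-node and multinode jobs side by side is what the two targets in the graph do. With noisy input, a short window can still swing widely, which fits the caveat above about the data.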


 2. contribute to tempest to cover volume-backed instances live-migration;

jogo has had a patch up for this for a while:

https://review.openstack.org/#/c/165233/

Since he's not full time on OpenStack anymore, I assume some help there in picking up the change would be appreciated.

yes please


 3. make another job with rbd for storing ephemerals, it also requires
    changing tempest config;

We already have a voting ceph job for nova - can we turn that into a multi-node testing job and run live migration with shared storage using that?

 4. make job with nfs for ephemerals.

Can't we use a multi-node ceph job (#3) for this?


These steps should help us to improve the current situation with
live-migration.
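
The configurations discussed in this thread can be laid out as a small matrix. Below is a minimal Python sketch, assuming a simplified reading of the steps above. The `Scenario` type, the storage labels, and the step mapping are hypothetical illustrations, not nova or tempest code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    volume_backed: bool  # instance booted from a volume
    storage: str         # "none", "nfs" or "rbd" for ephemeral disks


def uses_block_migrate(scenario):
    """Assumption: --block-migrate is used only when an ephemeral-backed
    instance runs in an environment without shared storage."""
    return not scenario.volume_backed and scenario.storage == "none"


def proposed_step(scenario):
    """Map a scenario to the numbered step in the list above that would
    cover it. Checking volume-backed first is an arbitrary choice made
    for this sketch."""
    if scenario.volume_backed:
        return 2  # tempest test for volume-backed live migration
    if scenario.storage == "rbd":
        return 3  # job with rbd for ephemerals
    if scenario.storage == "nfs":
        return 4  # job with nfs for ephemerals
    return 1      # multinode job exercising block migration


matrix = [Scenario(vb, st)
          for vb, st in product([False, True], ["none", "nfs", "rbd"])]
coverage = {s: proposed_step(s) for s in matrix}
```

Enumerating the matrix this way shows six combinations across the four proposed steps, which is one way to check whether a given job set leaves a configuration untested.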

--
Timofey.



__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


--

Thanks,

Matt Riedemann

