[openstack-dev] [nova][powervm] my notes from the meeting on powervm CI

Matt Riedemann mriedem at us.ibm.com
Fri Oct 18 14:17:26 UTC 2013


I just opened this bug, it's going to be one of the blockers for us to get 
PowerVM CI going in Icehouse:

https://bugs.launchpad.net/nova/+bug/1241619 



Thanks,

MATT RIEDEMANN
Advisory Software Engineer
Cloud Solutions and OpenStack Development

Phone: 1-507-253-7622 | Mobile: 1-507-990-1889
E-mail: mriedem at us.ibm.com


3605 Hwy 52 N
Rochester, MN 55901-1407
United States




From:   Matt Riedemann/Rochester/IBM at IBMUS
To:     OpenStack Development Mailing List 
<openstack-dev at lists.openstack.org>, 
Date:   10/11/2013 10:59 AM
Subject:        Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI







Matthew Treinish <mtreinish at kortar.org> wrote on 10/10/2013 10:31:29 PM:

> From: Matthew Treinish <mtreinish at kortar.org> 
> To: OpenStack Development Mailing List 
> <openstack-dev at lists.openstack.org>, 
> Date: 10/10/2013 11:07 PM 
> Subject: Re: [openstack-dev] [nova][powervm] my notes from the 
> meeting on powervm CI 
> 
> > On Thu, Oct 10, 2013 at 07:39:37PM -0700, Joe Gordon wrote:
> > > On Thu, Oct 10, 2013 at 7:28 PM, Matt Riedemann <mriedem at us.ibm.com> wrote:
> > > >
> > > > > 4. What is the max amount of time for us to report test results?  Dan
> > > > > didn't seem to think 48 hours would fly. :)
> > > >
> > > > Honestly, I think that 12 hours during peak times is the upper limit of
> > > > what could be considered useful. If it's longer than that, many patches
> > > > could go into the tree without a vote, which defeats the point.
> > >
> > > Yeah, I was just joking about the 48 hour thing, 12 hours seems excessive
> > > but I guess that has happened when things are super backed up with gate
> > > issues and rechecks.
> > >
> > > Right now things take about 4 hours, with Tempest being around 1.5 hours
> > > of that. The rest of the time is setup and install, which includes heat
> > > and ceilometer. So I guess that raises another question: if we're really
> > > setting this up right now because of nova, do we need to have heat and
> > > ceilometer installed and configured in the initial delivery of this if
> > > we're not going to run tempest tests against them (we don't right now)?
> > >
> > 
> > 
> > In general the faster the better, and if things get slow enough that we
> > have to wait for powervm CI to report back, I think it's reasonable to go
> > ahead and approve things without hearing back. In reality, if you can
> > report back in under 12 hours this will rarely happen (I think).
> > 
> > 
> > >
> > > I think some aspect of the slow setup time is related to DB2 and how
> > > the migrations perform with some of that, but the overall time is not
> > > considerably different from when we were running this with MySQL so
> > > I'm reluctant to blame it all on DB2.  I think some of our topology
> > > could have something to do with it too since the IVM hypervisor is
> > > running on a separate system and we are gated on how it's performing at
> > > any given time.  I think that will be our biggest challenge for the
> > > scale issues with community CI.
> > >
> > > >
> > > > > 5. What are the minimum tests that need to run (excluding APIs that
> > > > > the powervm driver doesn't currently support)?
> > > > >         - smoke/gate/negative/whitebox/scenario/cli?  Right now we
> > > > > have 1152 tempest tests running, those are only within
> > > > > api/scenario/cli and we don't run everything.
> 
> Well that's almost a full run right now; the full tempest jobs have 1290
> tests, of which we skip 65 because of bugs or configuration (we don't run
> neutron api tests without neutron). That number is actually pretty high since
> you are running with neutron. Right now the neutron gating jobs only have 221
> tests and skip 8 of those. Can you share the list of things you've got
> working with neutron so we can up the number of gating tests?

Here is the nose.cfg we run with: 



Some of the tests are excluded because of performance issues that still need
to be worked out (like test_list_image_filters - it works but it takes over
20 minutes sometimes).
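As a rough illustration only (not the actual nose-ppc64-havana.cfg), excludes in a nose config are just a regex under the standard `[nosetests]` section; the test names below are simply the two mentioned above:

```ini
[nosetests]
# Illustrative sketch, not our real exclusion list. nose takes a single
# regex in `exclude`; alternation keeps the whole skip list in one place.
exclude=test_list_image_filters|test_list_servers_filtered_by_name_wildcard
```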

Some of the tests are excluded because of limitations with DB2, e.g.
test_list_servers_filtered_by_name_wildcard.

Some of them are probably old excludes on bugs that are now fixed. We have to
go back through what's excluded every once in a while to figure out what's
still broken and clean things up.
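For that periodic cleanup pass, one option is a small script that compares the exclude list against the xunit results of a run made with the excludes temporarily lifted; everything here (function name, toy XML) is illustrative rather than part of our actual tooling:

```python
import re
import xml.etree.ElementTree as ET

def stale_excludes(xunit_xml, exclude_patterns):
    """Given xunit results from a run with the excludes temporarily lifted,
    return the exclude patterns whose matching tests all passed -- these
    are candidates for removal from the exclude list."""
    root = ET.fromstring(xunit_xml)
    still_failing = set()
    now_passing = set()
    for case in root.iter("testcase"):
        name = case.get("name", "")
        failed = (case.find("failure") is not None
                  or case.find("error") is not None)
        for pat in exclude_patterns:
            if re.search(pat, name):
                (still_failing if failed else now_passing).add(pat)
    return sorted(now_passing - still_failing)

# Toy results; a real input would be the nosetests.xml Jenkins pulls back.
results = """<testsuite>
  <testcase name="test_list_image_filters"/>
  <testcase name="test_list_servers_filtered_by_name_wildcard">
    <failure>still broken on DB2</failure>
  </testcase>
</testsuite>"""

excludes = ["test_list_image_filters",
            "test_list_servers_filtered_by_name_wildcard"]
print(stale_excludes(results, excludes))  # -> ['test_list_image_filters']
```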

Here is the tempest.cfg we use on ppc64: 



And here are the xunit results from our latest run: 



Note that we have known issues with some cinder and neutron failures 
in there. 

> 
> > > >
> > > > I think that "a full run of tempest" should be required. That said, if
> > > > there are things that the driver legitimately doesn't support, it makes
> > > > sense to exclude those from the tempest run, otherwise it's not useful.
> > >
> > 
> > ++
> > 
> > 
> > 
> > > >
> > > > I think you should publish the tempest config (or config script, or
> > > > patch, or whatever) that you're using so that we can see what it means
> > > > in terms of the coverage you're providing.
> > >
> > > Just to clarify, do you mean publish what we are using now or publish
> > > once it's all working?  I can certainly attach our nose.cfg and
> > > latest x-unit results xml file.
> > >
> > 
> > We should publish all logs, similar to what we do for upstream (
> > http://logs.openstack.org/96/48196/8/gate/gate-tempest-devstack-vm-full/70ae562/
> > ).
> 
> Yes, and part of that is the devstack logs, which show all the configuration
> steps for getting an environment up and running. This is sometimes very
> useful for debugging. So this is probably information that you'll want to
> replicate in whatever the logging output for the powervm jobs ends up being.

Agreed. Our Jenkins job already pulls back our install log and all of the
service logs which would be similar to what we have in community with
devstack.

> 
> > > >
> > > > > 6. Network service? We're running with openvswitch 1.10 today so we
> > > > > probably want to continue with that if possible.
> > > >
> > > > Hmm, so that means neutron? AFAIK, not much of tempest runs with
> > > > Nova/Neutron.
> > > >
> > > > I kinda think that since nova-network is our default right now (for
> > > > better or worse) that the run should include that mode, especially if
> > > > using neutron excludes a large portion of the tests.
> > > >
> > > > I think you said you're actually running a bunch of tempest right now,
> > > > which conflicts with my understanding of neutron workiness. Can you
> > > > clarify?
> > >
> > > Correct, we're running with neutron using the ovs plugin. We basically
> > > have the same issues that the neutron gate jobs have, which are related
> > > to concurrency issues and tenant isolation (we're doing the same as
> > > devstack with neutron in that we don't run tempest with tenant
> > > isolation).  We are running most of the nova and most of the neutron API
> > > tests, though (we don't have all of the neutron-dependent scenario tests
> > > working, probably more due to incompetence in setting up neutron than
> > > anything else).
> 
> I also agree with Dan here: in the short term you should probably at least
> have a run with nova-network since it's the default. It'll also let you run
> with tenant isolation, which should allow you to run these jobs in parallel,
> which might help with your speed issues. (It depends on several things.)
> 
> There is only one neutron-dependent scenario test that gets run in the
> neutron gating jobs right now. So I'm sure that you're probably at least
> matching that.

I'd have to give nova-network a shot since we've been running with
openvswitch since grizzly and then see how things look. I also haven't done a
run with Fedora 19 yet to see what difference using testr will make.
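For reference, tenant isolation is a single switch in the tempest config of this era; the section and option name below are my recollection of the sample config, so double-check them against your branch:

```ini
[compute]
# With isolation off, tests share one tenant and have to run serially;
# turning it on has tempest create a throwaway tenant/user per test
# class, which is what makes parallel runs (e.g. under testr) safe.
allow_tenant_isolation = false
```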

> 
> > >
> > > >
> > > > > 7. Cinder backend? We're running with the storwize driver but what
> > > > > do we do about the remote v7000?
> > > >
> > > > Is there any reason not to just run with a local LVM setup like we do
> > > > in the real gate? I mean, additional coverage for the v7000 driver is
> > > > great, but if it breaks and causes you to not have any coverage at
> > > > all, that seems, like, bad to me :)
> > >
> > > Yeah, I think we'd just run with a local LVM setup, that's what we do
> > > for x86_64 and s390x tempest runs. For whatever reason we thought we'd
> > > do storwize for our ppc64 runs, probably just to have a matrix of
> > > coverage.
> > >
> > > >
> > > > > Again, just getting some thoughts out there to help us figure out
> > > > > our goals for this, especially around 4 and 5.
> > > >
> > > > Yeah, thanks for starting this discussion!
> > > >
> 
> 
> -Matt Treinish
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> [attachment "nose-ppc64-havana.cfg" deleted by Matt Riedemann/Rochester/IBM]
> [attachment "tempest.conf" deleted by Matt Riedemann/Rochester/IBM]
> [attachment "nosetests.xml" deleted by Matt Riedemann/Rochester/IBM]

