[openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
Matt Riedemann
mriedem at us.ibm.com
Fri Oct 18 14:17:26 UTC 2013
I just opened this bug; it's going to be one of the blockers for us to get
PowerVM CI going in Icehouse:
https://bugs.launchpad.net/nova/+bug/1241619
Thanks,
MATT RIEDEMANN
Advisory Software Engineer
Cloud Solutions and OpenStack Development
Phone: 1-507-253-7622 | Mobile: 1-507-990-1889
E-mail: mriedem at us.ibm.com
3605 Hwy 52 N
Rochester, MN 55901-1407
United States
From: Matt Riedemann/Rochester/IBM at IBMUS
To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
Date: 10/11/2013 10:59 AM
Subject: Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
Matthew Treinish <mtreinish at kortar.org> wrote on 10/10/2013 10:31:29 PM:
> From: Matthew Treinish <mtreinish at kortar.org>
> To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
> Date: 10/10/2013 11:07 PM
> Subject: Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
>
> On Thu, Oct 10, 2013 at 07:39:37PM -0700, Joe Gordon wrote:
> > On Thu, Oct 10, 2013 at 7:28 PM, Matt Riedemann <mriedem at us.ibm.com> wrote:
> > > >
> > > > > 4. What is the max amount of time for us to report test results?
> > > > > Dan didn't seem to think 48 hours would fly. :)
> > > >
> > > > Honestly, I think that 12 hours during peak times is the upper
> > > > limit of what could be considered useful. If it's longer than that,
> > > > many patches could go into the tree without a vote, which defeats
> > > > the point.
> > >
> > > Yeah, I was just joking about the 48 hour thing, 12 hours seems
> > > excessive but I guess that has happened when things are super backed
> > > up with gate issues and rechecks.
> > >
> > > Right now things take about 4 hours, with Tempest being around 1.5
> > > hours of that. The rest of the time is setup and install, which
> > > includes heat and ceilometer. So I guess that raises another question,
> > > if we're really setting this up right now because of nova, do we need
> > > to have heat and ceilometer installed and configured in the initial
> > > delivery of this if we're not going to run tempest tests against them
> > > (we don't right now)?
> > >
> >
> >
> > In general the faster the better, and if things get slow enough that we
> > have to wait for powervm CI to report back, I think it's reasonable to
> > go ahead and approve things without hearing back. In reality, if you can
> > report back in under 12 hours this will rarely happen (I think).
> >
> >
> > >
> > > I think some aspect of the slow setup time is related to DB2 and how
> > > the migrations perform with some of that, but the overall time is not
> > > considerably different from when we were running this with MySQL so
> > > I'm reluctant to blame it all on DB2. I think some of our topology
> > > could have something to do with it too since the IVM hypervisor is
> > > running on a separate system and we are gated on how it's performing
> > > at any given time. I think that will be our biggest challenge for the
> > > scale issues with community CI.
> > >
> > > >
> > > > > 5. What are the minimum tests that need to run (excluding APIs that
> > > > > the powervm driver doesn't currently support)?
> > > > > - smoke/gate/negative/whitebox/scenario/cli? Right now we have
> > > > > 1152 tempest tests running, those are only within api/scenario/cli
> > > > > and we don't run everything.
>
> Well that's almost a full run right now, the full tempest jobs have 1290
> tests of which we skip 65 because of bugs or configuration (we don't run
> neutron api tests without neutron). That number is actually pretty high
> since you are running with neutron. Right now the neutron gating jobs only
> have 221 tests and skip 8 of those. Can you share the list of things
> you've got working with neutron so we can up the number of gating tests?
Here is the nose.cfg we run with:
Some of the tests are excluded because of performance issues that still
need to be worked out (like test_list_image_filters - it works but it takes
over 20 minutes sometimes).
Some of the tests are excluded because of limitations with DB2, e.g.
test_list_servers_filtered_by_name_wildcard
Some of them are probably old excludes on bugs that are now fixed. We have
to go back through what's excluded every once in a while to figure out
what's still broken and clean things up.
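As a rough illustration of how name-based excludes like the ones described
above trim a run, here is a minimal sketch. It is not the contents of the
nose.cfg: only the two test names come from this thread, while the pattern
list, the full test IDs and the select_tests helper are hypothetical
stand-ins.

```python
import re

# Hypothetical exclude patterns; only the two test names come from this thread.
EXCLUDE_PATTERNS = [
    r"test_list_image_filters",                      # slow (performance issue)
    r"test_list_servers_filtered_by_name_wildcard",  # DB2 limitation
]


def select_tests(test_names, patterns):
    """Return the test names that match none of the exclude patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in test_names if not any(c.search(n) for c in compiled)]


# Hypothetical test IDs, for illustration only.
candidates = [
    "tempest.api.compute.test_a.test_list_image_filters",
    "tempest.api.compute.test_b.test_list_servers_filtered_by_name_wildcard",
    "tempest.api.compute.test_c.test_create_server",
]

selected = select_tests(candidates, EXCLUDE_PATTERNS)
print(selected)
```

Keeping a short reason next to each pattern makes the periodic cleanup
easier, since stale excludes for already-fixed bugs are simple to spot.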
Here is the tempest.cfg we use on ppc64:
And here are the xunit results from our latest run:
Note that we have known issues with some cinder and neutron failures
in there.
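For anyone who wants to boil an xunit file down to pass/fail/skip counts,
here is a minimal sketch. The sample XML is a made-up stand-in rather than
our actual results, and it assumes the common xunit layout where each
testcase element carries a failure, error or skipped child when it did not
pass.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual xunit layout; not real results.
SAMPLE_XUNIT = """<testsuite name="nosetests" tests="4">
  <testcase classname="a.B" name="test_ok" time="0.1"/>
  <testcase classname="a.B" name="test_fail" time="0.2">
    <failure type="AssertionError" message="boom"/>
  </testcase>
  <testcase classname="a.C" name="test_err" time="0.3">
    <error type="IOError" message="x"/>
  </testcase>
  <testcase classname="a.C" name="test_skip" time="0.0">
    <skipped type="SkipTest" message="known bug"/>
  </testcase>
</testsuite>"""


def summarize(xml_text):
    """Count test cases per outcome based on their child elements."""
    root = ET.fromstring(xml_text)
    counts = {"passed": 0, "failure": 0, "error": 0, "skipped": 0}
    for case in root.iter("testcase"):
        outcome = "passed"
        for tag in ("failure", "error", "skipped"):
            if case.find(tag) is not None:
                outcome = tag
                break
        counts[outcome] += 1
    return counts


summary = summarize(SAMPLE_XUNIT)
print(summary)
```

Counting the testcase elements directly, instead of trusting the summary
attributes on the root element, avoids depending on which attribute names a
particular xunit writer emits.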
>
> > > >
> > > > I think that "a full run of tempest" should be required. That said,
> > > > if there are things that the driver legitimately doesn't support, it
> > > > makes sense to exclude those from the tempest run, otherwise it's
> > > > not useful.
> > >
> >
> > ++
> >
> >
> >
> > > >
> > > > I think you should publish the tempest config (or config script, or
> > > > patch, or whatever) that you're using so that we can see what it
> > > > means in terms of the coverage you're providing.
> > >
> > > Just to clarify, do you mean publish what we are using now or publish
> > > once it's all working? I can certainly attach our nose.cfg and
> > > latest x-unit results xml file.
> > >
> >
> > We should publish all logs, similar to what we do for upstream (
> > http://logs.openstack.org/96/48196/8/gate/gate-tempest-devstack-vm-full/70ae562/
> > ).
>
> Yes, and part of that is the devstack logs, which show all the
> configuration steps for getting an environment up and running. This is
> sometimes very useful for debugging. So this is probably information that
> you'll want to replicate in whatever the logging output for the powervm
> jobs ends up being.
Agreed. Our Jenkins job already pulls back our install log and all of the
service logs which would be similar to what we have in community with
devstack.
>
> > > >
> > > > > 6. Network service? We're running with openvswitch 1.10 today so
> > > > > we probably want to continue with that if possible.
> > > >
> > > > Hmm, so that means neutron? AFAIK, not much of tempest runs with
> > > > Nova/Neutron.
> > > >
> > > > I kinda think that since nova-network is our default right now (for
> > > > better or worse) that the run should include that mode, especially
> > > > if using neutron excludes a large portion of the tests.
> > > >
> > > > I think you said you're actually running a bunch of tempest right
> > > > now, which conflicts with my understanding of neutron workiness. Can
> > > > you clarify?
> > >
> > > Correct, we're running with neutron using the ovs plugin. We basically
> > > have the same issues that the neutron gate jobs have, which is related
> > > to concurrency issues and tenant isolation (we're doing the same as
> > > devstack with neutron in that we don't run tempest with tenant
> > > isolation). We are running most of the nova and most of the neutron
> > > API tests though (we don't have all of the neutron-dependent scenario
> > > tests working though, probably more due to incompetence in setting up
> > > neutron than anything else).
>
> I also agree with Dan here: in the short term you should probably at
> least have a run with nova-network since it's the default. It'll also let
> you run with tenant isolation, which should allow you to run these jobs in
> parallel, which might help with your speed issues. (It depends on several
> things.)
>
> There is only one neutron dependent scenario test that gets run in the
> neutron gating jobs right now. So I'm sure that you're probably at least
> matching that.
I'd have to give nova-network a shot since we've been running with
openvswitch since grizzly and then see how things look. I also haven't done
a run with Fedora 19 yet to see what difference using testr will make.
>
> > >
> > > >
> > > > > 7. Cinder backend? We're running with the storwize driver but
> > > > > what do we do about the remote v7000?
> > > >
> > > > Is there any reason not to just run with a local LVM setup like we
> > > > do in the real gate? I mean, additional coverage for the v7000
> > > > driver is great, but if it breaks and causes you to not have any
> > > > coverage at all, that seems, like, bad to me :)
> > >
> > > Yeah, I think we'd just run with a local LVM setup, that's what we do
> > > for x86_64 and s390x tempest runs. For whatever reason we thought we'd
> > > do storwize for our ppc64 runs, probably just to have a matrix of
> > > coverage.
> > >
> > > >
> > > > > Again, just getting some thoughts out there to help us figure out
> > > > > our goals for this, especially around 4 and 5.
> > > >
> > > > Yeah, thanks for starting this discussion!
> > > >
>
>
> -Matt Treinish
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> [attachment "nose-ppc64-havana.cfg" deleted by Matt Riedemann/Rochester/IBM]
> [attachment "tempest.conf" deleted by Matt Riedemann/Rochester/IBM]
> [attachment "nosetests.xml" deleted by Matt Riedemann/Rochester/IBM]