[OpenStack-Infra] Status of check-tempest-dsvm-f20 job

Eoghan Glynn eglynn at redhat.com
Wed Jun 18 16:48:59 UTC 2014



> On 06/18/2014 05:45 AM, Eoghan Glynn wrote:
> > 
> > 
> >> On 06/18/2014 06:46 PM, Eoghan Glynn wrote:
> >>> If we were to use f20 more widely in the gate (not to entirely
> >>> supplant precise, more just to split the load more evenly) then
> >>> would the problem observed tend to naturally resolve itself?
> >>
> >> I would be happy to see that, having spent some time on the Fedora
> >> bring-up :) However I guess there is a chicken-egg problem with
> >> large-scale roll-out in that the platform isn't quite stable yet.
> >> We've hit some things that only really become apparent "in the wild";
> >> differences between Rackspace & HP images, issues running on Xen which
> >> we don't test much, the odd upstream bug requiring work-arounds [1],
> >> etc.
> > 
> > Fair point.
> >  
> >> It did seem that devstack changes were the best place to stabilize the
> >> job.  However, as is apparent, devstack changes often need to be
> >> pushed through quickly and that does not match well with a slightly
> >> unstable job.
> >>
> >> Having it experimental in devstack isn't much help in stabilizing.  If
> >> I trigger experimental builds for every devstack change it runs
> >> several other jobs too, so really I've just increased contention for
> >> limited resources by doing that.
> > 
> > Very true also.
> > 
> >> I say this *has* to be running for devstack eventually to stop the
> >> fairly frequent breakage of devstack on Fedora, which wastes a lot
> >> of people's time, often chasing the same bugs.
> > 
> > I agree, if we're committed to Fedora being a first class citizen (as
> > per TC distro policy, IIUC) then it's crucial that Fedora-specific
> > breakages are exposed quickly in the gate, as opposed to being seen
> > by developers for the first time in the wild whenever they happen to
> > refresh their devstack.
> > 
> >> But in the mean time, maybe suggestions for getting the Fedora job
> >> exposure somewhere else where it can brew and stabilize are a good
> >> idea.
> > 
> > Well, I would suggest the ceilometer/py27 unit test job as a first
> > candidate for such exposure.
> > 
> > The reason being that mongodb 2.4 is not available on precise, but
> > is on f20. As a result, the mongodb scenario tests are effectively
> > skipped in the ceilo/py27 units, which is clearly badness and needs
> > to be addressed.
> > 
> > Obviously this lack of coverage will resolve itself quickly once
> > the Trusty switchover occurs, but it seems like we can short-circuit
> > that process by simply switching to f20 right now.
> > 
> > I think the marconi jobs would be another good candidate, where
> > switching over to f20 now would add real value. The marconi tests
> > include some coverage against mongodb proper, but this is currently
> > disabled, as marconi requires mongodb version >= 2.2 (and precise
> > can only offer 2.0.4).
> > 
> >> We could make a special queue just for f20 that only triggers that
> >> job, if others like that idea.
> >>
> >> Otherwise, ceilometer maybe?  I made some WIP patches [2,3] for this
> >> already.  I think it's close, just deciding what tempest tests to
> >> match for the job in [2].
> > 
> > Thanks for that.
> > 
> > So my feeling is that at least the following would make sense to base
> > on f20:
> > 
> > 1. ceilometer/py27
> > 2. tempest variant with the ceilo DB configured as mongodb
> > 3. marconi/py27
> > 
> > Then a random selection of other py27 jobs could potentially be added
> > over time to bring f20 usage up to approximately the same breadth
> > as precise.
> > 
> > Cheers,
> > Eoghan

Thanks for the rapid response, Sean!
 
> Unit test nodes are a different image yet still. So this actually
> wouldn't make anything better, it would just also stall out ceilometer
> and marconi unit tests in the same scenario.

A-ha, ok, thanks for the clarification on that.

My thought was to make the zero-allocation-for-lightly-used-classes
problem go away by making f20 into a heavily-used node class, but
I guess rebasing *only* the ceilometer & marconi py27 jobs onto
f20 would be insufficient to get f20 over the line in that regard?
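
(Just to check I've understood the zero-allocation failure mode: my
mental model is that free nodes get handed out roughly in proportion
to outstanding demand, so a class with only a handful of waiting
requests truncates to zero every cycle. Here's a back-of-the-envelope
illustration with made-up numbers, not the actual nodepool logic:

    demand = {'precise': 120, 'f20': 3}   # outstanding requests (made up)
    free = 10                             # nodes available this cycle
    total = sum(demand.values())
    shares = dict((label, free * count // total)
                  for label, count in demand.items())
    print(shares)   # {'precise': 9, 'f20': 0}
                    # f20's share rounds down to nothing
                    # for as long as demand stays this lopsided

Please do correct me if that's not what's actually going on.)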

And if I understood correctly, rolling f20 out directly to a wider
selection of py27 jobs would be premature in the infra group's eyes,
because f20 hasn't yet proven itself in the gate.

So I guess I'd like to get a feel for where the bar is set on f20
being considered a valid candidate for a wider selection of gate
jobs?

e.g. running as an experimental platform for X amount of time?

From a purely parochial ceilometer perspective, I'd love to see the
py27 job running on f20 sooner rather than later, as the asymmetry
between the tests run in the gate and in individual developers'
environments is clearly badness.

I believe some of the marconi folks would also be enthused about
enabling mongodb-based testing in the short term. 

> I think the real issue is to come up with a fairer algorithm that
> prevents any node class from starving, even in the extreme case. And get
> that implemented and accepted in nodepool.

That would indeed be great. Do you think the simple randomization
idea mooted earlier on this thread would suffice there, or am I over-
simplifying?
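
To make that concrete, something along these lines is roughly what I
was picturing: a demand-weighted random draw for each free node, so
even a lightly-used class keeps a small but non-zero chance every
cycle. This is purely an illustrative sketch (the names and structure
are mine, not nodepool's):

    import random

    def allocate(free, demand):
        # demand maps node class to outstanding requests,
        # e.g. {'precise': 120, 'f20': 3}; illustrative only
        allocation = dict.fromkeys(demand, 0)
        remaining = dict(demand)
        for _ in range(free):
            pending = [(label, count)
                       for label, count in remaining.items() if count > 0]
            if not pending:
                break
            # weighted random pick: f20's probability is small
            # but never zero, so it can't be starved indefinitely
            roll = random.uniform(0, sum(count for _, count in pending))
            for label, count in pending:
                roll -= count
                if roll <= 0:
                    break
            allocation[label] += 1
            remaining[label] -= 1
        return allocation

    print(allocate(10, {'precise': 120, 'f20': 3}))

Over many cycles the expected split still tracks demand, it just
can't floor any class to zero. Whether that maps cleanly onto
nodepool's actual allocator is exactly the kind of thing you folks
would know far better than I would.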
 
> I do think devstack was the right starting point, because it fixes lots
> of issues we've had with us accidentally breaking fedora in devstack.
> We've yet to figure out how overall reliable fedora is going to be.

If there's anything more that the Red Hatters in the community can
do to expedite the process of establishing the reliability of f20,
please do let us know.

Thanks!
Eoghan

> 	-Sean
> 
> --
> Sean Dague
> http://dague.net
> 
> 


