[OpenStack-Infra] Status of check-tempest-dsvm-f20 job

Eoghan Glynn eglynn at redhat.com
Thu Jun 19 05:28:25 UTC 2014



----- Original Message -----
> On Wed, Jun 18, 2014 at 2:44 PM, Eoghan Glynn <eglynn at redhat.com> wrote:
> >
> >
> >> >> >>> If we were to use f20 more widely in the gate (not to entirely
> >> >> >>> supplant precise, more just to split the load more evenly) then
> >> >> >>> would the problem observed tend to naturally resolve itself?
> >> >> >>
> >> >> >> I would be happy to see that, having spent some time on the
> >> >> >> Fedora bring-up :) However I guess there is a chicken-and-egg
> >> >> >> problem with large-scale roll-out, in that the platform isn't
> >> >> >> quite stable yet. We've hit some things that only really become
> >> >> >> apparent "in the wild": differences between Rackspace & HP
> >> >> >> images, issues running on Xen (which we don't test much), the
> >> >> >> odd upstream bug requiring work-arounds [1], etc.
> >> >> >
> >> >> > Fair point.
> >> >> >
> >> >> >> It did seem that devstack was the best place to stabilize the
> >> >> >> job.  However, as is apparent, devstack changes often need to be
> >> >> >> pushed through quickly, and that does not match well with a
> >> >> >> slightly unstable job.
> >> >> >>
> >> >> >> Having it experimental in devstack isn't much help in
> >> >> >> stabilizing: if I trigger experimental builds for every devstack
> >> >> >> change, that runs several other jobs too, so really I've just
> >> >> >> increased contention for limited resources.
> >> >> >
> >> >> > Very true also.
> >> >> >
> >> >> >> I say this *has* to be running for devstack eventually, to stop
> >> >> >> the fairly frequent breakage of devstack on Fedora, which wastes
> >> >> >> a lot of people's time, often chasing the same bugs.
> >> >> >
> >> >> > I agree: if we're committed to Fedora being a first-class citizen
> >> >> > (as per the TC distro policy, IIUC), then it's crucial that
> >> >> > Fedora-specific breakages are exposed quickly in the gate, as
> >> >> > opposed to being seen by developers for the first time in the wild
> >> >> > whenever they happen to refresh their devstack.
> >> >> >
> >> >> >> But in the meantime, maybe suggestions for getting the Fedora
> >> >> >> job exposure somewhere else, where it can brew and stabilize, are
> >> >> >> a good idea.
> >> >> >
> >> >> > Well, I would suggest the ceilometer/py27 unit test job as a first
> >> >> > candidate for such exposure.
> >> >> >
> >> >> > The reason being that mongodb 2.4 is not available on precise, but
> >> >> > is on f20. As a result, the mongodb scenario tests are effectively
> >> >> > skipped in the ceilo/py27 units, which is clearly badness and needs
> >> >> > to be addressed.
> >> >> >
> >> >> > Obviously this lack of coverage will resolve itself quickly once
> >> >> > the Trusty switchover occurs, but it seems like we can short-circuit
> >> >> > that process by simply switching to f20 right now.
> >> >> >
> >> >> > I think the marconi jobs would be another good candidate, where
> >> >> > switching over to f20 now would add real value. The marconi tests
> >> >> > include some coverage against mongodb proper, but this is currently
> >> >> > disabled, as marconi requires mongodb version >= 2.2 (and precise
> >> >> > can only offer 2.0.4).
> >> >> >
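
An aside, to make the "effectively skipped" point concrete: the
pattern at issue is a version-gated skip, along the lines of the
hypothetical sketch below. The helper and the env var are purely
illustrative, and not the actual ceilometer or marconi test code.

    import os
    import unittest


    def available_mongod_version():
        # Hypothetical helper: the real tests would probe the locally
        # installed mongod; here we just parse an env var the job could
        # export, e.g. "2.0.4" on precise or "2.4.6" on f20.
        raw = os.environ.get('MONGOD_VERSION')
        if not raw:
            return None
        return tuple(int(part) for part in raw.split('.'))


    class MongoDBScenarioTest(unittest.TestCase):

        def setUp(self):
            super(MongoDBScenarioTest, self).setUp()
            version = available_mongod_version()
            if version is None or version < (2, 4):
                # Always fires on precise, which packages mongodb 2.0.4
                # at best, so the scenario tests never actually run there.
                self.skipTest('mongodb >= 2.4 not available')

        def test_record_and_query_samples(self):
            pass  # the real test would write samples and read them back

On precise the whole class reduces to a string of skips, which is
exactly the coverage gap described above.
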
> >> >> >> We could make a special queue just for f20 that only triggers that
> >> >> >> job, if others like that idea.
> >> >> >>
> >> >> >> Otherwise, ceilometer maybe?  I made some WIP patches [2,3] for
> >> >> >> this already.  I think it's close; what remains is deciding
> >> >> >> which tempest tests to match for the job in [2].
> >> >> >
> >> >> > Thanks for that.
> >> >> >
> >> >> > So my feeling is that at least the following would make sense to base
> >> >> > on f20:
> >> >> >
> >> >> > 1. ceilometer/py27
> >> >> > 2. tempest variant with the ceilo DB configured as mongodb
> >> >> > 3. marconi/py27
> >> >> >
> >> >> > Then a random selection of other py27 jobs could potentially be
> >> >> > added over time to bring f20 usage up to approximately the same
> >> >> > breadth as precise.
> >> >> >
> >> >> > Cheers,
> >> >> > Eoghan
> >> >
> >> > Thanks for the rapid response Sean!
> >> >
> >> >> Unit test nodes are yet another different image. So this actually
> >> >> wouldn't make anything better; it would just also stall out the
> >> >> ceilometer and marconi unit tests in the same scenario.
> >> >
> >> > A-ha, ok, thanks for the clarification on that.
> >> >
> >> > My thought was to make the zero-allocation-for-lightly-used-classes
> >> > problem go away by making f20 into a heavily-used node class, but
> >> > I guess just rebasing *only* the ceilometer & marconi py27 jobs onto
> >> > f20 would be insufficient to get f20 over the line in that regard?
> >> >
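
An aside on the zero-allocation problem, for anyone following along:
with strictly proportional integer allocation of a small pool across
node classes, a lightly-used class can round down to zero nodes
indefinitely. The simple randomization idea that comes up further down
the thread would give every class with pending demand a non-zero
chance in each allocation round. Here is a hypothetical sketch of that
shape, emphatically not nodepool's actual allocator:

    import random


    def allocate(capacity, demand):
        # demand maps node class -> number of waiting jobs; returns a
        # map of node class -> nodes granted in this round.
        grants = dict((label, 0) for label in demand)
        for _ in range(capacity):
            # Only classes with unsatisfied demand compete.
            pending = [(label, demand[label] - grants[label])
                       for label in demand
                       if demand[label] > grants[label]]
            if not pending:
                break
            # Weighted random choice: heavily-used classes win most
            # rounds, but a lightly-used class is never starved outright.
            total = float(sum(weight for _, weight in pending))
            pick = random.uniform(0.0, total)
            acc = 0.0
            for label, weight in pending:
                acc += weight
                if pick <= acc:
                    grants[label] += 1
                    break
            else:
                # Guard against float rounding at the top of the range.
                grants[pending[-1][0]] += 1
        return grants

For example, allocate(10, {'precise': 95, 'f20': 5}) will typically
hand f20 the odd node each round, where pure integer proportioning
would give it a hard zero.
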
> >> > And if I understood correctly, rolling directly to a wider selection
> >> > of py27 jobs rebased on f20 would be premature in the infra group's
> >> > eyes, because f20 hasn't yet proven itself in the gate.
> >> >
> >> > So I guess I'd like to get a feel for where the bar is set on f20
> >> > being considered a valid candidate for a wider selection of gate
> >> > jobs?
> >> >
> >> > e.g. running as an experimental platform for X amount of time?
> >> >
> >> > From a purely parochial ceilometer perspective, I'd love to see the
> >> > py27 job running on f20 sooner rather than later, as the asymmetry
> >> > between the tests run in the gate and those run in individual
> >> > developers' environments is clearly badness.
> >> >
> >
> > Thanks for the response Clark, much appreciated.
> >
> >> As one of the infra folk tasked with making things like this happen,
> >> I am a very big -1 on this, -1.9 :) The issue is that Fedora (and
> >> non-LTS Ubuntu releases) are supported for far less time than we
> >> support stable branches. So what do we do after the ~13 months of
> >> Fedora release support and ~9 months of Ubuntu release support? Do
> >> we port the tests forward for stable branches? We have discussed
> >> this, and it is something we will not do. Instead we need releases
> >> with >=18 months of support, so that we can continue to run the
> >> tests on the distro releases they were originally tested on. This is
> >> why we test on CentOS and Ubuntu LTS.
> >
> > This is a very interesting point, and I'll readily admit that my
> > knowledge of this area starts from a fairly low base.
> >
> > I've been working under the assumption that the TC distro policy:
> >
> >  "OpenStack will target its development efforts to latest Ubuntu/Fedora,
> >   but will not introduce any change that would make it impossible to run
> >   on the latest Ubuntu LTS or latest RHEL."
> >
> > means that CI, as well as code-crafting efforts, should be focused
> > on the latest Ubuntu/Fedora (as most folks in the community rightly
> > view CI green-ness as an integral part of the development workflow).
> >
> > But, by the same token, I'm realistic about the day-to-day pressures
> > that you Trojans of openstack qa/infra are under, and TBH I hate quoting
> > the highfalutin' policies handed down by the TC.
> >
> > So I'd really like to understand the practical blockers here, and
> > towards that end I'll throw out a couple of dumb questions:
> >
> > * would it be outrageous to have a different CI distro target for
> >   master versus stable/most-recent-release-minus-1?
> >
> Not at all; we will go through this as we transition to Trusty and
> CentOS7.

Got it.

> > * would it be less outrageous to have a different CI distro target
> >   for master *and* stable/most-recent-release-minus-1 versus
> >   stable/most-recent-release-minus-{2|3|..}?
> >
> Again, this shouldn't be a problem.

Cool.

> > What I'm getting at is the notion that we could have both master and
> > stable/most-recent-release-minus-1 running on Fedora-latest, while
> > the bitrot jobs run against the corresponding LTS release.
> >
> This is problematic. The issue is that when you gate something for 12
> months on a distro, the effort involved in porting it to something
> else is non-trivial. Especially in your example below, where the LTS
> lacks new enough packages, you end up with items that were previously
> tested but may no longer be tested just when you need to get a
> security fix out the door. This is less than ideal.

Ah yes, I see that now.
 
> We want every release to be tested on the same distro releases for the
> lifetime of that openstack release. For Icehouse this is Ubuntu
> Precise and CentOS6. We will probably move Juno to Trusty and CentOS7
> but Juno will live on those distro releases for 18 months.

Just to clarify, did you mean above that *Icehouse* will live on those
distro releases (i.e. Precise and CentOS6) for 18 months?

> > Apologies in advance if I've horribly over-simplified the issues in
> > play here.
> >
> >> The f20 tempest job is a little special, and those involved have
> >> basically agreed that we will just stop testing f20 when f20 is no
> >> longer supported, and not replace that job on stable branches. This
> >> is OK because we run enough tempest tests on other distros that
> >> dropping this test should be reasonably safe for the stable
> >> branches. That is not true of the python 2.7 unit test jobs if we
> >> switch them to Fedora: after 13 months, what do we do? Also, there
> >> is probably minimal value in testing Fedora vs Ubuntu LTS vs CentOS
> >> with the unit tests, as they run in relatively isolated virtualenvs
> >> anyway.
> >
> > In general, I'd agree with you WRT the low value of unit tests running
> > against latest versus LTS, *except* in the case where those tests are
> > hobbled by the lack of some crucial dependency on the LTS (e.g.
> > mongodb>=2.4 for ceilo & marconi).
> >
> We are working on Trusty, honest (I just got nodepool to build our
> first Trusty images), and that solves this problem.

Great to hear of this progress :)

And the $64k question ... is the Trusty switch-over still looking
like a "probably", or more of a "definite", for, say, juno-2?

Thanks again for gently walking me through the basic background
here.

Cheers,
Eoghan

> There is a bigger
> underlying issue in that projects should not depend on things like
> mongodb versions that are not available on the current Ubuntu LTS...
> I'm pretty sure that is what the TC meant by not breaking the LTS
> releases. But that is worth its own thread :)
> >
> > Cheers,
> > Eoghan
> >
> >
> >> Though I am not a strong -2 on this subject, I think the current
> >> stance is sane given the resources available to us, and for unit
> >> tests there is very little reason to incur the overhead of doing it
> >> differently.
> >> >
> >> > I believe some of the marconi folks would also be enthused about
> >> > enabling mongodb-based testing in the short term.
> >> >
> >> >> I think the real issue is to come up with a fairer algorithm that
> >> >> prevents any node class from starving, even in the extreme case,
> >> >> and to get that implemented and accepted in nodepool.
> >> >
> >> > That would indeed be great. Do you think the simple randomization
> >> > idea mooted earlier on this thread would suffice there, or am I over-
> >> > simplifying?
> >> >
> >> >> I do think devstack was the right starting point, because it fixes
> >> >> lots of issues we've had with accidentally breaking Fedora in
> >> >> devstack. We've yet to figure out how reliable Fedora is going to
> >> >> be overall.
> >> >
> >> > If there's anything more that the Red Hatters in the community can
> >> > do to expedite the process of establishing the reliability of f20,
> >> > please do let us know.
> >> >
> >> > Thanks!
> >> > Eoghan
> >> >
> >> >>       -Sean
> >> >>
> >> >> --
> >> >> Sean Dague
> >> >> http://dague.net
> >> >>
> >> >>
> >> >
> >>
> >> Clark
> >>
> 
> Clark
> 