[OpenStack-Infra] Status of check-tempest-dsvm-f20 job

Clark Boylan clark.boylan at gmail.com
Wed Jun 18 22:10:14 UTC 2014


On Wed, Jun 18, 2014 at 2:44 PM, Eoghan Glynn <eglynn at redhat.com> wrote:
>
>
>> >> >>> If we were to use f20 more widely in the gate (not to entirely
>> >> >>> supplant precise, more just to split the load more evenly) then
>> >> >>> would the problem observed tend to naturally resolve itself?
>> >> >>
>> >> >> I would be happy to see that, having spent some time on the Fedora
>> >> >> bring-up :) However I guess there is a chicken-egg problem with
>> >> >> large-scale roll-out in that the platform isn't quite stable yet.
>> >> >> We've hit some things that only really become apparent "in the wild":
>> >> >> differences between Rackspace & HP images, issues running on Xen (which
>> >> >> we don't test much), the odd upstream bug requiring work-arounds [1],
>> >> >> etc.
>> >> >
>> >> > Fair point.
>> >> >
>> >> >> It did seem that devstack changes were the best place to stabilize the
>> >> >> job.  However, as is apparent, devstack changes often need to be
>> >> >> pushed through quickly, and that does not match well with a slightly
>> >> >> unstable job.
>> >> >>
>> >> >> Having it as an experimental job in devstack isn't much help in
>> >> >> stabilizing.  If I trigger experimental builds for every devstack
>> >> >> change it runs several other jobs too, so really I've just increased
>> >> >> contention for limited resources by doing that.
>> >> >
>> >> > Very true also.
>> >> >
>> >> >> I say this *has* to be running for devstack eventually to stop the
>> >> >> fairly frequent breakage of devstack on Fedora, which wastes a lot of
>> >> >> people's time, often chasing the same bugs.
>> >> >
>> >> > I agree: if we're committed to Fedora being a first-class citizen (as
>> >> > per TC distro policy, IIUC) then it's crucial that Fedora-specific
>> >> > breakages are exposed quickly in the gate, as opposed to being seen
>> >> > by developers for the first time in the wild whenever they happen to
>> >> > refresh their devstack.
>> >> >
>> >> >> But in the mean time, maybe suggestions for getting the Fedora job
>> >> >> exposure somewhere else where it can brew and stabilize are a good
>> >> >> idea.
>> >> >
>> >> > Well, I would suggest the ceilometer/py27 unit test job as a first
>> >> > candidate for such exposure.
>> >> >
>> >> > The reason is that mongodb 2.4 is not available on precise, but
>> >> > is on f20. As a result, the mongodb scenario tests are effectively
>> >> > skipped in the ceilo/py27 units, which is clearly badness and needs
>> >> > to be addressed.
>> >> >
>> >> > Obviously this lack of coverage will resolve itself quickly once
>> >> > the Trusty switchover occurs, but it seems like we can short-circuit
>> >> > that process by simply switching to f20 right now.
>> >> >
>> >> > I think the marconi jobs would be another good candidate, where
>> >> > switching over to f20 now would add real value. The marconi tests
>> >> > include some coverage against mongodb proper, but this is currently
>> >> > disabled, as marconi requires mongodb version >= 2.2 (and precise
>> >> > can only offer 2.0.4).
>> >> >
>> >> >> We could make a special queue just for f20 that only triggers that
>> >> >> job, if others like that idea.
>> >> >>
>> >> >> Otherwise, ceilometer maybe?  I made some WIP patches [2,3] for this
>> >> >> already.  I think it's close, just deciding what tempest tests to
>> >> >> match for the job in [2].
>> >> >
>> >> > Thanks for that.
>> >> >
>> >> > So my feeling is that at least the following would make sense to base
>> >> > on f20:
>> >> >
>> >> > 1. ceilometer/py27
>> >> > 2. tempest variant with the ceilo DB configured as mongodb
>> >> > 3. marconi/py27
>> >> >
>> >> > Then a random selection of other py27 jobs could potentially be added
>> >> > over time to bring f20 usage up to approximately the same breadth
>> >> > as precise.
>> >> >
>> >> > Cheers,
>> >> > Eoghan
>> >
>> > Thanks for the rapid response Sean!
>> >
>> >> Unit test nodes are yet another image type, so this actually
>> >> wouldn't make anything better; it would just also stall out the
>> >> ceilometer and marconi unit tests in the same scenario.
>> >
>> > A-ha, ok, thanks for the clarification on that.
>> >
>> > My thought was to make the zero-allocation-for-lightly-used-classes
>> > problem go away by making f20 into a heavily-used node class, but
>> > I guess just rebasing *only* the ceilometer & marconi py27 jobs onto
>> > f20 would be insufficient to get f20 over the line in that regard?
>> >
>> > And if I understood correctly, rolling directly to a wider selection
>> > of py27 jobs rebased on f20 would be premature in the infra group's
>> > eyes, because f20 hasn't yet proven itself in the gate.
>> >
>> > So I guess I'd like to get a feel for where the bar is set on f20
>> > being considered a valid candidate for a wider selection of gate
>> > jobs?
>> >
>> > e.g. running as an experimental platform for X amount of time?
>> >
>> > From a purely parochial ceilometer perspective, I'd love to see the
>> > py27 job running on f20 sooner rather than later, as the asymmetry between
>> > the tests run in the gate and in individual developers' environments
>> > is clearly badness.
>> >
>
> Thanks for the response Clark, much appreciated.
>
>> As one of the infra folk tasked with making things like this happen, I
>> am a very big -1 on this, -1.9 :) The issue is that Fedora (and non-LTS
>> Ubuntu releases) are supported for far less time than we support
>> stable branches. So what do we do after the ~13 months of Fedora
>> release support and ~9 months of Ubuntu release support? Do we port
>> the tests forward for stable branches? We have discussed this, and that
>> is something we will not do. Instead we need releases with >=18 months
>> of support so that we can continue to run the tests on the distro
>> releases they were originally tested on. This is why we test on CentOS
>> and Ubuntu LTS.
>
> This is a very interesting point, and I'm happy to admit that my
> knowledge of this area is starting from a fairly low base.
>
> I've been working with an assumption that the TC distro policy:
>
>  "OpenStack will target its development efforts to latest Ubuntu/Fedora,
>   but will not introduce any change that would make it impossible to run
>   on the latest Ubuntu LTS or latest RHEL."
>
> means that CI, as well as code-crafting efforts, should be focused on latest
> Ubuntu/Fedora (as most folks in the community rightly view CI green-ness
> as an integral part of the development workflow).
>
> But, by the same token, I'm realistic about the day-to-day pressures
> that you Trojans of openstack qa/infra are under, and TBH I hate quoting
> the highfalutin' policies handed down by the TC.
>
> So I'd really like to understand the practical blockers here, and towards
> that end I'll throw out a couple of dumb questions:
>
> * would it be outrageous to have a different CI distro target for
>   master versus stable/most-recent-release-minus-1?
>
Not at all; we will go through this as we transition to Trusty and CentOS7.
>
> * would it be less outrageous to have a different CI distro target
>   for master *and* stable/most-recent-release-minus-1 versus
>   stable/most-recent-release-minus-{2|3|..}?
>
Again, this shouldn't be a problem.
>
> What I'm getting at is the notion that we could have both master and
> stable/most-recent-release-minus-1 running on Fedora-latest, while
> the bitrot jobs run against the corresponding LTS release.
>
This is problematic. The issue is that when you gate something for 12
months on a distro, the effort involved in porting it to something else
is non-trivial. Especially in your example below of the LTS not having
new enough packages, you end up with items that were previously tested
but may no longer be tested when you need to get a security fix out the
door. This is less than ideal.

We want every release to be tested on the same distro releases for the
lifetime of that OpenStack release. For Icehouse this is Ubuntu
Precise and CentOS6. We will probably move Juno to Trusty and CentOS7,
but Juno will live on those distro releases for 18 months.
>
> Apologies in advance if I've horribly over-simplified the issues in
> play here.
>
>> The f20 tempest job is a little special: those involved have
>> basically agreed that we will just stop testing f20 when f20 is no
>> longer supported, and not replace that job on stable branches. This is
>> ok because we run enough tempest tests on other distros that dropping
>> this test should be reasonably safe for the stable branches. This is
>> not true of the python 2.7 unittest jobs if we switch them to Fedora.
>> After 13 months, what do we do? Also, there is probably minimal value
>> in testing Fedora vs Ubuntu LTS vs CentOS with the unittests, as they
>> run in relatively isolated virtualenvs anyway.
>
> In general, I'd agree with you WRT the low value of unit tests running
> against latest versus LTS, *except* in the case where those tests are
> hobbled by the lack of some crucial dependency on LTS (e.g. mongodb>=2.4
> for ceilo & marconi).
>
We are working on Trusty, honest (I just got nodepool to build our
first Trusty images), and that solves this problem. There is a bigger
underlying issue in that projects should not depend on things like
mongodb versions that are not available on the current Ubuntu LTS...
pretty sure that is what the TC meant by not breaking the LTS
releases. But that is worth its own thread :)
>
> Cheers,
> Eoghan
>
>
>> Though I am not a strong -2 on this subject, I think the current
>> stance is sane given the resources available to us, and for unittests
>> there is very little reason to incur the overhead of doing it
>> differently.
>> >
>> > I believe some of the marconi folks would also be enthused about
>> > enabling mongodb-based testing in the short term.
>> >
>> >> I think the real issue is to come up with a fairer algorithm that
>> >> prevents any node class from starving, even in the extreme case, and to
>> >> get that implemented and accepted in nodepool.
>> >
>> > That would indeed be great. Do you think the simple randomization
>> > idea mooted earlier on this thread would suffice there, or am I over-
>> > simplifying?
>> >
>> >> I do think devstack was the right starting point, because it addresses lots
>> >> of issues we've had with accidentally breaking Fedora in devstack.
>> >> We've yet to figure out how reliable Fedora is going to be overall.
>> >
>> > If there's anything more that the Red Hatters in the community can
>> > do to expedite the process of establishing the reliability of f20,
>> > please do let us know.
>> >
>> > Thanks!
>> > Eoghan
>> >
>> >>       -Sean
>> >>
>> >> --
>> >> Sean Dague
>> >> http://dague.net
>> >>
>> >>
>> >
>>
>> Clark
>>
>>
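
On the allocation question quoted above: one way to picture the fairer
algorithm Sean mentioned (and the randomization Eoghan asked about) is a
two-step hand-out, where every node class with outstanding demand is
first guaranteed at least one node, and the rest of the pool is then
distributed in proportion to demand. The sketch below is not nodepool's
actual allocator; it is just a rough Python illustration, with made-up
label names, of how a per-class minimum plus weighted randomization
keeps a lightly-used class like f20 from starving while heavily-used
classes still get most of the capacity:

    import random

    def allocate(capacity, demand):
        """Distribute `capacity` nodes across node classes.

        `demand` maps a node label to the number of nodes wanted.
        Classes with outstanding demand each get one node up front,
        then the remainder is handed out by weighted random choice.
        """
        allocation = dict((label, 0) for label in demand)
        remaining = dict(demand)

        # Guarantee every class with demand at least one node, so a
        # lightly-used class cannot be starved by heavy demand elsewhere.
        for label in sorted(remaining):
            if capacity <= 0:
                break
            if remaining[label] > 0:
                allocation[label] += 1
                remaining[label] -= 1
                capacity -= 1

        # Hand out the rest in proportion to outstanding demand.
        while capacity > 0:
            wanting = [(l, n) for l, n in remaining.items() if n > 0]
            if not wanting:
                break
            pick = random.uniform(0, sum(n for _, n in wanting))
            for label, count in wanting:
                pick -= count
                if pick <= 0:
                    break
            allocation[label] += 1
            remaining[label] -= 1
            capacity -= 1

        return allocation

    # Even with overwhelming precise demand, f20 still gets a node.
    print(allocate(10, {"devstack-precise": 500, "devstack-f20": 3}))

Whether something that simple would be acceptable upstream is a separate
question, but it shows the shape of the fix: the guarantee step removes
the starvation, and the weighted step keeps the distribution close to
the demand-proportional behaviour we have today.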

Clark


