[OpenStack-Infra] Status of check-tempest-dsvm-f20 job

Clark Boylan clark.boylan at gmail.com
Wed Jun 18 21:00:52 UTC 2014


On Wed, Jun 18, 2014 at 9:48 AM, Eoghan Glynn <eglynn at redhat.com> wrote:
>
>
>> On 06/18/2014 05:45 AM, Eoghan Glynn wrote:
>> >
>> >
>> >> On 06/18/2014 06:46 PM, Eoghan Glynn wrote:
>> >>> If we were to use f20 more widely in the gate (not to entirely
>> >>> supplant precise, more just to split the load more evenly) then
>> >>> would the problem observed tend to naturally resolve itself?
>> >>
>> >> I would be happy to see that, having spent some time on the Fedora
>> >> bring-up :) However I guess there is a chicken-egg problem with
>> >> large-scale roll-out in that the platform isn't quite stable yet.
>> >> We've hit some things that only really become apparent "in the wild";
>> >> differences between Rackspace & HP images, issues running on Xen which
>> >> we don't test much, the odd upstream bug requiring work-arounds [1],
>> >> etc.
>> >
>> > Fair point.
>> >
>> >> It did seem that devstack changes were the best place to stabilize
>> >> the job.  However, as is apparent, devstack changes often need to be
>> >> pushed through quickly, and that does not match well with a slightly
>> >> unstable job.
>> >>
>> >> Having it as an experimental job in devstack isn't much help in
>> >> stabilizing it either.  If I trigger the experimental pipeline for
>> >> every devstack change, it runs several other jobs too, so really I've
>> >> just increased contention for limited resources by doing that.
>> >
>> > Very true also.
>> >
>> >> I say this *has* to be running for devstack eventually, to stop the
>> >> fairly frequent breakage of devstack on Fedora, which wastes a lot of
>> >> people's time, often chasing the same bugs.
>> >
>> > I agree, if we're committed to Fedora being a first class citizen (as
>> > per TC distro policy, IIUC) then it's crucial that Fedora-specific
>> > breakages are exposed quickly in the gate, as opposed to being seen
>> > by developers for the first time in the wild whenever they happen to
>> > refresh their devstack.
>> >
>> >> But in the mean time, maybe suggestions for getting the Fedora job
>> >> exposure somewhere else where it can brew and stabilize are a good
>> >> idea.
>> >
>> > Well, I would suggest the ceilometer/py27 unit test job as a first
>> > candidate for such exposure.
>> >
>> > The reason being that mongodb 2.4 is not available on precise, but
>> > is on f20. As a result, the mongodb scenario tests are effectively
>> > skipped in the ceilo/py27 units, which is clearly badness and needs
>> > to be addressed.
>> >
>> > Obviously this lack of coverage will resolve itself quickly once
>> > the Trusty switchover occurs, but it seems like we can short-circuit
>> > that process by simply switching to f20 right now.
>> >
>> > I think the marconi jobs would be another good candidate, where
>> > switching over to f20 now would add real value. The marconi tests
>> > include some coverage against mongodb proper, but this is currently
>> > disabled, as marconi requires mongodb version >= 2.2 (and precise
>> > can only offer 2.0.4).
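>> >
>> > For context, the kind of version gate behind those skips looks
>> > roughly like the following (a minimal sketch for illustration only;
>> > the helper and test names are invented, not the actual ceilometer
>> > code):
>> >
>> >     import unittest
>> >
>> >     import pymongo
>> >     from pymongo.errors import ConnectionFailure
>> >
>> >     def mongodb_version():
>> >         """Return the local mongod (major, minor) tuple, or None."""
>> >         try:
>> >             client = pymongo.MongoClient('localhost', 27017)
>> >             version = client.server_info()['version']
>> >             return tuple(int(x) for x in version.split('.')[:2])
>> >         except ConnectionFailure:
>> >             return None
>> >
>> >     _VERSION = mongodb_version()
>> >
>> >     @unittest.skipUnless(_VERSION is not None and _VERSION >= (2, 4),
>> >                          'requires mongodb >= 2.4')
>> >     class MongoDBScenarioTest(unittest.TestCase):
>> >
>> >         def test_record_metering_data(self):
>> >             # exercise the mongodb storage driver here
>> >             pass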
>> >
>> >> We could make a special queue just for f20 that only triggers that
>> >> job, if others like that idea.
>> >>
>> >> Otherwise, ceilometer maybe?  I made some WIP patches [2,3] for this
>> >> already.  I think it's close; what remains is deciding which tempest
>> >> tests to match for the job in [2].
>> >
>> > Thanks for that.
>> >
>> > So my feeling is that at least the following would make sense to base
>> > on f20:
>> >
>> > 1. ceilometer/py27
>> > 2. tempest variant with the ceilo DB configured as mongodb
>> > 3. marconi/py27
>> >
>> > Then a random selection of other py27 jobs could potentially be added
>> > over time to bring f20 usage up to approximately the same breadth
>> > as precise.
>> >
>> > Cheers,
>> > Eoghan
>
> Thanks for the rapid response Sean!
>
>> Unit test nodes are built from yet another image, so this actually
>> wouldn't make anything better; it would just also stall out the
>> ceilometer and marconi unit tests in the same scenario.
>
> A-ha, ok, thanks for the clarification on that.
>
> My thought was to make the zero-allocation-for-lightly-used-classes
> problem go away by making f20 into a heavily-used node class, but
> I guess just rebasing *only* the ceilometer & marconi py27 jobs onto
> f20 would be insufficient to get f20 over the line in that regard?
>
> And if I understood correctly, rolling directly to a wider selection
> of py27 jobs rebased on f20 would be premature in the infra group's
> eyes, because f20 hasn't yet proven itself in the gate.
>
> So I guess I'd like to get a feel for where the bar is set for f20
> to be considered a valid candidate for a wider selection of gate
> jobs?
>
> e.g. running as an experimental platform for X amount of time?
>
> From a purely parochial ceilometer perspective, I'd love to see the
> py27 job running on f20 sooner rather than later, as the asymmetry
> between the tests run in the gate and in individual developers'
> environments is clearly badness.
>
As one of the infra folk tasked with making things like this happen, I
am a very big -1 on this, a -1.9 :) The issue is that Fedora (and
non-LTS Ubuntu releases) are supported for far less time than we
support stable branches. So what do we do after the ~13 months of
Fedora release support and ~9 months of non-LTS Ubuntu release
support? Do we port the tests forward to newer releases for the stable
branches? We have discussed this, and that is something we will not
do. Instead we need releases with >=18 months of support, so that we
can continue to run the tests on the distro releases they were
originally tested on. This is why we test on CentOS and Ubuntu LTS.
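
To make the arithmetic concrete, this is roughly the check we are
applying (a toy sketch; the dates are rough illustrations, not
authoritative EOL data):

    from datetime import date, timedelta

    def outlives_branch(distro_eol, branch_release, support_months=18):
        """True if the distro is still supported when the stable
        branch reaches the end of its support window."""
        branch_eol = branch_release + timedelta(days=30 * support_months)
        return distro_eol >= branch_eol

    # Roughly: icehouse released 2014-04-17, f20 EOL around mid-2015.
    print(outlives_branch(date(2015, 6, 1), date(2014, 4, 17)))   # False
    # Ubuntu 14.04 LTS is supported into 2019, so it clears the bar.
    print(outlives_branch(date(2019, 4, 1), date(2014, 4, 17)))   # True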

The f20 tempest job is a little special: those involved have basically
agreed that we will simply stop testing f20 when it is no longer
supported, without replacing that job on the stable branches. That is
acceptable because we run enough tempest tests on other distros that
dropping this one should be reasonably safe for the stable branches.
The same is not true of the python 2.7 unit test jobs if we switch
them to Fedora: after 13 months, what do we do? Also, there is
probably minimal value in testing Fedora vs Ubuntu LTS vs CentOS with
the unit tests, as they run in relatively isolated virtualenvs anyway.

Though I am not a strong -2 on this subject, I think the current
stance is sane given the resources available to us, and for unit tests
there is very little reason to incur the overhead of doing things
differently.
>
> I believe some of the marconi folks would also be enthused about
> enabling mongodb-based testing in the short term.
>
>> I think the real issue is to come up with a fairer algorithm that
>> prevents any node class from starving, even in the extreme case, and
>> to get that implemented and accepted in nodepool.
>
> That would indeed be great. Do you think the simple randomization
> idea mooted earlier on this thread would suffice there, or am I over-
> simplifying?
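>
> To make that concrete, the sort of scheme I had in mind is a toy like
> the following (not nodepool's actual allocator; the node labels and
> demand figures are invented for illustration):
>
>     import random
>
>     def weighted_pick(weights):
>         """Pick a key with probability proportional to its weight."""
>         total = sum(weights.values())
>         threshold = random.uniform(0, total)
>         upto = 0.0
>         for key, weight in weights.items():
>             upto += weight
>             if threshold <= upto:
>                 return key
>         return key  # guard against float rounding at the boundary
>
>     def allocate(ready_nodes, demand):
>         """Grant nodes in proportion to outstanding demand, so even a
>         lightly used node class has a nonzero chance every round."""
>         grants = dict((label, 0) for label in demand)
>         for _ in range(ready_nodes):
>             remaining = dict((label, need - grants[label])
>                              for label, need in demand.items()
>                              if need > grants[label])
>             if not remaining:
>                 break
>             grants[weighted_pick(remaining)] += 1
>         return grants
>
>     print(allocate(10, {'bare-precise': 95, 'devstack-f20': 5}))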
>
>> I do think devstack was the right starting point, because it addresses
>> the many issues we've had with accidentally breaking Fedora in
>> devstack. We've yet to figure out how reliable Fedora is going to be
>> overall.
>
> If there's anything more that the Red Hatters in the community can
> do to expedite the process of establishing the reliability of f20,
> please do let us know.
>
> Thanks!
> Eoghan
>
>>       -Sean
>>
>> --
>> Sean Dague
>> http://dague.net
>>
>>
>
> _______________________________________________
> OpenStack-Infra mailing list
> OpenStack-Infra at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Clark


