[openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job
Devananda van der Veen
devananda.vdv at gmail.com
Wed Dec 2 01:10:57 UTC 2015
On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <shardy at redhat.com> wrote:
> On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
> > On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <zbitter at redhat.com> wrote:
> >
> > On 30/11/15 12:51, Ruby Loo wrote:
> >
> > On 30 November 2015 at 10:19, Derek Higgins <derekh at redhat.com> wrote:
> >
> >   Hi All,
> >
> >   A few months ago tripleo switched from its devtest based CI to one
> >   that was based on instack. Before doing this we anticipated
> >   disruption in the ci jobs and removed them from non tripleo
> >   projects.
> >
> >   We'd like to investigate adding it back to heat and ironic as
> >   these are the two projects where we find our ci provides the most
> >   value. But we can only do this if the results from the job are
> >   treated as voting.
> >
> > What does this mean? That the tripleo job could vote and do a -1 and
> > block ironic's gate?
> >
> >   In the past most of the non tripleo projects tended to ignore
> >   the results from the tripleo job as it wasn't unusual for the
> >   job to be broken for days at a time. The thing is, ignoring the
> >   results of the job is the reason (the majority of the time) it
> >   was broken in the first place.
> >
> >   To decrease the number of breakages we are now no longer
> >   running master code for everything (for the non tripleo projects
> >   we bump the versions we use periodically if they are working). I
> >   believe with this model the CI jobs we run have become a lot more
> >   reliable, there are still breakages but far less frequently.
> >
> >   What I'm proposing is we add at least one of our tripleo jobs back
> >   to both heat and ironic (and other projects associated with them
> >   e.g. clients, ironic-inspector etc.), tripleo will switch to
> >   running latest master of those repositories and the cores
> >   approving on those projects should wait for a passing CI job
> >   before hitting approve. So how do people feel about doing this?
> >   Can we give it a go? A couple of people have already expressed an
> >   interest in doing this but I'd like to make sure we're all in
> >   agreement before switching it on.
> >
> > This seems to indicate that the tripleo jobs are non-voting, or at
> > least won't block the gate -- so I'm fine with adding tripleo jobs to
> > ironic. But if you want cores to wait/make sure they pass, then
> > shouldn't they be voting? (Guess I'm a bit confused.)
> >
> > +1
> >
> > I don't think it hurts to turn it on, but tbh I'm uncomfortable with
> > the mental overhead of a non-voting job that I have to manually treat
> > as a voting job. If it's stable enough to make it a voting job, I'd
> > prefer we just make it voting. And if it's not then I'd like to see it
> > be made stable enough to be a voting job and then make it voting.
> >
> > This is roughly where I sit as well -- if it's non-voting, experience
> > tells me that it will largely be ignored, and as such, isn't a good use
> > of resources.
>
> I'm sure you can appreciate it's something of a chicken/egg problem though
> - if everyone always ignores non-voting jobs, they never become voting.
>
> That effect is magnified with TripleO though, because it consumes so many
> OpenStack projects, any one of which has the capability to break our CI, so
> in an ideal world we'd have voting feedback on all-the-things, but that's
> not where we are right now due in large part to the steady stream of
> regressions (from Heat, Ironic and other projects).
>
> > I haven't looked at tripleo or tripleoci in a while, so I won't assume
> > that my recollection of the CI jobs bears any resemblance to what
> > exists today. Could you explain what areas of ironic (or its
> > subprojects) will be covered by these tests? If they are already
> > covered by existing tests, then I don't see the benefit of adding
> > another job; conversely, if this is testing areas we don't cover
> > today, then there's probably value in running tripleoci in a voting
> > fashion for now and then moving that coverage into ironic's project
> > testing.
>
> I like to think of TripleO as a trunk-chasing "power user", and as such
> gives very valuable "user" feedback, including breaking things in exciting
> ways you hadn't anticipated in your project integration tests.
>
> This has, in the case of Heat at least, made TripleO an extremely effective
> "kitchen sink" stress test, and has uncovered numerous issues we failed to
> find with our internal tests (obviously we do add coverage when we find
> them).
>
> In the case of Ironic, I think the usage is somewhat less demanding, but no
> less "real world" - here's a good example for you:
>
> https://bugs.launchpad.net/ironic/+bug/1507738
>
> In this case, Ironic landed a change to master, which broke all existing
> deployments using Centos/RHEL derived distributions, so master Ironic has
> been broken for folks using those distros for over 6 weeks.
>
> I know in that case, the problem was a really old ipxe image in the distro,
> and yes there were several possible workarounds, but as a developer who
> cares about users, I personally would rather get gate feedback than angry
> users on IRC/email when I unwittingly break the world for them ;)
>
> (note, I'm not assigning any blame above, it's one of *many* examples of
> unexpected breakage due to insufficient gate feedback of real usage across
> many projects).
>
Great example, Steve, and I agree that more and faster feedback from users
into patches is a good thing. I'm also sad that it was broken for that long
and no one raised the issue in our meeting until this week.

This particular bug highlights a gap in Ironic's test coverage which I
would be delighted if someone wants to close -- that we aren't testing
support for RH-based distros. Closing that gap doesn't require TripleoCI at
all; we should simply add a dsvm job for Ironic on Fedora, using a
Fedora-based ramdisk. That will help prevent similar regressions in the
future.

Anyway, I have big reservations about putting TripleoCI on a path to ever
gating Ironic patches. I started to bikeshed on that and then deleted it
... tl;dr: I believe it is important for this job to vote in a non-gating
way. As a reviewer, I'm unlikely to pay attention to it if it doesn't vote,
and there's a good reason for this:

Non-voting jobs are used for experimentation. A non-voting job is a job
that we want to vote, but which we don't trust enough yet. It has been
promoted from the experimental pipeline to the check pipeline so that it
gets a lot more runs and so that we can stabilize it enough to make it
voting.

I was going to suggest that tripleoci vote as a third party CI system (I
know, it's not actually a third-party CI system, but I'd like to vote like
one). And then I noticed that it used to do just that. [0] If I'm
interpreting it correctly, the "gate-tripleo-ironic*" jobs voted from a
separate account, left an informative -1, but did not block the gate.
That's exactly what I would like in this case.
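
To make the voting vs. gating distinction concrete, here is a toy model in
Python. It is only a sketch: it is not Zuul or Gerrit configuration, and the
job names, the JobResult fields and both helper functions are made up for
illustration. The assumption is simply that a voting job leaves a visible
+1/-1 on the review, while only a gating job can stop the patch from merging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    name: str      # hypothetical job name, for illustration only
    passed: bool
    voting: bool   # the job leaves a +1/-1 on the review
    gating: bool   # a failure of this job prevents the merge


def review_vote(results):
    """Return -1 if any voting job failed, otherwise +1."""
    return -1 if any(r.voting and not r.passed for r in results) else 1


def can_merge(results):
    """Only gating jobs are allowed to block the merge."""
    return all(r.passed for r in results if r.gating)


results = [
    # The project's own job: votes and gates.
    JobResult("hypothetical-ironic-dsvm", passed=True, voting=True, gating=True),
    # A third-party-style job: votes, but never gates.
    JobResult("hypothetical-tripleo-ci", passed=False, voting=True, gating=False),
]

print("vote:", review_vote(results))
print("can merge:", can_merge(results))
```

With these inputs the failing tripleo-style job produces a visible -1 for
reviewers, yet the patch can still merge, which is the behaviour described
above.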

Cheers,
-Devananda

[0] https://review.openstack.org/#/c/184402/