[openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

Derek Higgins derekh at redhat.com
Wed Dec 2 15:58:01 UTC 2015



On 02/12/15 12:53, Steven Hardy wrote:
> On Tue, Dec 01, 2015 at 05:10:57PM -0800, Devananda van der Veen wrote:
>>     On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <shardy at redhat.com> wrote:
>>
>>       On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
>>       >    On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <zbitter at redhat.com> wrote:
>>       >
>>       >      On 30/11/15 12:51, Ruby Loo wrote:
>>       >
>>       >        On 30 November 2015 at 10:19, Derek Higgins <derekh at redhat.com> wrote:
>>       >
>>       >            Hi All,
>>       >
>>       >            A few months ago tripleo switched from its devtest based CI
>>       >            to one that was based on instack. Before doing this we
>>       >            anticipated disruption in the ci jobs and removed them from
>>       >            non tripleo projects.
>>       >
>>       >            We'd like to investigate adding it back to heat and ironic
>>       >            as these are the two projects where we find our ci provides
>>       >            the most value. But we can only do this if the results from
>>       >            the job are treated as voting.
>>       >
>>       >        What does this mean? That the tripleo job could vote and do a
>>       >        -1 and block ironic's gate?
>>       >
>>       >            In the past most of the non tripleo projects tended to
>>       >            ignore the results from the tripleo job as it wasn't unusual
>>       >            for the job to be broken for days at a time. The thing is,
>>       >            ignoring the results of the job is the reason (the majority
>>       >            of the time) it was broken in the first place.
>>       >
>>       >            To decrease the number of breakages we are now no longer
>>       >            running master code for everything (for the non tripleo
>>       >            projects we bump the versions we use periodically if they
>>       >            are working). I believe with this model the CI jobs we run
>>       >            have become a lot more reliable; there are still breakages
>>       >            but far less frequently.
>>       >
>>       >            What I am proposing is that we add at least one of our
>>       >            tripleo jobs back to both heat and ironic (and other
>>       >            projects associated with them, e.g. clients,
>>       >            ironicinspector etc.). tripleo will switch to running
>>       >            latest master of those repositories, and the cores
>>       >            approving on those projects should wait for a passing CI
>>       >            job before hitting approve.
>>       >
>>       >            So how do people feel about doing this? Can we give it a
>>       >            go? A couple of people have already expressed an interest in
>>       >            doing this but I'd like to make sure we're all in agreement
>>       >            before switching it on.
>>       >
>>       >        This seems to indicate that the tripleo jobs are non-voting, or
>>       >        at least won't block the gate -- so I'm fine with adding tripleo
>>       >        jobs to ironic. But if you want cores to wait/make sure they
>>       >        pass, then shouldn't they be voting? (Guess I'm a bit confused.)
>>       >
>>       >      +1
>>       >
>>       >      I don't think it hurts to turn it on, but tbh I'm uncomfortable
>>       >      with the mental overhead of a non-voting job that I have to
>>       >      manually treat as a voting job. If it's stable enough to make it
>>       >      a voting job, I'd prefer we just make it voting. And if it's not
>>       >      then I'd like to see it be made stable enough to be a voting job
>>       >      and then make it voting.
>>       >
>>       >    This is roughly where I sit as well -- if it's non-voting,
>>       >    experience tells me that it will largely be ignored, and as such,
>>       >    isn't a good use of resources.
>>
>>       I'm sure you can appreciate it's something of a chicken/egg problem
>>       though - if everyone always ignores non-voting jobs, they never become
>>       voting.
>>
>>       That effect is magnified with TripleO though, because it consumes so
>>       many OpenStack projects, any one of which has the capability to break
>>       our CI, so in an ideal world we'd have voting feedback on
>>       all-the-things, but that's not where we are right now due in large-part
>>       to the steady stream of regressions (from Heat, Ironic and other
>>       projects).
>>
>>       >    I haven't looked at tripleo or tripleoci in a while, so I won't
>>       >    assume that my recollection of the CI jobs bears any resemblance
>>       >    to what exists today. Could you explain what areas of ironic (or
>>       >    its subprojects) will be covered by these tests? If they are
>>       >    already covered by existing tests, then I don't see the benefit of
>>       >    adding another job; conversely, if this is testing areas we don't
>>       >    cover today, then there's probably value in running tripleoci in a
>>       >    voting fashion for now and then moving that coverage into ironic's
>>       >    project testing.
>>
>>       I like to think of TripleO as a trunk-chasing "power user", and as such
>>       it gives very valuable "user" feedback, including breaking things in
>>       exciting ways you hadn't anticipated in your project integration tests.
>>
>>       This has, in the case of Heat at least, made TripleO an extremely
>>       effective "kitchen sink" stress test, and has uncovered numerous issues
>>       we failed to find with our internal tests (obviously we do add coverage
>>       when we find them).
>>
>>       In the case of Ironic, I think the usage is somewhat less demanding, but
>>       no less "real world" - here's a good example for you:
>>
>>       https://bugs.launchpad.net/ironic/+bug/1507738
>>
>>       In this case, Ironic landed a change to master, which broke all existing
>>       deployments using Centos/RHEL derived distributions, so master Ironic
>>       has been broken for folks using those distros for over 6 weeks.
>>
>>       I know in that case, the problem was a really old ipxe image in the
>>       distro, and yes there were several possible workarounds, but as a
>>       developer who cares about users, I personally would rather get gate
>>       feedback than angry users on IRC/email when I unwittingly break the
>>       world for them ;)
>>
>>       (note, I'm not assigning any blame above, it's one of *many* examples of
>>       unexpected breakage due to insufficient gate feedback of real usage
>>       across many projects).
>>
>>     Great example, Steve, and I agree that more and faster feedback from users
>>     into patches is a good thing. I'm also sad that it was broken for that
>>     long and no one raised the issue in our meeting until this week.
>>
>>     This particular bug highlights a gap in Ironic's test coverage which I
>>     would be delighted if someone wants to close -- that we aren't testing
>>     support for RH-based distros. Closing that gap doesn't require TripleoCI
>>     at all; we should simply add a dsvm job for Ironic on Fedora, using a
>>     Fedora-based ramdisk. That will help prevent similar regressions in the
>>     future.
>>
>>     Anyway, I have big reservations about putting TripleoCI on a path to ever
>>     gating Ironic patches. I started to bikeshed on that and then deleted it
>>     ... tldr; I believe it is important for this job to vote in a non-gating
>>     way. As a reviewer, I'm unlikely to pay attention to it if it doesn't
>>     vote, and there's a good reason for this:
>>
>>     Non-voting jobs are used for experimentation. A non-voting job is a job
>>     that we want to vote, but which we don't trust enough yet. It has been
>>     promoted from the experimental pipeline to the check pipeline so that it
>>     gets a lot more runs and so that we can stabilize it enough to make it
>>     voting.
>
> Ah, I think all we have here is a terminology mismatch around "non voting"
> vs "non gating".
>
> AFAIK what is being proposed is to reinstate the TripleO jobs so they *do*
> vote on any change (+1/-1), but they do not block the gate, so we won't get
> in the way if occasional outages happen.

Yes, this is exactly what I wanted to do. Nothing would change from how
it used to be: the tripleo jobs would vote with a -1/+1 but approvers
could still approve if they wanted to (i.e. the jobs would not be in the
gate). The only thing I am asking we do differently is to agree not to
blindly ignore the results of the tripleo job, as ignoring the results
is what causes a lot of the breakages in the first place.

As for the gating side of the conversation, I don't think actual gating
is feasible, at least in the short term. It would put a higher demand
on our resources (a demand I'm not sure we have the hardware to meet),
and I don't think we have the necessary redundancy in our cloud.

>
>>     I was going to suggest that tripleoci vote as a third party CI system (I
>>     know, it's not actually a third-party CI system, but I'd like to vote like
>>     one). And then I noticed that it used to do just that. [0] If I'm
>>     interpreting it correctly, the "gate-tripleo-ironic*" jobs voted from a
>>     separate account, left an informative -1, but did not block the gate.
>>     That's exactly what I would like in this case.
>
> +1, I think that's what's being proposed, so we're in agreement! :)
>
> Steve
>


