[openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

Steven Hardy shardy at redhat.com
Wed Dec 2 12:53:53 UTC 2015

On Tue, Dec 01, 2015 at 05:10:57PM -0800, Devananda van der Veen wrote:
>    On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <shardy at redhat.com> wrote:
>      On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
>      >    On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <zbitter at redhat.com> wrote:
>      >
>      >      On 30/11/15 12:51, Ruby Loo wrote:
>      >
>      >        On 30 November 2015 at 10:19, Derek Higgins <derekh at redhat.com> wrote:
>      >
>      >            Hi All,
>      >
>      >                A few months ago tripleo switched from its devtest based CI to
>      >            one that was based on instack. Before doing this we anticipated
>      >            disruption in the ci jobs and removed them from non tripleo
>      >            projects.
>      >
>      >                We'd like to investigate adding it back to heat and ironic as
>      >            these are the two projects where we find our ci provides the most
>      >            value. But we can only do this if the results from the job are
>      >            treated as voting.
>      >
>      >        What does this mean? That the tripleo job could vote and do a -1 and
>      >        block ironic's gate?
>      >
>      >                In the past most of the non tripleo projects tended to ignore
>      >            the results from the tripleo job as it wasn't unusual for the job
>      >            to be broken for days at a time. The thing is, ignoring the results
>      >            of the job is the reason (the majority of the time) it was broken
>      >            in the first place.
>      >                To decrease the number of breakages we are now no longer
>      >            running master code for everything (for the non tripleo projects
>      >            we bump the versions we use periodically if they are working). I
>      >            believe with this model the CI jobs we run have become a lot more
>      >            reliable; there are still breakages but far less frequently.
>      >
>      >            What I'm proposing is we add at least one of our tripleo jobs back
>      >            to both heat and ironic (and other projects associated with them,
>      >            e.g. clients, ironic-inspector etc.), tripleo will switch to
>      >            running latest master of those repositories and the cores approving
>      >            on those projects should wait for a passing CI job before hitting
>      >            approve.
>      >            So how do people feel about doing this? Can we give it a go? A
>      >            couple of people have already expressed an interest in doing this
>      >            but I'd like to make sure we're all in agreement before switching
>      >            it on.
>      >
>      >        This seems to indicate that the tripleo jobs are non-voting, or at
>      >        least won't block the gate -- so I'm fine with adding tripleo jobs to
>      >        ironic.
>      >        But if you want cores to wait/make sure they pass, then shouldn't they
>      >        be voting? (Guess I'm a bit confused.)
>      >
>      >      +1
>      >
>      >      I don't think it hurts to turn it on, but tbh I'm uncomfortable with the
>      >      mental overhead of a non-voting job that I have to manually treat as a
>      >      voting job. If it's stable enough to make it a voting job, I'd prefer we
>      >      just make it voting. And if it's not then I'd like to see it be made
>      >      stable enough to be a voting job and then make it voting.
>      >
>      >    This is roughly where I sit as well -- if it's non-voting, experience
>      >    tells me that it will largely be ignored, and as such, isn't a good use of
>      >    resources.
>      I'm sure you can appreciate it's something of a chicken/egg problem though
>      - if everyone always ignores non-voting jobs, they never become voting.
>      That effect is magnified with TripleO though, because it consumes so many
>      OpenStack projects, any one of which has the capability to break our CI, so
>      in an ideal world we'd have voting feedback on all-the-things, but that's
>      not where we are right now due in large part to the steady stream of
>      regressions (from Heat, Ironic and other projects).
>      >    I haven't looked at tripleo or tripleoci in a while, so I won't assume
>      >    that my recollection of the CI jobs bears any resemblance to what exists
>      >    today.
>      >    Could you explain what areas of ironic (or its subprojects) will be
>      >    covered by these tests? If they are already covered by existing tests,
>      >    then I don't see the benefit of adding another job; conversely, if this is
>      >    testing areas we don't cover today, then there's probably value in running
>      >    tripleoci in a voting fashion for now and then moving that coverage into
>      >    ironic's project testing.
>      I like to think of TripleO as a trunk-chasing "power user", and as such
>      gives very valuable "user" feedback, including breaking things in exciting
>      ways you hadn't anticipated in your project integration tests.
>      This has, in the case of Heat at least, made TripleO an extremely effective
>      "kitchen sink" stress test, and has uncovered numerous issues we failed to
>      find with our internal tests (obviously we do add coverage when we find
>      them).
>      In the case of Ironic, I think the usage is somewhat less demanding, but no
>      less "real world" - here's a good example for you:
>      https://bugs.launchpad.net/ironic/+bug/1507738
>      In this case, Ironic landed a change to master, which broke all existing
>      deployments using Centos/RHEL derived distributions, so master Ironic has
>      been broken for folks using those distros for over 6 weeks.
>      I know in that case, the problem was a really old ipxe image in the distro,
>      and yes there were several possible workarounds, but as a developer who
>      cares about users, I personally would rather get gate feedback than angry
>      users on IRC/email when I unwittingly break the world for them ;)
>      (note, I'm not assigning any blame above, it's one of *many* examples of
>      unexpected breakage due to insufficient gate feedback of real usage across
>      many projects).
>    Great example, Steve, and I agree that more and faster feedback from users
>    into patches is a good thing. I'm also sad that it was broken for that
>    long and no one raised the issue in our meeting until this week.
>    This particular bug highlights a gap in Ironic's test coverage which I
>    would be delighted if someone wants to close -- that we aren't testing
>    support for RH-based distros. Closing that gap doesn't require TripleoCI
>    at all; we should simply add a dsvm job for Ironic on Fedora, using a
>    Fedora-based ramdisk. That will help prevent similar regressions in the
>    future.
>    Anyway, I have big reservations about putting TripleoCI on a path to ever
>    gating Ironic patches. I started to bikeshed on that and then deleted it
>    ... tldr; I believe it is important for this job to vote in a non-gating
>    way. As a reviewer, I'm unlikely to pay attention to it if it doesn't
>    vote, and there's a good reason for this:
>    Non-voting jobs are used for experimentation. A non-voting job is a job
>    that we want to vote, but which we don't trust enough yet. It has been
>    promoted from the experimental pipeline to the check pipeline so that it
>    gets a lot more runs and so that we can stabilize it enough to make it
>    voting.

Ah, I think all we have here is a terminology mismatch around "non voting"
vs "non gating".

AFAIK what is being proposed is to reinstate the TripleO jobs so they *do*
vote on any change (+1/-1), but they do not block the gate, so we won't get
in the way if occasional outages happen.
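
To make the distinction concrete, here's a rough sketch of how I think of
"voting" vs "gating". This is only a toy model in Python, not Zuul
configuration or any real API; the Job class, both helper functions and the
job names are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    voting: bool  # the job leaves a +1/-1 on the review
    gating: bool  # a failure prevents the change from merging


def review_votes(jobs, passed):
    """Return {job name: +1 or -1} for every job that votes."""
    return {j.name: (1 if passed[j.name] else -1) for j in jobs if j.voting}


def can_merge(jobs, passed):
    """Only gating jobs are able to block the merge."""
    return all(passed[j.name] for j in jobs if j.gating)


# Hypothetical job names, purely for illustration.
jobs = [
    Job("project-unit-tests", voting=True, gating=True),
    Job("tripleo-ci", voting=True, gating=False),  # votes, does not gate
]

# Suppose the TripleO job is having an outage while the project job passes.
passed = {"project-unit-tests": True, "tripleo-ci": False}

print(review_votes(jobs, passed))
print(can_merge(jobs, passed))
```

In this toy model the failing TripleO job still shows up as a -1 that
reviewers can see, but the change is not blocked from merging.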

>    I was going to suggest that tripleoci vote as a third party CI system (I
>    know, it's not actually a third-party CI system, but I'd like to vote like
>    one). And then I noticed that it used to do just that. [0] If I'm
>    interpreting it correctly, the "gate-tripleo-ironic*" jobs voted from a
>    separate account, left an informative -1, but did not block the gate.
>    That's exactly what I would like in this case.

+1, I think that's what's being proposed, so we're in agreement! :)
