Open Stack

Wed Dec 2 12:53:53 UTC 2015

On Tue, Dec 01, 2015 at 05:10:57PM -0800, Devananda van der Veen wrote:
>    On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <shardy at redhat.com> wrote:
> 
>      On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
>      >    On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <zbitter at redhat.com> wrote:
>      >
>      >      On 30/11/15 12:51, Ruby Loo wrote:
>      >
>      >        On 30 November 2015 at 10:19, Derek Higgins <derekh at redhat.com> wrote:
>      >
>      >            Hi All,
>      >
>      >            A few months ago tripleo switched from its devtest-based CI to
>      >            one that was based on instack. Before doing this we anticipated
>      >            disruption in the CI jobs and removed them from non-tripleo
>      >            projects.
>      >
>      >            We'd like to investigate adding it back to heat and ironic as
>      >            these are the two projects where we find our CI provides the
>      >            most value. But we can only do this if the results from the job
>      >            are treated as voting.
>      >
>      >        What does this mean? That the tripleo job could vote and do a -1
>      >        and block ironic's gate?
>      >
>      >            In the past most of the non-tripleo projects tended to ignore
>      >            the results from the tripleo job as it wasn't unusual for the
>      >            job to be broken for days at a time. The thing is, ignoring the
>      >            results of the job is the reason (the majority of the time) it
>      >            was broken in the first place.
>      >
>      >            To decrease the number of breakages we are now no longer
>      >            running master code for everything (for the non-tripleo
>      >            projects we bump the versions we use periodically if they are
>      >            working). I believe with this model the CI jobs we run have
>      >            become a lot more reliable; there are still breakages but far
>      >            less frequently.
>      >
>      >            What I'm proposing is we add at least one of our tripleo jobs
>      >            back to both heat and ironic (and other projects associated
>      >            with them, e.g. clients, ironic-inspector etc.), tripleo will
>      >            switch to running latest master of those repositories and the
>      >            cores approving on those projects should wait for a passing CI
>      >            job before hitting approve.
>      >
>      >            So how do people feel about doing this? Can we give it a go? A
>      >            couple of people have already expressed an interest in doing
>      >            this but I'd like to make sure we're all in agreement before
>      >            switching it on.
>      >
>      >        This seems to indicate that the tripleo jobs are non-voting, or at
>      >        least won't block the gate -- so I'm fine with adding tripleo jobs
>      >        to ironic. But if you want cores to wait/make sure they pass, then
>      >        shouldn't they be voting? (Guess I'm a bit confused.)
>      >
>      >      +1
>      >
>      >      I don't think it hurts to turn it on, but tbh I'm uncomfortable with
>      >      the mental overhead of a non-voting job that I have to manually
>      >      treat as a voting job. If it's stable enough to make it a voting
>      >      job, I'd prefer we just make it voting. And if it's not then I'd
>      >      like to see it be made stable enough to be a voting job and then
>      >      make it voting.
>      >
>      >    This is roughly where I sit as well -- if it's non-voting, experience
>      >    tells me that it will largely be ignored, and as such, isn't a good
>      >    use of resources.
> 
>      I'm sure you can appreciate it's something of a chicken/egg problem
>      though - if everyone always ignores non-voting jobs, they never become
>      voting.
> 
>      That effect is magnified with TripleO though, because it consumes so
>      many OpenStack projects, any one of which has the capability to break
>      our CI, so in an ideal world we'd have voting feedback on
>      all-the-things, but that's not where we are right now due in large part
>      to the steady stream of regressions (from Heat, Ironic and other
>      projects).
>
>      >    I haven't looked at tripleo or tripleoci in a while, so I won't assume
>      >    that my recollection of the CI jobs bears any resemblance to what
>      >    exists today. Could you explain what areas of ironic (or its
>      >    subprojects) will be covered by these tests? If they are already
>      >    covered by existing tests, then I don't see the benefit of adding
>      >    another job; conversely, if this is testing areas we don't cover
>      >    today, then there's probably value in running tripleoci in a voting
>      >    fashion for now and then moving that coverage into ironic's project
>      >    testing.
> 
>      I like to think of TripleO as a trunk-chasing "power user", and as such
>      it gives very valuable "user" feedback, including breaking things in
>      exciting ways you hadn't anticipated in your project integration tests.
> 
>      This has, in the case of Heat at least, made TripleO an extremely
>      effective "kitchen sink" stress test, and has uncovered numerous issues
>      we failed to find with our internal tests (obviously we do add coverage
>      when we find them).
> 
>      In the case of Ironic, I think the usage is somewhat less demanding, but
>      no less "real world" - here's a good example for you:
> 
>      https://bugs.launchpad.net/ironic/+bug/1507738
> 
>      In this case, Ironic landed a change to master, which broke all existing
>      deployments using CentOS/RHEL-derived distributions, so master Ironic
>      has been broken for folks using those distros for over 6 weeks.
> 
>      I know in that case, the problem was a really old ipxe image in the
>      distro, and yes there were several possible workarounds, but as a
>      developer who cares about users, I personally would rather get gate
>      feedback than angry users on IRC/email when I unwittingly break the
>      world for them ;)
> 
>      (note, I'm not assigning any blame above, it's one of *many* examples of
>      unexpected breakage due to insufficient gate feedback of real usage
>      across many projects).
> 
>    Great example, Steve, and I agree that more and faster feedback from users
>    into patches is a good thing. I'm also sad that it was broken for that
>    long and no one raised the issue in our meeting until this week.
>
>    This particular bug highlights a gap in Ironic's test coverage which I
>    would be delighted if someone wants to close -- that we aren't testing
>    support for RH-based distros. Closing that gap doesn't require TripleoCI
>    at all; we should simply add a dsvm job for Ironic on Fedora, using a
>    Fedora-based ramdisk. That will help prevent similar regressions in the
>    future.
>
>    Anyway, I have big reservations about putting TripleoCI on a path to ever
>    gating Ironic patches. I started to bikeshed on that and then deleted it
>    ... tl;dr: I believe it is important for this job to vote in a non-gating
>    way. As a reviewer, I'm unlikely to pay attention to it if it doesn't
>    vote, and there's a good reason for this:
>
>    Non-voting jobs are used for experimentation. A non-voting job is a job
>    that we want to vote, but which we don't trust enough yet. It has been
>    promoted from the experimental pipeline to the check pipeline so that it
>    gets a lot more runs and so that we can stabilize it enough to make it
>    voting.

Ah, I think all we have here is a terminology mismatch around "non voting"
vs "non gating".

AFAIK what is being proposed is to reinstate the TripleO jobs so they *do*
vote on any change (+1/-1), but they do not block the gate, so we won't get
in the way if occasional outages happen.
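
To make the voting vs gating distinction concrete, here is a minimal sketch
in Python. It is only an illustration of the idea, not how Zuul or Gerrit are
actually configured; the JobResult fields, the two helper functions and the
job names are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    """Outcome of one CI job on a change (hypothetical model)."""
    name: str
    passed: bool
    voting: bool  # the job leaves a +1/-1 on the review
    gating: bool  # a failure of this job blocks the merge


def review_votes(results):
    """Return the +1/-1 votes that reviewers would see on the change."""
    return {r.name: (1 if r.passed else -1) for r in results if r.voting}


def can_merge(results):
    """Only failures of gating jobs block the merge."""
    return all(r.passed for r in results if r.gating)


# Hypothetical job names: a project job that gates, and a TripleO job
# that votes but does not gate.
results = [
    JobResult("project-unit-tests", passed=True, voting=True, gating=True),
    JobResult("tripleo-check", passed=False, voting=True, gating=False),
]

print(review_votes(results))  # {'project-unit-tests': 1, 'tripleo-check': -1}
print(can_merge(results))     # True
```

In this model a failing TripleO run still shows up as a visible -1 for
reviewers, but the change can merge once the gating jobs pass.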

>    I was going to suggest that tripleoci vote as a third party CI system (I
>    know, it's not actually a third-party CI system, but I'd like to vote like
>    one). And then I noticed that it used to do just that. [0] If I'm
>    interpreting it correctly, the "gate-tripleo-ironic*" jobs voted from a
>    separate account, left an informative -1, but did not block the gate.
>    That's exactly what I would like in this case.

+1, I think that's what's being proposed, so we're in agreement! :)

Steve


[openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job
