Open Stack

Tue Mar 22 16:36:12 UTC 2016

On Thu, 2016-03-10 at 23:24 +0000, Jeremy Stanley wrote:
> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
> > 
> > This seems to be the week people want to pile it on TripleO. Talking
> > about upstream is great but I suppose I'd rather debate major changes
> > after we branch Mitaka. :/
> [...]
> 
> I didn't mean to pile on TripleO, nor did I intend to imply this was
> something which should happen ASAP (or even necessarily at all), but
> I do want to better understand what actual benefit is currently
> derived from this implementation vs. a more typical third-party CI
> (which lots of projects are doing when they find their testing needs
> are not met by the constraints of our generic test infrastructure).
> 
> > 
> > With regard to Jenkins restarts I think it is understood that our job
> > times are long. How often do you find infra needs to restart Jenkins?
> We're restarting all 8 of our production Jenkins masters weekly at a
> minimum, but generally more often when things are busy (2-3 times a
> week). For many months we've been struggling with a thread leak, and
> their development team has not treated it as enough of a priority to
> even triage our bug report effectively. At this point I think we've
> mostly given up on expecting it to be solved by anything other than
> our upcoming migration off of Jenkins, but that's another topic
> altogether.
> 
> > 
> > And regardless of that what if we just said we didn't mind the
> > destructiveness of losing a few jobs now and then (until our job
> > times are under the line... say 1.5 hours or so). To be clear I'd
> > be fine with infra pulling the rug on running jobs if this is the
> > root cause of the long running jobs in TripleO.
> For manual Jenkins restarts this is probably doable (if additional
> hassle), but I don't know whether that's something we can easily
> shoehorn into our orchestrated/automated restarts.
> 
> > 
> > I think the "benefits are minimal" is a bit of an overstatement. The
> > initial vision for TripleO CI stands and I would still like to see
> > individual projects entertain the option to use us in their gates.
> [...]
> 
> This is what I'd like to delve deeper into. The current
> implementation isn't providing you with any mechanism to prevent
> changes which fail jobs running in the tripleo-test cloud from
> merging to your repos, is it? You're still having to manually
> inspect the job results posted by it? How is that particularly
> different from relying on third-party CI integration?

Perhaps we don't have a lot of differences today but I don't think that
is where we want to be. Moving TripleO CI into 3rd party CI is IMO
strategically a bad move for the project that aims to provide a
feedback loop for breakages into other upstream OpenStack projects. I
would argue that we are in a unique position to do that in TripleO...
and becoming 3rd party CI is a retreat from providing this feedback
loop which can benefit other projects we rely on heavily (think Heat,
Mistral, Ironic, etc.). We want to gate our stuff. We need to gate our
own stuff.

That said we've overstepped our resource boundaries. Our job runtimes
are way long. We have several efforts in progress to help improve that.

1) Caching. Derek's work on caching should significantly help us
improve our job wall times:

https://review.openstack.org/#/q/topic:mirror-server

2) Metrics tracking. I've posted a patch to help us better track
various wall times and image sizes in tripleo-ci:

https://review.openstack.org/#/c/291393/

3) The ability to test components of TripleO outside of baremetal
environments. Steve Hardy has been working on some approaches to
testing tripleo-heat-templates on normal OpenStack cloud instances.
Using this approach would allow us to test a significant portion of our
patches on groups of nodepool instances. We need to prototype this a bit
further but I think this holds some promise for allowing us to split
up our testing scenarios, etc.

So rather than ask why TripleO can't become 3rd party CI, I'd ask what
harm we are causing where we are. I like where we are because the
management is well known to the team and other OpenStack projects.

And does working on the items above (speeding up our wall time, keeping
better metrics tracking, using more public cloud resources) help make
everyone happier?

Dan

> 
> As for other projects making use of the same jobs, right now the
> only convenience I'm aware of is that they can add check-tripleo
> pipeline jobs in our Zuul layout file instead of having you add it
> to yours (which could itself reside in a Git repo under your
> control, giving you even more flexibility over those choices). In
> fact, with a third-party CI using its own separate Gerrit account,
> you would be able to leave clear -1/+1 votes on check results which
> is not possible with the present solution.
> 
> So anyway, I'm not saying that I definitely believe the third-party
> CI route will be better for TripleO, but I'm not (yet) clear on what
> tangible benefit you're receiving now that you lose by switching to
> that model.

[openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
