Open Stack

Thu Mar 17 18:55:24 UTC 2016

On 03/17/2016 01:13 PM, Paul Belanger wrote:
> On Thu, Mar 17, 2016 at 11:59:22AM -0500, Ben Nemec wrote:
>> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
>>> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
>>>> This seems to be the week people want to pile on TripleO. Talking
>>>> about upstream is great but I suppose I'd rather debate major changes
>>>> after we branch Mitaka. :/
>>> [...]
>>>
>>> I didn't mean to pile on TripleO, nor did I intend to imply this was
>>> something which should happen ASAP (or even necessarily at all), but
>>> I do want to better understand what actual benefit is currently
>>> derived from this implementation vs. a more typical third-party CI
>>> (which lots of projects are doing when they find their testing needs
>>> are not met by the constraints of our generic test infrastructure).
>>>
>>>> With regard to Jenkins restarts I think it is understood that our job
>>>> times are long. How often do you find infra needs to restart Jenkins?
>>>
>>> We're restarting all 8 of our production Jenkins masters weekly at a
>>> minimum, but generally more often when things are busy (2-3 times a
>>> week). For many months we've been struggling with a thread leak
>>> whose bug report their development team has not considered a priority
>>> even to triage effectively. At this point I think we've
>>> mostly given up on expecting it to be solved by anything other than
>>> our upcoming migration off of Jenkins, but that's another topic
>>> altogether.
>>>
>>>> And regardless of that what if we just said we didn't mind the
>>>> destructiveness of losing a few jobs now and then (until our job
>>>> times are under the line... say 1.5 hours or so). To be clear I'd
>>>> be fine with infra pulling the rug on running jobs if this is the
>>>> root cause of the long running jobs in TripleO.
>>>
>>> For manual Jenkins restarts this is probably doable (if additional
>>> hassle), but I don't know whether that's something we can easily
>>> shoehorn into our orchestrated/automated restarts.
>>>
>>>> I think the "benefits are minimal" is a bit of an overstatement. The
>>>> initial vision for TripleO CI stands and I would still like to see
>>>> individual projects entertain the option to use us in their gates.
>>> [...]
>>>
>>> This is what I'd like to delve deeper into. The current
>>> implementation isn't providing you with any mechanism to prevent
>>> changes which fail jobs running in the tripleo-test cloud from
>>> merging to your repos, is it? You're still having to manually
>>> inspect the job results posted by it? How is that particularly
>>> different from relying on third-party CI integration?
>>>
>>> As for other projects making use of the same jobs, right now the
>>> only convenience I'm aware of is that they can add check-tripleo
>>> pipeline jobs in our Zuul layout file instead of having you add them
>>> to yours (which could itself reside in a Git repo under your
>>> control, giving you even more flexibility over those choices). In
>>> fact, with a third-party CI using its own separate Gerrit account,
>>> you would be able to leave clear -1/+1 votes on check results, which
>>> is not possible with the present solution.
>>>
>>> So anyway, I'm not saying that I definitely believe the third-party
>>> CI route will be better for TripleO, but I'm not (yet) clear on what
>>> tangible benefit you're receiving now that you lose by switching to
>>> that model.
>>>
>>
>> FWIW, I think third-party CI probably makes sense for TripleO.
>> Practically speaking we are third-party CI right now - we run our own
>> independent hardware infrastructure, we aren't multi-region, and we
>> can't leave a vote on changes.  Since the first two aren't likely to
>> change any time soon (although I believe it's still a long-term goal to
>> get to a place where we can run in regular infra and just contribute our
>> existing CI hardware to the general infra pool, but that's still a long
>> way off), and moving to actual third-party CI would get us the ability
>> to vote, I think it's worth pursuing.
>>
>> As an added bit of fun, we have a forced move of our CI hardware coming
>> up in the relatively near future, and if we don't want to have multiple
>> days (and possibly more, depending on how the move goes) of TripleO CI
>> outage we're probably going to need to stand up a new environment in
>> parallel anyway.  If we're doing that it might make sense to try hooking
>> it in through the third-party infra instead of the way we do it today.
>> Hopefully that would allow us to work out the kinks before the old
>> environment goes away.
>>
>> Anyway, I'm sure we'll need a bunch more discussion about this, but I
>> wanted to chime in with my two cents.
>>
> Do you have any ETA on when your outage would be?  Is it before or after the
> summit in Austin?
> 
> Personally, I'm going to attend a few TripleO design sessions wherever
> possible in Austin. It would be great to have a fishbowl session about it.

It's after, but we'll only have a couple of months or so at that point
to wrap everything up, so I suspect we'll need to have some basic plan
in place before then, or we'll never be able to get hardware in time.  It may
be too late already. :-/

Probably the first thing I need to do is follow up with people
internally and find out if there's already a plan in place for this that
I just don't know about.  That's entirely possible.


[openstack-dev] [tripleo] becoming third party CI