[openstack-dev] [tripleo] becoming third party CI

Ben Nemec openstack at nemebean.com
Mon Mar 21 15:57:53 UTC 2016


On 03/21/2016 07:33 AM, Derek Higgins wrote:
> On 17 March 2016 at 16:59, Ben Nemec <openstack at nemebean.com> wrote:
>> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
>>> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
>>>> This seems to be the week people want to pile it on TripleO. Talking
>>>> about upstream is great but I suppose I'd rather debate major changes
>>>> after we branch Mitaka. :/
>>> [...]
>>>
>>> I didn't mean to pile on TripleO, nor did I intend to imply this was
>>> something which should happen ASAP (or even necessarily at all), but
>>> I do want to better understand what actual benefit is currently
>>> derived from this implementation vs. a more typical third-party CI
>>> (which lots of projects are doing when they find their testing needs
>>> are not met by the constraints of our generic test infrastructure).
>>>
>>>> With regard to Jenkins restarts I think it is understood that our job
>>>> times are long. How often do you find infra needs to restart Jenkins?
>>>
>>> We're restarting all 8 of our production Jenkins masters weekly at a
>>> minimum, but generally more often when things are busy (2-3 times a
>>> week). For many months we've been struggling with a thread leak
>>> which their development team has not considered enough of a priority
>>> to even triage our bug report effectively. At this point I think
>>> we've mostly given up on expecting it to be solved by anything other
>>> than our upcoming migration off of Jenkins, but that's another topic
>>> altogether.
>>>
>>>> And regardless of that what if we just said we didn't mind the
>>>> destructiveness of losing a few jobs now and then (until our job
>>>> times are under the line... say 1.5 hours or so). To be clear I'd
>>>> be fine with infra pulling the rug on running jobs if this is the
>>>> root cause of the long running jobs in TripleO.
>>>
>>> For manual Jenkins restarts this is probably doable (if additional
>>> hassle), but I don't know whether that's something we can easily
>>> shoehorn into our orchestrated/automated restarts.
>>>
>>>> I think the "benefits are minimal" is a bit of an overstatement. The
>>>> initial vision for TripleO CI stands and I would still like to see
>>>> individual projects entertain the option to use us in their gates.
>>> [...]
>>>
>>> This is what I'd like to delve deeper into. The current
>>> implementation isn't providing you with any mechanism to prevent
>>> changes which fail jobs running in the tripleo-test cloud from
>>> merging to your repos, is it? You're still having to manually
>>> inspect the job results posted by it? How is that particularly
>>> different from relying on third-party CI integration?
>>>
>>> As for other projects making use of the same jobs, right now the
>>> only convenience I'm aware of is that they can add check-tripleo
>>> pipeline jobs in our Zuul layout file instead of having you add it
>>> to yours (which could itself reside in a Git repo under your
>>> control, giving you even more flexibility over those choices). In
>>> fact, with a third-party CI using its own separate Gerrit account,
>>> you would be able to leave clear -1/+1 votes on check results which
>>> is not possible with the present solution.
>>>
>>> So anyway, I'm not saying that I definitely believe the third-party
>>> CI route will be better for TripleO, but I'm not (yet) clear on what
>>> tangible benefit you're receiving now that you lose by switching to
>>> that model.
>>>
>>
>> FWIW, I think third-party CI probably makes sense for TripleO.
>> Practically speaking we are third-party CI right now - we run our own
>> independent hardware infrastructure, we aren't multi-region, and we
>> can't leave a vote on changes.  Since the first two aren't likely to
>> change any time soon (although I believe it's still a long-term goal to
>> get to a place where we can run in regular infra and just contribute our
>> existing CI hardware to the general infra pool, but that's still a long
>> way off), and moving to actual third-party CI would get us the ability
>> to vote, I think it's worth pursuing.
>>
>> As an added bit of fun, we have a forced move of our CI hardware coming
>> up in the relatively near future, and if we don't want to have multiple
>> days (and possibly more, depending on how the move goes) of TripleO CI
>> outage we're probably going to need to stand up a new environment in
>> parallel anyway.  If we're doing that it might make sense to try hooking
>> it in through the third-party infra instead of the way we do it today.
>> Hopefully that would allow us to work out the kinks before the old
>> environment goes away.
>>
>> Anyway, I'm sure we'll need a bunch more discussion about this, but I
>> wanted to chime in with my two cents.
> 
> We need to answer this question soon. I'm currently working on the CI
> parts that we need in order to move to OVB[1] and was assuming we
> would be maintaining the status quo. What we end up doing would look
> very different if we move to 3rd party CI: with 3rd party CI we
> can simply start a vanilla CentOS instance and use it as an undercloud.
> It can then create its own baremetal testenv. If we remain inside the
> infra umbrella we'll have a separate Jenkins slave for each job that
> has to talk to a central broker to get a test env (including a separate
> undercloud).
> 
> Doing this in 3rd party CI I think simplifies things because we'll no
> longer need a public cloud, and as a result the security measures
> required to avoid putting cloud credentials on the Jenkins slaves
> won't be needed.

This sounds like a win, although we still need something to avoid having
our cloud credentials on test-accessible machines.  We don't want
someone to be able to push up a malicious patch and start booting
instances on our cloud.

One fewer VM per job still seems like a good thing, though.  The
undercloud could talk to the environment broker directly instead of
going through a Jenkins slave.  Not needing to maintain a publicly
accessible cloud seems good too (although we still need a public
location to drop our CI logs and such).

> 
> The downside to this is that we would (as I see it) be taking a step
> further away from ever being in the gate. TripleO CI hasn't gotten
> any closer to the gate in a long time; a large part of the reason, I
> believe, is that deploying trunk with TripleO is broken a lot, and the
> vast majority of those breakages happen because we're not in the gate,
> so we have a chicken-and-egg problem. We also have other reasons we're
> not in the gate, capacity for one, so maybe switching to 3rd party is
> OK at least until we address the capacity and can reassess.
> 
> [1] - https://review.openstack.org/#/c/295243/

Note that we can be voting as third-party CI:
http://docs.openstack.org/infra/system-config/third_party.html#permissions-on-your-third-party-system

So I'm not sure this is actually a step away from gating all the
projects.  In fact, since we can't vote today as part of the integrated
gate, and I believe that would continue to be the case until we could
run entirely in regular infra instead of as a separate thing, I feel
like this is probably a requirement to be voting on other projects
anytime in the near future.

The capacity issue is obviously a blocker, but even if we had the
hardware today, could we be gating on the other projects with our
current setup?  It's not clear to me that we could.  We can't even vote
on our own projects right now.
