[openstack-dev] [tripleo] becoming third party CI

Ben Nemec openstack at nemebean.com
Mon Mar 21 21:12:09 UTC 2016


On 03/21/2016 03:52 PM, Paul Belanger wrote:
> On Mon, Mar 21, 2016 at 10:57:53AM -0500, Ben Nemec wrote:
>> On 03/21/2016 07:33 AM, Derek Higgins wrote:
>>> On 17 March 2016 at 16:59, Ben Nemec <openstack at nemebean.com> wrote:
>>>> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
>>>>> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
>>>>>> This seems to be the week people want to pile on TripleO. Talking
>>>>>> about upstream is great but I suppose I'd rather debate major changes
>>>>>> after we branch Mitaka. :/
>>>>> [...]
>>>>>
>>>>> I didn't mean to pile on TripleO, nor did I intend to imply this was
>>>>> something which should happen ASAP (or even necessarily at all), but
>>>>> I do want to better understand what actual benefit is currently
>>>>> derived from this implementation vs. a more typical third-party CI
>>>>> (which lots of projects are doing when they find their testing needs
>>>>> are not met by the constraints of our generic test infrastructure).
>>>>>
>>>>>> With regard to Jenkins restarts I think it is understood that our job
>>>>>> times are long. How often do you find infra needs to restart Jenkins?
>>>>>
>>>>> We're restarting all 8 of our production Jenkins masters weekly at a
>>>>> minimum, but generally more often when things are busy (2-3 times a
>>>>> week). For many months we've been struggling with a thread leak,
>>>>> and their development team has not seen it as a priority even to
>>>>> triage our bug report effectively. At this point I think we've
>>>>> mostly given up on expecting it to be solved by anything other than
>>>>> our upcoming migration off of Jenkins, but that's another topic
>>>>> altogether.
>>>>>
>>>>>> And regardless of that, what if we just said we didn't mind the
>>>>>> destructiveness of losing a few jobs now and then (until our job
>>>>>> times are under the line... say 1.5 hours or so). To be clear I'd
>>>>>> be fine with infra pulling the rug on running jobs if this is the
>>>>>> root cause of the long running jobs in TripleO.
>>>>>
>>>>> For manual Jenkins restarts this is probably doable (if additional
>>>>> hassle), but I don't know whether that's something we can easily
>>>>> shoehorn into our orchestrated/automated restarts.
>>>>>
>>>>>> I think the "benefits are minimal" is a bit of an overstatement. The
>>>>>> initial vision for TripleO CI stands and I would still like to see
>>>>>> individual projects entertain the option to use us in their gates.
>>>>> [...]
>>>>>
>>>>> This is what I'd like to delve deeper into. The current
>>>>> implementation isn't providing you with any mechanism to prevent
>>>>> changes which fail jobs running in the tripleo-test cloud from
>>>>> merging to your repos, is it? You're still having to manually
>>>>> inspect the job results posted by it? How is that particularly
>>>>> different from relying on third-party CI integration?
>>>>>
>>>>> As for other projects making use of the same jobs, right now the
>>>>> only convenience I'm aware of is that they can add check-tripleo
>>>>> pipeline jobs in our Zuul layout file instead of having you add them
>>>>> to yours (which could itself reside in a Git repo under your
>>>>> control, giving you even more flexibility over those choices). In
>>>>> fact, with a third-party CI using its own separate Gerrit account,
>>>>> you would be able to leave clear -1/+1 votes on check results, which
>>>>> is not possible with the present solution.
>>>>>
>>>>> So anyway, I'm not saying that I definitely believe the third-party
>>>>> CI route will be better for TripleO, but I'm not (yet) clear on what
>>>>> tangible benefit you're receiving now that you lose by switching to
>>>>> that model.
>>>>>
>>>>
>>>> FWIW, I think third-party CI probably makes sense for TripleO.
>>>> Practically speaking we are third-party CI right now - we run our own
>>>> independent hardware infrastructure, we aren't multi-region, and we
>>>> can't leave a vote on changes.  Since the first two aren't likely to
>>>> change any time soon (although I believe it's still a long-term goal to
>>>> get to a place where we can run in regular infra and just contribute our
>>>> existing CI hardware to the general infra pool, but that's still a long
>>>> way off), and moving to actual third-party CI would get us the ability
>>>> to vote, I think it's worth pursuing.
>>>>
>>>> As an added bit of fun, we have a forced move of our CI hardware coming
>>>> up in the relatively near future, and if we don't want to have multiple
>>>> days (and possibly more, depending on how the move goes) of TripleO CI
>>>> outage we're probably going to need to stand up a new environment in
>>>> parallel anyway.  If we're doing that it might make sense to try hooking
>>>> it in through the third-party infra instead of the way we do it today.
>>>> Hopefully that would allow us to work out the kinks before the old
>>>> environment goes away.
>>>>
>>>> Anyway, I'm sure we'll need a bunch more discussion about this, but I
>>>> wanted to chime in with my two cents.
>>>
>>> We need to answer this question soon; I'm currently working on the CI
>>> parts that we need in order to move to OVB [1] and was assuming we
>>> would be maintaining the status quo. What we end up doing would look
>>> very different if we move to 3rd party CI. If using 3rd party CI, we
>>> can simply start a vanilla CentOS instance and use it as an undercloud.
>>> It can then create its own baremetal testenv. If we remain inside the
>>> infra umbrella, we'll have a separate Jenkins slave for each job that
>>> has to talk to a central broker to get a test env (including a separate
>>> undercloud).
>>>
>>> Doing this in 3rd party CI, I think, simplifies things because we'll no
>>> longer need a public cloud, and as a result the security measures
>>> required to avoid putting cloud credentials on the Jenkins slaves
>>> won't be needed.
>>
>> This sounds like a win, although we still need something to avoid having
>> our cloud credentials on test-accessible machines.  We don't want
>> someone to be able to push up a malicious patch and start booting
>> instances on our cloud.
>>
>> One less vm per job still seems like a good thing though.  The
>> undercloud could talk to the environment broker directly instead of
>> going through a Jenkins slave.  Not needing to maintain a publicly
>> accessible cloud seems good too (although we still need a public
>> location to drop our CI logs and such).
>>
>>>
>>> The downside to this is that we would (as I see it) be taking a step
>>> further away from ever being in the gate. TripleO CI hasn't gotten any
>>> closer to getting into the gate for a long time. A large part of the
>>> reason, I believe, is that deploying trunk with TripleO is broken a lot
>>> of the time, and the vast majority of those breakages happen because
>>> we're not in the gate, so we have a chicken-and-egg problem. We also
>>> have other reasons we're not in the gate (capacity, for one), so maybe
>>> switching to 3rd party is OK, at least until we address the capacity
>>> and can reassess.
>>>
>>> [1] - https://review.openstack.org/#/c/295243/
>>
>> Note that we can be voting as third-party CI:
>> http://docs.openstack.org/infra/system-config/third_party.html#permissions-on-your-third-party-system
>>
>> So I'm not sure this is actually a step away from gating all the
>> projects.  In fact, since we can't vote today as part of the integrated
>> gate, and I believe that would continue to be the case until we could
>> run entirely in regular infra instead of as a separate thing, I feel
>> like this is probably a requirement to be voting on other projects
>> any time in the near future.
>>
>> The capacity issue is obviously a blocker, but even if we had the
>> hardware today, could we be gating on the other projects with our
>> current setup?  It's not clear to me that we could.  We can't even vote
>> on our own projects right now.
>>
> I would look at it this way: you'd be gaining more generic capacity upstream
> (e.g. testing overcloud puppet manifests) and could focus your undercloud
> testing in 3rd party CI.

We actually do that already.  Our check queue (where stuff like
puppet-lint and unit tests run) is already regular infra instances.
Only the check-tripleo queue runs on our specific hardware.

But, that does emphasize the point that we're already behaving as
third-party CI, just one that is tightly coupled to regular infra in a
way that doesn't seem to benefit either side.

> 
> Additionally, all the results would need to be public, so you'd still have
> people viewing them in the gate.
> 
> Right, you'd control which projects would be able to run on your 3rd party CI,
> and even which ones it votes on. I think you are correct: focus on the current
> hardware and add more down the line.