[openstack-dev] [TripleO][CI] Bridging the production/CI workflow gap with large periodic CI jobs

Ben Nemec openstack at nemebean.com
Tue Apr 18 15:17:54 UTC 2017



On 04/17/2017 02:52 PM, Justin Kilpatrick wrote:
> Because CI jobs tend to max out at about 5 nodes, there's a whole class
> of minor bugs that make it into releases.
>
> What happens is that they never show up in small clouds, then when
> they do show up in larger testing clouds the people deploying those
> simply work around the issue and get onto what they were supposed to
> be testing. These workarounds do get documented/BZ'd but since they
> don't block anyone and only show up in large environments they become
> hard for developers to fix.
>
> So the issue gets stuck in limbo, with nowhere to test a patchset and
> no one owning the issue.
>
> These issues pile up and pretty soon there is a significant difference
> between the default documented workflow and the 'scale' workflow which
> is filled with workarounds which may or may not be documented
> upstream.
>
> I'd like to propose getting these issues more visibility by having a
> periodic upstream job that uses 20-30 OVB instances to do a larger
> deployment. Maybe at 3am on a Sunday or some other time when there's
> idle execution capability to exploit. The goal being to make these
> sorts of issues more visible and hopefully get better at fixing them.
>
> To be honest I'm not sure this is the best solution, but I'm seeing
> this anti-pattern across several issues and I think we should try and
> come up with a solution.

I like this idea a lot, and I think we discussed it previously on IRC 
and worked through some potential issues with setting up such a job. 
One other thing that occurred to me since then is that deployments at 
scale generally require a larger undercloud than we have in CI. 
Unfortunately I'm not sure whether we can change that just for a 
periodic job.  There are a couple of potential workarounds for that, but 
they would add some complication so we'll need to keep that in mind.

Overall +1 to the idea though.  Larger scale deployments are clearly 
something we won't be able to run on every patch set so a periodic job 
seems like the right fit here.


