[openstack-dev] [TripleO][CI] Bridging the production/CI workflow gap with large periodic CI jobs

Justin Kilpatrick jkilpatr at redhat.com
Mon Apr 17 19:52:51 UTC 2017


Because CI jobs tend to max out about 5 nodes there's a whole class of
minor bugs that make it into releases.

What happens is that they never show up in small clouds, then when
they do show up in larger testing clouds the people deploying those
simply work around the issue and get onto what they where supposed to
be testing. These workarounds do get documented/BZ'd but since they
don't block anyone and only show up in large environments they become
hard for developers to fix.

So the issue gets stuck in limbo, with nowhere to test a patchset and
no one owning the issue.

These issues pile up and pretty soon there is a significant difference
between the default documented workflow and the 'scale' workflow which
is filled with workarounds which may or may not be documented
upstream.

I'd like to propose getting these issues more visibility to having a
periodic upstream job that uses 20-30 ovb instances to do a larger
deployment. Maybe at 3am on a Sunday or some other time where there's
idle execution capability to exploit. The goal being to make these
sorts of issues more visible and hopefully get better at fixing them.

To be honest I'm not sure this is the best solution, but I'm seeing
this anti pattern across several issues and I think we should try and
come up with a solution.



More information about the OpenStack-dev mailing list