[openstack-dev] [tripleo] CI jobs failures

Dan Prince dprince at redhat.com
Mon Mar 7 13:03:55 UTC 2016


On Sat, 2016-03-05 at 11:15 -0500, Emilien Macchi wrote:
> I'm kind of hijacking Dan's e-mail but I would like to propose some
> technical improvements to stop having so many CI failures.
> 
> 
> 1/ Stop creating swap files. We don't have SSDs, and IMHO it is a
> terrible mistake to swap on files because we don't have enough RAM.
> In my experience, swapping on non-SSD disks is even worse than not
> having enough RAM. We should stop doing that, I think.
> 
> 
> 2/ Split CI jobs into scenarios.
> 
> Currently we have CI jobs for ceph, HA, non-ha, containers and the
> current situation is that jobs fail randomly, due to performance
> issues.
> 
> Puppet OpenStack CI had the same issue where we had one integration
> job and we never stopped adding more services until everything became
> *very* unstable. We solved that issue by splitting the jobs and
> creating scenarios:
> 
> https://github.com/openstack/puppet-openstack-integration#description
> 
> What I propose is to split TripleO jobs into more jobs, each with
> fewer services.
> 
> The benefits of that:
> 
> * more service coverage
> * jobs will run faster
> * fewer random issues due to poor performance
> 
> The cost, of course, is that it will consume more resources.
> That's why I suggest 3/.
> 
> We could have:
> 
> * HA job with ceph and a full compute scenario (glance, nova, cinder,
> ceilometer, aodh & gnocchi).
> * Same with IPv6 & SSL.
> * HA job without ceph and full compute scenario too
> * HA job without ceph and basic compute (glance and nova), with extra
> services like Trove, Sahara, etc.
> * ...
> (note: all jobs would have network isolation, which is to me a
> requirement when testing an installer like TripleO).

I'm not sure we have enough resources to entertain this option. I would
like to see us split the jobs up, but not in exactly the way you
describe above. I would rather see us put the effort into architecture
changes like "split stack", which could allow us to test the
configuration side of our Heat stack on normal cloud instances. Once we
have this in place, I think we would have more potential resources and
could entertain running more jobs, and thus could split things out to
run in parallel if we choose to do so.

> 
> 3/ Drop non-ha job.
> I'm not sure why we have it, or what the benefit of testing it is
> compared to HA.

There are a couple of reasons we have the nonha job, I think. The first
is that not everyone wants to use HA. We run our own TripleO CI cloud
without HA at this point, and I think there is interest in maintaining
this as a less complex installation alternative where HA isn't needed.

The second is the need to support functional testing of TripleO where
developers don't have enough resources for 3 controller nodes. At the
very least we'd need a second, single-node HA job (which wouldn't really
be doing HA) that would allow us to continue supporting the compressed
installation for developer testing, etc.

Dan

> 
> 
> Any comment / feedback is welcome,