[openstack-dev] [kolla] Stability and reliability of gate jobs

David Moreau Simard dms at redhat.com
Wed Jun 15 03:26:59 UTC 2016


Hi Kolla o/

I'm writing to you because I'm concerned.

In case you didn't already know, the RDO community collaborates with
upstream deployment and installation projects to test its packaging.

This relationship is beneficial in a lot of ways for both parties, in summary:
- RDO has improved test coverage (because it's otherwise hard to test
different ways of installing, configuring and deploying OpenStack by
ourselves)
- The RDO community works with upstream projects (deployment or core
projects) to fix issues that we find
- In return, the collaborating deployment project can feel more
confident that the RDO packages it consumes have already been tested
using its platform and should work

To make a long story short, we do this with a project called WeIRDO
[1] which essentially runs gate jobs outside of the gate.

I tried to get Kolla in our testing pipeline during the Mitaka cycle.
I really did.
I contributed the features I needed to Kolla in order to make this
work, such as configurable Yum repositories.

However, in the end, I had to put off the initiative because the gate
jobs were very flappy and unreliable.
We cannot afford to have a job that is *expected* to flap in our
testing pipeline: it leads to a lot of wasted time, effort and
resources.

I think there have been a lot of improvements since my last attempt,
but to get a sample of data, I looked at ~30 recently merged reviews.
Of 260 total build/deploy jobs, 55 (or over 20%) failed -- and I
didn't account for rechecks, just the last known status of the check
jobs.
I put up the results of those jobs here [2].

In the case that interests me most, CentOS binary jobs, it's 5
failures out of 50 jobs, so 10%. Not as bad but still a concern for
me.
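
For anyone who wants to redo the arithmetic, here is a minimal sketch
in Python. The helper name failure_rate is purely illustrative; the
counts are the ones quoted above, and rechecks are not accounted for.

```python
def failure_rate(failed, total):
    """Return the share of failed jobs as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


# Counts quoted in this message (last known status only, no rechecks).
all_jobs = failure_rate(55, 260)
centos_binary = failure_rate(5, 50)

print("all build/deploy jobs: %.1f%%" % all_jobs)
print("CentOS binary jobs:    %.1f%%" % centos_binary)
```

This gives roughly 21% for all build/deploy jobs and 10% for the
CentOS binary jobs, in line with the figures above.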

Other deployment projects like Puppet-OpenStack, OpenStack Ansible,
Packstack and TripleO have quite a few *voting* integration testing
jobs.
Why are Kolla's jobs non-voting and so unreliable?

Thanks,

[1]: https://github.com/rdo-infra/weirdo
[2]: https://docs.google.com/spreadsheets/d/1NYyMIDaUnlOD2wWuioAEOhjeVmZe7Q8_zdFfuLjquG4/edit#gid=0

David Moreau Simard
Senior Software Engineer | Openstack RDO

dmsimard = [irc, github, twitter]


