[openstack-dev] [TripleO][CI][QA][HA] Validating HA on upstream

Adam Spiers aspiers at suse.com
Tue Mar 6 12:27:00 UTC 2018

Hi Raoul and all,

Sorry for joining this discussion late!

Raoul Scarazzini <rasca at redhat.com> wrote:
>TL;DR: we would like to change the way HA is tested upstream to avoid
>being hit by avoidable bugs that the CI process should discover.
>Long version:
>Today HA testing upstream consists only of verifying that a
>three-controller setup comes up correctly and can spawn an instance.
>That's something, but it's far from enough, since we continuously see
>"day two" bugs.
>We started covering this more than a year ago in internal CI and today
>also on rdocloud using a project named tripleo-quickstart-utils [1].
>Despite its name, the project is not limited to tripleo-quickstart;
>it covers three principal roles:
>1 - stonith-config: a playbook that can be used to automate the creation
>of fencing devices in the overcloud;
>2 - instance-ha: a playbook that automates the seventeen manual steps
>needed to configure instance HA in the overcloud, test them via rally
>and verify that instance HA works;
>3 - validate-ha: a playbook that runs a series of disruptive actions in
>the overcloud and verifies it always behaves correctly by deploying a
>heat-template that involves all the overcloud components;

Yes, a more rigorous approach to HA testing obviously has huge value,
not just for TripleO deployments, but also for any type of OpenStack
deployment.

>To make this usable upstream, we need to understand where to put this
>code. Here are some choices:


I do not work on TripleO, but I'm part of the wider OpenStack
sub-communities which focus on HA[0] and more recently,
self-healing[1].  With that hat on, I'd like to suggest that maybe
it's possible to collaborate on this in a manner which is agnostic to
the deployment mechanism.  There is an open spec on this:


which was mentioned in the Denver PTG session on destructive testing
which you referenced[2].

As mentioned in the self-healing SIG's session in Dublin[3], the OPNFV
community has already put a lot of effort into testing HA scenarios,
and it would be great if this work was shared across the whole
OpenStack community.  In particular they have a project called


which contains a bunch of HA test cases:


Currently each sub-community and vendor seems to be reinventing HA
testing by itself to some extent, which is easier to accomplish in the
short-term, but obviously less efficient in the long-term.  It would
be awesome if we could break these silos down and join efforts! :-)


[0] #openstack-ha on Freenode IRC
[1] https://wiki.openstack.org/wiki/Self-healing_SIG
[2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing
[3] https://etherpad.openstack.org/p/self-healing-ptg-rocky
