[openstack-dev] [TripleO][CI] Validating HA on upstream

Bogdan Dobrelya bdobreli at redhat.com
Fri Feb 16 09:24:01 UTC 2018


On 2/15/18 8:22 PM, Raoul Scarazzini wrote:
> TL;DR: we would like to change the way HA is tested upstream to avoid
> being hit by avoidable bugs that the CI process should discover.
> 
> Long version:
> 
> Today, HA testing upstream consists only of verifying that a
> three-controller setup comes up correctly and can spawn an instance. That's
> something, but it's far from enough, since we continuously see "day
> two" bugs.
> We started covering this more than a year ago in internal CI, and today
> also on rdocloud, using a project named tripleo-quickstart-utils [1].
> Despite its name, the project is not limited to tripleo-quickstart;
> it covers three principal roles:
> 
> 1 - stonith-config: a playbook that can be used to automate the creation
> of fencing devices in the overcloud;
> 2 - instance-ha: a playbook that automates the seventeen manual steps
> needed to configure instance HA in the overcloud, tests the result via
> rally and verifies that instance HA works;
> 3 - validate-ha: a playbook that runs a series of disruptive actions in
> the overcloud and verifies that it always behaves correctly by deploying
> a Heat template that involves all the overcloud components.
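The validate-ha pattern described above (disrupt, verify, restore, repeat) can be sketched as a small harness. This is a hypothetical illustration, not code from tripleo-quickstart-utils (which, as noted further on, uses a bash framework): `Scenario`, `run_scenarios` and the `verify` callable are invented names, and `verify` stands in for "deploy a Heat template touching all overcloud components and check the result".

```python
"""Hypothetical disrupt-then-verify loop in the spirit of validate-ha.

None of these names come from tripleo-quickstart-utils; the callables are
placeholders for real actions (stopping a service, fencing a node, ...).
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    name: str
    disrupt: Callable[[], None]  # e.g. stop a service on one controller
    restore: Callable[[], None]  # undo the disruption


def run_scenarios(scenarios: List[Scenario], verify: Callable[[], bool]) -> List[str]:
    """Run each disruption, verify the cloud, always restore; return failed names."""
    failures = []
    for scenario in scenarios:
        scenario.disrupt()
        try:
            # verify() is the hypothetical "deploy the test Heat stack" check.
            if not verify():
                failures.append(scenario.name)
        finally:
            # Restore even if verify() raised, so later scenarios start clean.
            scenario.restore()
    return failures
```

Keeping restore in a `finally` block matters for destructive testing: one failing check should not leave the cluster degraded for every scenario that follows.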
> 
> To make this usable upstream, we need to understand where to put this
> code. Here are some choices:
> 
> 1 - tripleo-validations: the most logical place to put this, at least
> judging by the name, would be tripleo-validations. I've talked with some
> of the folks working on it, and it turned out that the purpose of the
> tripleo-validations project is not to run disruptive tests. Integrating
> this stuff would be out of scope.
> 
> 2 - tripleo-quickstart-extras: apart from the fact that this is not
> something meant just for quickstart (the project supports infrared and
> "plain" environments as well), we did initially start there, but in the
> end nobody was looking at the patches, since nobody was able to verify
> them. The result was a series of reviews stuck forever.
> So moving back to extras would be a step backward.
> 
> 3 - Dedicated project (tripleo-ha-utils or just tripleo-utils): as with
> tripleo-upgrades or tripleo-validations, it would be ideal to have all
> this grouped and usable as a standalone thing. Any integration is
> possible inside the playbook for whatever kind of test. Today we're

+1, this looks like a perfect fit. Would it be possible to install that
tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside
the quickstart, then apply the destructive-testing playbooks with either
the quickstart's static inventory [0] (from your admin/control node) or
maybe via a dynamic inventory [1] (from the undercloud managing the
overcloud under test via config-download and/or external ansible
deployment mechanisms)?

[0] https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory
[1] https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory
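To make the dynamic-inventory option concrete, here is a minimal sketch assuming Ansible's script-inventory convention: an executable that prints JSON groups for `--list` (with per-host variables under `_meta.hostvars`) and a single host's variables for `--host <name>`. The controller names, IPs and the `heat-admin` remote user are placeholder assumptions, and this is not the tripleo-ansible-inventory script from [1]; a real script would query the undercloud instead of a hard-coded dict.

```python
#!/usr/bin/env python3
"""Minimal Ansible dynamic inventory sketch for an overcloud under test.

Hypothetical: host names, IPs and the remote user are placeholders; a real
script would query the undercloud for this data instead.
"""
import argparse
import json

# Hypothetical overcloud nodes; replace with data fetched from the undercloud.
CONTROLLERS = {
    "overcloud-controller-0": "192.168.24.10",
    "overcloud-controller-1": "192.168.24.11",
    "overcloud-controller-2": "192.168.24.12",
}
REMOTE_USER = "heat-admin"  # assumption: the usual TripleO overcloud SSH user


def build_inventory():
    """Return the JSON structure Ansible expects from `--list`."""
    hostvars = {
        name: {"ansible_host": ip, "ansible_user": REMOTE_USER}
        for name, ip in CONTROLLERS.items()
    }
    return {
        # Group the destructive-testing plays can target with `hosts: controller`.
        "controller": {"hosts": sorted(CONTROLLERS)},
        # Supplying _meta lets Ansible skip one `--host` call per node.
        "_meta": {"hostvars": hostvars},
    }


def host_vars(host):
    """Return the variables for a single host (the `--host <name>` call)."""
    return build_inventory()["_meta"]["hostvars"].get(host, {})


def main(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--list", action="store_true", help="list all groups")
    parser.add_argument("--host", help="show variables for one host")
    args = parser.parse_args(argv)
    # Ansible always passes --list or --host; default to --list for manual runs.
    data = host_vars(args.host) if args.host else build_inventory()
    print(json.dumps(data, indent=2))


if __name__ == "__main__":
    main()
```

Made executable, such a script could be passed as `ansible-playbook -i inventory.py <playbook>.yml`; with the static quickstart inventory [0], a generated inventory file would take its place and the playbooks would stay the same.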

> using the bash framework to interact with the cluster, rally to test
> instance-ha and Ansible itself to simulate full power outage scenarios.
> 
> There's been a lot of talk about this during the last PTG [2], and
> unfortunately, I won't be attending the next one, but I would like to see
> things progress on this front.
> Everything I wrote is of course open for discussion; that's precisely the
> purpose of this mail.
> 
> Thanks to all who'll give advice, suggestions, and thoughts about all
> this stuff.
> 
> [1] https://github.com/redhat-openstack/tripleo-quickstart-utils
> [2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


