Open Stack

Fri Oct 12 11:09:57 UTC 2018

Hi,

Testing and maintaining a green status for upgrade jobs within the 3h
time limit has proven to be a very difficult job to say the least.

The net result has been: we don't have anything even touching the
upgrade code in the CI.

So during the Denver PTG it was decided to give up on running a full
upgrade job within the 3h time limit and instead to focus on two
complementary approaches to at least touch the upgrade code:
 1. run a standalone upgrade: this tests the ansible upgrade playbook;
 2. run a N->N upgrade: this tests the upgrade python code.

And here they are, not merged yet but seen working:
 - tripleo-ci-centos-7-standalone-upgrade:
   https://review.openstack.org/#/c/604706/
 - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:
   https://review.openstack.org/#/c/607848/9

The first is good to merge (but others could disagree), the second could
be as well (but I tend to disagree :))

The first leverages the standalone deployment and executes a standalone
upgrade just after it.

The limitation is that it only tests non-HA services (sorry pidone,
cannot test HA in standalone) and only the upgrade_tasks (i.e. not any
workflow related to the upgrade cli).

The main benefits here are:
 - ~2h to run the upgrade, still a bit long but well within the 3h
   time limit;
 - we trigger a yum upgrade so that we can catch problems there as well;
 - we test the standalone upgrade, which is good in itself;
 - composable roles are available (as in a standalone/all-in-one
   deployment) so you can make a specific upgrade test for your project
   if it fits into the standalone constraint.

For this last point, if standalone-specific roles eventually go into
project testing (nova, neutron ...), those projects would also get a way
to test their upgrade tasks.  This would be a best case scenario.

Now, for the second point, the N->N upgrade.  Its "limitation" is that
... well it doesn't run a yum upgrade at all.  We start from master and
run the upgrade to master.

Its main benefits are:
 - it takes ~2h20 to run, so well under the 3h time limit;
 - tripleoclient upgrade code is run, which is one thing that the
   standalone upgrade cannot do;
 - it also tends to exercise the idempotency of all the tasks, as it
   runs them on an already "upgraded" node;
 - as an added bonus, it could gate the tripleo-upgrade role as well, as
   it definitely loads all of the role's tasks[1].

For those who stayed with me to this point, I'm throwing in another CI
test that has already proved useful (it caught errors): the
ansible-lint test.  After a standalone deployment we just run
ansible-lint on all the generated playbooks[2].

It produces standalone_ansible_lint.log[3] in the working directory. It
only takes a couple of minutes to install ansible-lint and run it. It
definitely gates against typos and the like. It touches hard-to-reach
code as well, for instance the fast_forward tasks are linted.
Still no pidone tasks in there, but they could easily be added with a
job that has HA tasks generated.
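
To make the lint step above concrete, here is a minimal Python sketch
of "run ansible-lint on every generated playbook and log the result".
It is not the code from the reviews in [2]. The directory layout, the
"*playbook*.yaml" file pattern and the function names are assumptions
made for illustration; only the log file name comes from this mail.

```python
import subprocess
from pathlib import Path

LOG_NAME = "standalone_ansible_lint.log"  # file name taken from the mail


def find_playbooks(config_dir):
    """Return the generated playbooks, sorted.

    The "*playbook*.yaml" pattern is an assumption, not the job's rule.
    """
    return sorted(Path(config_dir).rglob("*playbook*.yaml"))


def lint_command(playbook):
    """Build the ansible-lint call; -p asks for parseable output."""
    return ["ansible-lint", "-p", str(playbook)]


def lint_all(config_dir, work_dir, runner=subprocess.run):
    """Lint each playbook, write all findings to the log file and
    return the list of playbooks for which ansible-lint exited non-zero.
    """
    failed = []
    log_path = Path(work_dir) / LOG_NAME
    with log_path.open("w") as log:
        for playbook in find_playbooks(config_dir):
            result = runner(
                lint_command(playbook),
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            log.write(result.stdout)
            if result.returncode != 0:
                failed.append(playbook)
    return failed
```

A job would call lint_all() with the directory holding the generated
playbooks and fail when the returned list is not empty. The runner
argument only exists so the sketch can be exercised without
ansible-lint installed.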

Note that by default ansible-lint barks, as the generated playbooks hit
several lint problems, so only the checks for syntax errors and misnamed
tasks or parameters are currently activated.  All the lint problems are
still logged in the above file and can be fixed later on, at which
point we could activate full lint gating.
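
One way to picture "gate on a few checks, log everything" is a filter
applied to the lint output after the run. This is a sketch of the idea
only; the mail does not say how the job selects its checks, and the
job may well do it through ansible-lint options instead. The
"path:line: [RULE] message" line format and the rule names used here
are assumptions.

```python
import re

# Assumed shape of one parseable finding: "path:line: [RULE] message".
FINDING = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): \[(?P<rule>[^\]]+)\] (?P<msg>.*)$"
)


def split_findings(lines, gating_rules):
    """Split lint findings into (gating, logged_only).

    gating_rules is a set of rule identifiers that should fail the job;
    every other finding is kept for the log but does not gate.
    Lines that do not match the assumed format are ignored.
    """
    gating, logged_only = [], []
    for line in lines:
        match = FINDING.match(line.strip())
        if match is None:
            continue
        finding = match.groupdict()
        if finding["rule"] in gating_rules:
            gating.append(finding)
        else:
            logged_only.append(finding)
    return gating, logged_only
```

Moving to full lint gating would then amount to growing gating_rules
until it covers every rule, once the logged problems have been fixed.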

Thanks for reading this long mail.  Any comments, shouts of victory,
cries of despair and reviews are welcome.

[1] but this still has to be investigated.
[2] testing review https://review.openstack.org/#/c/604756/ and main code https://review.openstack.org/#/c/604757/
[3] sample output http://paste.openstack.org/show/731960/
--
Sofer Athlan-Guyot
chem on #freenode
Upgrade DFG.

[openstack-dev] [tripleo][ci][upgrade] New jobs for tripleo Upgrade in the CI.