<div dir="ltr"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Fri, Oct 12, 2018 at 2:10 PM Sofer Athlan-Guyot <<a href="mailto:sathlang@redhat.com">sathlang@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

Testing and maintaining a green status for upgrade jobs within the 3h<br>

time limit has proven to be a very difficult job to say the least.<br>

<br>

The net result has been: we don't have anything even touching the<br>

upgrade code in the CI.<br>

<br>

So during the Denver PTG it was decided to give up on running a full<br>

upgrade job within the 3h time limit and to focus instead on two<br>

complementary approaches that at least touch the upgrade code:<br>

 1. run a standalone upgrade: this tests the ansible upgrade playbook;<br>

 2. run a N->N upgrade: this tests the upgrade python code.<br>

<br>

And here they are, not yet merged but seen working:<br>

 - tripleo-ci-centos-7-standalone-upgrade:<br>

   <a href="https://review.openstack.org/#/c/604706/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/604706/</a><br>

 - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:<br>

   <a href="https://review.openstack.org/#/c/607848/9" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/607848/9</a><br>

<br>

The first is good to merge (but others could disagree), the second could<br>

be as well (but I tend to disagree :))<br>

<br>

The first leverages the standalone deployment and executes a standalone<br>

upgrade just after it.<br>

<br>

The limitation is that it only tests non-HA services (sorry pidone,<br>

we cannot test HA in standalone) and only the upgrade_tasks (i.e. not any<br>

workflow related to the upgrade CLI).<br>

<br>

The main benefits here are:<br>

 - ~2h to run the upgrade, still a bit long but far away from the 3h<br>

   time limit;<br>

 - we trigger a yum upgrade so that we can catch problems there as well;<br>

 - we test the standalone upgrade which is good in itself;<br>

 - composable roles are available (as in a standalone/all-in-one deployment) so<br>

   you can make a specific upgrade test for your project if it fits into<br>

   the standalone constraint;<br>

<br>

For this last point, if standalone-specific roles eventually go into<br>

project testing (nova, neutron ...), those projects would also have a way to<br>

test upgrade tasks.  This would be the best-case scenario.<br>

<br>

Now, for the second point, the N->N upgrade.  Its "limitation" is that<br>

... well it doesn't run a yum upgrade at all.  We start from master and<br>

run the upgrade to master.<br>

<br>

Its main benefits are:<br>

 - it takes ~2h20 to run, so well under the 3h time limit;<br>

 - tripleoclient upgrade code is run, which is one thing that the<br>

   standalone upgrade cannot do;<br>

 - it also tends to exercise the idempotency of all the tasks, as it runs them<br>

   on an already "upgraded" node;<br>

 - as an added bonus, it could gate the tripleo-upgrade role as well, since it<br>

   definitely loads all of the role's tasks[1].<br>

<br>

For those who stayed with me to this point, I'm throwing in another CI<br>

test that has already proved useful (it caught errors): the<br>

ansible-lint test.  After a standalone deployment we just run<br>

ansible-lint on all the generated playbooks[2].<br>

<br>

It produces standalone_ansible_lint.log[3] in the working directory. It<br>

only takes a couple of minutes to install ansible-lint and run it. It<br>

definitely gates against typos and the like. It touches hard-to-reach<br>

code as well; for instance, the fast_forward tasks are linted.<br>

There are still no pidone tasks in there, but they could easily be added to a job<br>

that has HA tasks generated.<br>

<br>

Note that by default ansible-lint barks, as the generated playbooks hit<br>

several lint problems, so only the checks for syntax errors and misnamed tasks or<br>

parameters are currently activated.  But all the lint problems are<br>

logged in the above file and can be fixed later on, at which point we<br>

could activate full lint gating.<br>

<br>

Thanks for reading this long message; any comments, shouts of victory, cries of<br>

despair and reviews are welcome.<br></blockquote><div><br></div><div>That's awesome. It's perfect for a project we are working on (Tobiko), where we want to run tests before the upgrade (setting up resources) and after it (verifying those resources are still available).</div><div><br></div><div>I want to add such a job (standalone upgrade) and I need help:</div><div><br></div><div><a href="https://review.openstack.org/#/c/610397/">https://review.openstack.org/#/c/610397/</a><br></div><div><br></div><div>How do I set one tempest regex for pre-upgrade and another one for post-upgrade?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

[1] but this still has to be investigated.<br>

[2] testing review <a href="https://review.openstack.org/#/c/604756/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/604756/</a> and main code <a href="https://review.openstack.org/#/c/604757/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/604757/</a><br>

[3] sample output <a href="http://paste.openstack.org/show/731960/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/731960/</a><br>

--<br>

Sofer Athlan-Guyot<br>

chem on #freenode<br>

Upgrade DFG.<br>

<br>


</blockquote></div></div></div>