Open Stack

Tue Aug 22 15:22:38 UTC 2017

Hi folks,

I'm trying to come up with a good design for reporting
tripleo-quickstart results to DLRN API, and there are several possible
methods, each with pros and cons. I'd like to get your opinion about them.

For a bit of context, DLRN API[1] is a new extension of DLRN, our 
package and repo building solution for RDO. It's designed to be a 
central point of information about jobs that ran on certain hashes in 
various stages of testing, and to handle "promotions", which are really 
just symlinks to certain hashes.

We want to report back job results on multiple levels (upstream, RDO CI 
phase1 & phase2) and then use the information to promote new hashes at 
every stage.

If we were only interested in reporting successful runs, the 
solution would be fairly simple: add a reporting step at the end of the 
quickstart-extras.yml[2] playbook that runs if a "report" variable is set.

However, it would probably be useful in the long term to also report back 
failures (for statistics), and that's where things get complicated.

It would be great if we could report the failed status within the same 
quickstart.sh run instead of having a second run, because this way we 
don't have to touch the shell scripts in multiple places (upstream, 
phase1, phase2), just get the reporting done with config file changes.

This is not simple, because the Ansible play can exit at any failed 
task, in which case a reporting step at the end is never reached. We 
would need to wrap each task in rescue blocks[3] to avoid skipping the 
failure report.

Idea #1: Create a "run successful" marker file at the reporting step, 
and report failure if the file is not found (also making sure the 
file does not exist at the start of the run). This would still require 
multiple runs of ansible-playbook, but we could integrate the 
functionality into quickstart.sh by creating a --report option, making 
it available in every environment at the same time.
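
To make Idea #1 concrete, here is a minimal shell sketch of the marker
file logic. It is only an illustration under assumptions: MARKER and
run_with_report are hypothetical names, "$@" stands in for the first
ansible-playbook run (whose last step would create the marker), and the
echo lines stand in for the second run that would do the actual
reporting to DLRN API.

```shell
# Hypothetical marker path; not an existing quickstart.sh setting.
MARKER="${MARKER:-/tmp/quickstart-run-successful}"

# run_with_report CMD [ARGS...]
# Runs CMD, then decides what to report based on the marker file.
run_with_report() {
    # Remove any stale marker left over from an earlier run.
    rm -f "$MARKER"

    # First run. Do not abort on failure; remember the exit status.
    run_rc=0
    "$@" || run_rc=$?

    # Second step: the marker only exists if the first run got as far
    # as its final (reporting) step.
    if [ -f "$MARKER" ]; then
        echo "SUCCESS"
    else
        echo "FAILURE"
    fi

    # Hand the original exit status back to the caller.
    return "$run_rc"
}
```

The caller still sees the exit status of the first run, so the job
itself fails as before; only the reporting decision depends on the
marker file.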

Idea #2: Don't fail on *any* step, just register variables and check for 
success. An example where we already do this is the overcloud-deploy 
role. We don't fail on errors[4], but write out a file with the result 
and fail later[5]. We would need to do this at almost all shell parts to 
be reasonably certain we won't miss any failure. This requires a lot of 
alterations to playbooks and it seems a bit forced on Ansible without 
the usage of the rescue block, which we can't put in every single task.
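
The "record the result and fail later" pattern from Idea #2 can be
sketched as follows. Note that this is a plain shell analogue for
illustration, not the Ansible code the overcloud-deploy role uses;
RESULTS, run_step and all_steps_ok are hypothetical names.

```shell
# Hypothetical results file; one "step-name exit-code" line per step.
RESULTS="${RESULTS:-/tmp/quickstart-step-results}"

# run_step NAME CMD [ARGS...]
# Runs CMD without aborting on failure and records its exit status.
run_step() {
    step_name=$1
    shift
    step_rc=0
    "$@" || step_rc=$?
    echo "$step_name $step_rc" >> "$RESULTS"
}

# all_steps_ok
# Succeeds only if no recorded step has a nonzero exit status.
# Assumes the results file exists.
all_steps_ok() {
    ! grep -qv ' 0$' "$RESULTS"
}
```

Every step has to go through the wrapper for the final check to be 
trustworthy, which is the same cost as touching almost every task in 
the playbooks.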

Idea #3: Use "post-build scripts" in the promotion jobs. We can pattern 
match for failed/passed jobs and report the result accordingly. The 
problem with this is that it's environment dependent. While we can 
certainly do this with post-build scripts in Jenkins Job Builder on 
CentOS CI, it's not clear how to solve this in Zuul queues. Probably we 
just need to make the shell scripts of the jobs more involved (not fail 
on quickstart.sh's nonzero exit). Besides these complications, it also 
means that we have to keep the reporting method in sync across multiple 
environments.
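
For the "more involved shell scripts" variant of Idea #3, a rough sketch
of a job wrapper that does not fail immediately on a nonzero exit could
look like this. The names run_job and report_status are hypothetical,
and report_status only prints a line where the real DLRN API call would
go.

```shell
# Placeholder for the real reporting call (assumption: not shown here).
report_status() {
    echo "report: $1"
}

# run_job CMD [ARGS...]
# Runs the job command (e.g. quickstart.sh with its arguments), reports
# the outcome, and then returns the job's own exit status.
run_job() {
    job_rc=0
    # "|| job_rc=$?" keeps the script alive even under "set -e".
    "$@" || job_rc=$?

    if [ "$job_rc" -eq 0 ]; then
        report_status SUCCESS
    else
        report_status FAILURE
    fi

    return "$job_rc"
}
```

A wrapper like this would have to be copied into, or sourced by, the
job scripts of each environment, which is the syncing problem 
mentioned above.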

None of these solutions is ideal, so let me know if you have a better 
design idea. I personally think #1 might be the easiest and cleanest to 
implement, especially since I'm planning to introduce multiple 
ansible-playbook runs in quickstart.sh during the redesign of the devmode.

Best regards,
Attila

[1] https://github.com/javierpena/dlrnapi_client
[2] https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras.yml
[3] http://docs.ansible.com/ansible/latest/playbooks_blocks.html#error-handling
[4] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-deploy/tasks/deploy-overcloud.yml#L6
[5] https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras-overcloud.yml#L32-L44

[openstack-dev] [tripleo] How to report tripleo-quickstart results to DLRN API