[openstack-dev] [tripleo] How to report tripleo-quickstart results to DLRN API
Attila Darazs
adarazs at redhat.com
Tue Aug 22 15:22:38 UTC 2017
Hi folks,
I'm trying to come up with a good design for $subject, and there are
several different methods with pros and cons. I'd like to get your
opinion about them.
For a bit of context, DLRN API[1] is a new extension of DLRN, our
package and repo building solution for RDO. It's designed to be a
central point of information about jobs that ran on certain hashes in
various stages of testing and handle "promotions", which are really just
symlinks to certain hashes.
We want to report back job results on multiple levels (upstream, RDO CI
phase1 & phase2) and then use the information to promote new hashes at
every stage.
If we were only interested in reporting successful runs, the
solution is fairly simple: add a reporting step to the
quickstart-extras.yml[2] playbook at the end if a "report" variable is set.
However, it would probably be useful in the long term to also report back
failures (for statistics), and that's where things get complicated.
It would be great if we could report the failed status within the same
quickstart.sh run instead of having a second run, because this way we
don't have to touch the shell scripts in multiple places (upstream,
phase1, phase2), just get the reporting done with config file changes.
This is not simple, because the Ansible play can exit at any failed
task. We would need to wrap each task in rescue blocks[3] to make sure
the failure still gets reported instead of being skipped.
Idea #1: Create a "run successful" marker file at the reporting step,
and report failure in case the file is not found (also making sure the
file does not exist at the start of the run). This would still require
multiple runs of ansible-playbook, but we could integrate the
functionality into quickstart.sh by creating a --report option, making
it available in every environment at the same time.
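A rough sketch of the marker-file logic from Idea #1, written in Python only for brevity (the real thing would live in quickstart.sh and the playbooks). The marker file name and all function names here are made up for illustration:

```python
import os
import tempfile


def clear_marker(path):
    """Remove a stale marker so an old success cannot leak into this run."""
    if os.path.exists(path):
        os.remove(path)


def write_marker(path):
    """Called by the final (reporting) step, reached only if nothing failed."""
    with open(path, "w") as marker:
        marker.write("run successful\n")


def run_status(path):
    """A missing marker means the play exited before the reporting step."""
    return "SUCCESS" if os.path.exists(path) else "FAILURE"


if __name__ == "__main__":
    # Hypothetical marker location, used only for this demo.
    marker = os.path.join(tempfile.mkdtemp(), "run_successful")
    clear_marker(marker)
    print(run_status(marker))  # FAILURE: no run has completed yet
    write_marker(marker)
    print(run_status(marker))  # SUCCESS: the reporting step was reached
```

The second ansible-playbook run would only need the equivalent of run_status() to decide what to send to the DLRN API.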
Idea #2: Don't fail on *any* step, just register variables and check for
success. An example where we already do this is the overcloud-deploy
role. We don't fail on errors[4], but write out a file with the result
and fail later[5]. We would need to do this for almost every shell task
to be reasonably certain we won't miss any failure. This requires a lot
of alterations to the playbooks, and it feels a bit forced on Ansible
without the use of rescue blocks, which we can't put around every single
task.
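To illustrate the "register now, fail later" pattern of Idea #2 outside of Ansible, here is a minimal Python analogy. It is not how the overcloud-deploy role is implemented; the step names and helper functions are invented for the example:

```python
import subprocess
import sys


def run_all(steps):
    """Run every (name, argv) step even if an earlier one failed.

    Returns a dict mapping each step name to its exit code, which plays
    the role of the registered variables in the playbook.
    """
    results = {}
    for name, argv in steps:
        results[name] = subprocess.run(argv).returncode
    return results


def failed_steps(results):
    """List the steps that exited nonzero, in the order they ran."""
    return [name for name, returncode in results.items() if returncode != 0]


if __name__ == "__main__":
    # Stand-in commands; real steps would be the shell parts of the roles.
    steps = [
        ("prepare", [sys.executable, "-c", "pass"]),
        ("deploy", [sys.executable, "-c", "import sys; sys.exit(1)"]),
        ("validate", [sys.executable, "-c", "pass"]),
    ]
    print(failed_steps(run_all(steps)))  # ['deploy']
```

The cost shows up in the loop: every step has to be routed through the wrapper, which is exactly the "alter almost every task" problem described above.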
Idea #3: Use "post-build scripts" in the promotion jobs. We can pattern
match for failed/passed jobs and report the result accordingly. The
problem with this is that it's environment dependent. While we can
certainly do this with post-build scripts in Jenkins Job Builder on
CentOS CI, it's not clear how to solve this in Zuul queues. Probably we
just need to make the shell scripts of the jobs more involved (not fail
on quickstart.sh's nonzero exit). Besides these complications, it also
means that we have to keep the reporting method in sync across multiple
environments.
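For Idea #3, the "more involved" job script could look roughly like the wrapper below: it does not abort on a nonzero exit, reports the outcome, and only then hands the original exit code back to the CI system. This is a sketch in Python rather than shell; run_and_report and the report callback are hypothetical, and the callback stands in for whatever call ends up talking to the DLRN API:

```python
import subprocess
import sys


def run_and_report(argv, report):
    """Run the job command, report its outcome, then return its exit code.

    `report` is a hypothetical callback taking "SUCCESS" or "FAILURE".
    """
    returncode = subprocess.run(argv).returncode
    report("SUCCESS" if returncode == 0 else "FAILURE")
    return returncode


if __name__ == "__main__":
    # Stand-in for a failing quickstart.sh invocation.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_and_report(failing, print))  # prints FAILURE, then 1
```

Each environment (Jenkins Job Builder jobs, Zuul jobs) would need its own copy of this logic, which is the sync problem mentioned above.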
None of these solutions is ideal; let me know if you have a better
design idea. I personally think #1 might be the easiest and cleanest to
implement, especially since I'm planning to introduce multiple
ansible-playbook runs in quickstart.sh during the redesign of the devmode.
Best regards,
Attila
[1] https://github.com/javierpena/dlrnapi_client
[2] https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras.yml
[3] http://docs.ansible.com/ansible/latest/playbooks_blocks.html#error-handling
[4] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-deploy/tasks/deploy-overcloud.yml#L6
[5] https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/quickstart-extras-overcloud.yml#L32-L44