[openstack-dev] [tripleo] service validation during deployment steps
Emilien Macchi
emilien at redhat.com
Fri Jul 15 15:31:35 UTC 2016
Hi,
Some people in the field brought interesting feedback:
"As a TripleO User, I would like the deployment to stop immediately
after a resource creation failure during a step of the deployment and
be able to easily understand what service or resource failed to be
installed".
Example:
If, during step 4, Puppet tries to deploy Neutron and OVS but OVS
fails to start for some reason, the deployment should stop at the end
of the step.
So there are 2 things in this user story:
1) Be able to run some service validation within a deployment step.
Note about the implementation: make the validation composable per
service (OVS, nova, etc) and not per role (compute, controller, etc).
2) Make this information readable and easy to access and understand
for our users.
I have a proof-of-concept for 1) and partially 2), with the example of
OVS: https://review.openstack.org/#/c/342202/
This patch will make sure OVS is actually usable at step 4 by running
'ovs-vsctl show' during the Puppet catalog run. If the command works,
it will create a Puppet anchor. This anchor is currently not useful
but could be in the future if we want to rely on it for orchestration.
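
To illustrate the idea outside of Puppet, here is a minimal Python
sketch of such a validation: run a command, retry a few times, and
report success only if it exits 0. This is not the code from the
patch. The function name validate_service and the default values for
tries, try_sleep and timeout are assumptions made for this example.

```python
import subprocess
import time


def validate_service(command, tries=10, try_sleep=2.0, timeout=10.0):
    """Return True as soon as `command` exits 0, False after `tries` failed attempts.

    The defaults are illustrative assumptions, not values taken from the patch.
    """
    for attempt in range(1, tries + 1):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            if result.returncode == 0:
                return True
        except (OSError, subprocess.TimeoutExpired):
            # Command missing, not executable, or hung: count it as a failed attempt.
            pass
        if attempt < tries:
            time.sleep(try_sleep)
    return False
```

For the OVS case, the call would be validate_service(["ovs-vsctl",
"show"]), and a False result is the point where the step should fail.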
I wrote the service validation in Puppet 2 years ago when doing Spinal
Stack with eNovance:
https://github.com/openstack/puppet-openstacklib/blob/master/manifests/service_validation.pp
I think we could re-use it very easily; it has been proven to work.
Also, the code is within our Puppet profiles, so it's composable by
design and we don't need any magic to connect it to our current
services. Validation will reside within Puppet manifests.
If you look at my PoC, this code could even live in puppet-vswitch itself
(we already have this code for puppet-nova, and some others).
Ok now, what if validation fails?
I'm testing it here: https://review.openstack.org/#/c/342205/
If you look at /var/log/messages, you'll see:
Error: /Stage[main]/Tripleo::Profile::Base::Neutron::Ovs/Openstacklib::Service_validation[openvswitch]/Exec[execute openvswitch validation]/returns: change from notrun to 0 failed
So it's pretty clear from the logs that the openvswitch service
validation failed and something is wrong. You'll also notice in the
logs that the deployment stopped at step 4 since OVS is not
considered to be running.
This only partially addresses 2) because we need to make it more
explicit and readable. Dan Prince had the idea to use
https://github.com/ripienaar/puppet-reportprint to print a nice report
of the Puppet catalog results (we haven't tried it yet). We could also
use Operational Tools later to monitor Puppet logs and find service
validation failures.
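
As a rough sketch of what such log monitoring could look for, the
Python snippet below extracts the failed service names from Puppet
error lines. The pattern is an assumption derived only from the single
error line quoted above, and it assumes each error is logged on one
line. The function name failed_validations is made up for this
example.

```python
import re

# Assumed pattern, based on the one sample error line from /var/log/messages.
VALIDATION_FAILURE = re.compile(
    r"Error: .*?Openstacklib::Service_validation\[(?P<service>[^\]]+)\]"
    r".*?change from notrun to 0 failed"
)


def failed_validations(lines):
    """Return the names of services whose validation failed, in order, without duplicates."""
    failed = []
    for line in lines:
        match = VALIDATION_FAILURE.search(line)
        if match and match.group("service") not in failed:
            failed.append(match.group("service"))
    return failed
```

Feeding it the lines of /var/log/messages would return a list such as
the services to report to the user; real logs may use other formats
that this pattern does not cover.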
So this email is meant to start a discussion, and it's open for
feedback. Don't take my PoC as something we'll implement. It's an idea
and I think it's worth looking at.
I like it for 2 reasons:
- the validation code resides within our profiles, so it's composable by design.
- it's flexible and allows us to test everything. It can be a bash
script, a shell command, a Puppet resource (provider, service, etc).
Thanks for reading so far,
--
Emilien Macchi