[openstack-dev] [TripleO/heat] openstack debug command
Steven Hardy
shardy at redhat.com
Mon Nov 30 10:21:07 UTC 2015
On Mon, Nov 30, 2015 at 10:03:29AM +0100, Lennart Regebro wrote:
> I'm tasked to implement a command that shows error messages when a
> deployment has failed. I have a vague memory of having seen scripts
> that do something like this. If that exists, can somebody point me in
> the right direction?
I wrote a super simple script and put it in a blog post a while back:
http://hardysteven.blogspot.co.uk/2015/05/tripleo-heat-templates-part-3-cluster.html
All it does is find the failed SoftwareDeployment resources, then do heat
deployment-show on the resource, so you can see the stderr associated with
the failure.
Having tripleoclient do that by default would be useful.
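As a rough illustration of that filtering step (this is not the script from the blog post), here is a minimal Python sketch. It assumes the resource list has already been fetched and is available as plain dicts with the same columns heat resource-list prints; the sample names and IDs are made up, and the set of deployment resource types is an assumption.

```python
# Resource types treated as software deployments (an assumed list).
DEPLOYMENT_TYPES = {
    "OS::Heat::SoftwareDeployment",
    "OS::Heat::StructuredDeployment",
}


def failed_deployments(resources):
    """Return (resource_name, deployment_id) pairs for failed deployments.

    For a deployment resource, the physical resource ID is the deployment
    ID that can be passed to "heat deployment-show".
    """
    return [
        (r["resource_name"], r["physical_resource_id"])
        for r in resources
        if r["resource_type"] in DEPLOYMENT_TYPES
        and r["resource_status"].endswith("FAILED")
    ]


# Hypothetical sample data, shaped like rows of "heat resource-list".
sample = [
    {"resource_name": "ControllerDeployment",
     "resource_type": "OS::Heat::StructuredDeployment",
     "resource_status": "CREATE_FAILED",
     "physical_resource_id": "dep-1"},
    {"resource_name": "Controller",
     "resource_type": "OS::Nova::Server",
     "resource_status": "CREATE_COMPLETE",
     "physical_resource_id": "srv-1"},
    {"resource_name": "NetworkDeployment",
     "resource_type": "OS::Heat::SoftwareDeployment",
     "resource_status": "CREATE_COMPLETE",
     "physical_resource_id": "dep-2"},
]

for name, deployment_id in failed_deployments(sample):
    # Print the command to run next rather than running it.
    print("heat deployment-show %s  # %s" % (deployment_id, name))
```

The stderr of each failure would then come from running the printed command for each deployment ID.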
> Any opinions on what that should do, specifically? Traverse failed
> resources to find error messages, I assume. Anything else?
Yeah, but I think for this to be useful, we need to go a bit deeper than
just showing the resource error - there are a number of typical failure
modes, and I end up repeating the same steps to debug every time.
1. SoftwareDeployment failed (mentioned above). Every time, you need to
see the name of the SoftwareDeployment which failed, figure out if it
failed on one or all of the servers, then look at the stderr for clues.
2. A server failed to build (OS::Nova::Server resource is FAILED). Here we
need to check both nova and ironic, looking first to see if ironic has the
node(s) in the wrong state for scheduling (e.g. nova gave us a "no valid
host" error), and then, if they are OK in ironic, do nova show on the failed
host to see the reason nova gives us for it failing to go ACTIVE.
3. A stack timeout happened. IIRC when this happens, we currently fail
with an obscure keystone-related backtrace due to the token expiring. We
should instead catch this error and show the heat stack status_reason,
which should say clearly the stack timed out.
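One way a debug command could tell these three cases apart is sketched below in Python. It is only a sketch under stated assumptions: the stack and its resources are plain dicts that have already been fetched, the stack dict carries the reason under a "stack_status_reason" key, a timeout is recognised by the words "timed out" in that reason, and the returned labels are hypothetical names chosen for illustration.

```python
# Resource types treated as software deployments (an assumed list).
DEPLOYMENT_TYPES = {
    "OS::Heat::SoftwareDeployment",
    "OS::Heat::StructuredDeployment",
}


def classify_failure(stack, resources):
    """Map a failed stack onto one of the three failure modes above.

    Returns a hypothetical label: "stack-timeout", "server-build-failed",
    "software-deployment-failed" or "unknown".
    """
    reason = stack.get("stack_status_reason") or ""
    # Case 3: assume the status reason mentions the timeout explicitly.
    if "timed out" in reason.lower():
        return "stack-timeout"

    failed_types = {
        r["resource_type"]
        for r in resources
        if r["resource_status"].endswith("FAILED")
    }
    # Case 2: check servers first, since deployments need a built server.
    if "OS::Nova::Server" in failed_types:
        return "server-build-failed"
    # Case 1: a SoftwareDeployment failed on an otherwise built server.
    if failed_types & DEPLOYMENT_TYPES:
        return "software-deployment-failed"
    return "unknown"


# Hypothetical example: one server failed to build.
example_stack = {"stack_status_reason": "Resource CREATE failed"}
example_resources = [
    {"resource_type": "OS::Nova::Server",
     "resource_status": "CREATE_FAILED"},
]
print(classify_failure(example_stack, example_resources))
```

Each label would then select the follow-up steps: deployment-show for case 1, the ironic and nova checks for case 2, and printing the stack status reason for case 3.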
If we could just make these three cases really clear and easy to debug, I
think things would be much better (IME the above are a high proportion of
all failures), but I'm sure folks can come up with other ideas to add to
the list.
Steve