[Ocata][Heat] Strange error returned after stack creation failure - raw template with id xxx not found
Zane Bitter
zbitter at redhat.com
Fri Aug 14 00:37:36 UTC 2020
On 24/07/20 10:59 am, Laurent Dumont wrote:
> Hey Zane,
>
> Thank you so much for the details - super interesting. We've worked with
> the Vendor to try and reproduce while we had our logs for Heat turned to
> DEBUG. Unfortunately, all of the creations they have attempted since
> have worked. It first failed 4 times out of 5 and has since worked...
Interesting - sounds like a timing issue, but I haven't spotted any code
that looks like it could fail by going too fast.
> It's one of those problems! We'll keep trying to reproduce. Just to be
> sure, the actual yaml is stored in the DB and then accessed to create
> the actual Heat resources?
Yep, correct. It's stored and the ID is passed in the RPC message here:
https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L308
https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L372-L374
https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L336-L337
and then when the other engine receives the create_stack RPC message it
uses the stored template instead of one passed in the message like you
would get from a create call initiated via the ReST API:
https://opendev.org/openstack/heat/src/branch/master/heat/engine/service.py#L847-L851
https://opendev.org/openstack/heat/src/branch/master/heat/engine/service.py#L731-L732
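
To make the flow concrete, here is a minimal, self-contained Python sketch of the store-then-pass-the-ID pattern, and of how an RPC timeout could plausibly lead to the "not found" error. This is not Heat's code: TemplateStore, parent_create, create_child_stack, RpcTimeout and demo_timeout_race are hypothetical names, and an in-memory dict stands in for the template table in the database.

```python
import itertools


class TemplateNotFound(Exception):
    pass


class RpcTimeout(Exception):
    pass


class TemplateStore:
    """Hypothetical in-memory stand-in for the stored-template table."""

    def __init__(self):
        self._rows = {}
        self._ids = itertools.count(1)

    def store(self, template):
        template_id = next(self._ids)
        self._rows[template_id] = template
        return template_id

    def get(self, template_id):
        try:
            return self._rows[template_id]
        except KeyError:
            raise TemplateNotFound(
                "raw template with id %d not found" % template_id)

    def delete(self, template_id):
        self._rows.pop(template_id, None)


def create_child_stack(store, template_id):
    """Receiving side: load the template by ID, not from the message."""
    template = store.get(template_id)
    return "created stack from %r" % (template,)


def parent_create(store, template, rpc_call):
    """Sending side: store the template, send only its ID, and delete
    the stored template if the RPC call returns with an error."""
    template_id = store.store(template)
    try:
        return rpc_call(template_id)
    except Exception:
        store.delete(template_id)
        raise


def demo_timeout_race():
    """The caller times out, but the queued message is handled later."""
    store = TemplateStore()
    pending = []

    def slow_rpc(template_id):
        # The message stays queued while the caller gives up waiting.
        pending.append(template_id)
        raise RpcTimeout("timed out waiting for a reply")

    try:
        parent_create(store, {"resources": {}}, slow_rpc)
    except RpcTimeout:
        pass  # the stored template has been deleted by now

    # An engine eventually picks up the original message.
    try:
        return create_child_stack(store, pending[0])
    except TemplateNotFound as exc:
        return str(exc)
```

Calling demo_timeout_race() returns the "not found" message because the cleanup on the sending side runs before the late message is processed. Whether that is what actually happened in your deployment is still only a guess until it shows up in the logs.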
- ZB
>
> Thanks!
>
> On Wed, Jul 22, 2020 at 3:46 PM Zane Bitter <zbitter at redhat.com> wrote:
>
> On 21/07/20 8:03 pm, Laurent Dumont wrote:
> > Hi!
> >
> > We are currently troubleshooting a Heat stack issue where one of the
> > stacks (one of 25 or so) is failing to be created properly (seemingly
> > randomly).
> >
> > The actual error returned by Heat is quite strange and Google has been
> > quite sparse in terms of references.
> >
> > The actual error looks like the following (I've sanitized some of the
> > names):
> >
> > Resource CREATE failed: resources.potato: Resource CREATE failed:
> > resources[0]: raw template with id 22273 not found
>
> When creating a nested stack, rather than just calling the RPC method to
> create a new stack, Heat stores the template in the database first and
> passes the ID in the RPC message.[1] (It turns out that by doing it this
> way we can save massive amounts of memory when processing a large tree
> of nested stacks.) My best guess is that this message indicates that the
> template row has been deleted by the time the other engine goes to look
> at it.
>
> I don't see how you could have got an ID like 22273 without the template
> having been successfully stored at some point.
>
> The template is only supposed to be deleted if the RPC call returns with
> an error.[2] The only way I can think of for that to happen before an
> attempt to create the child stack is if the RPC call times out, but the
> original message is eventually picked up by an engine. I would check
> your logs for RPC timeouts and consider increasing them.
>
> What does the status_reason look like at one level above in the tree?
> That should indicate the first error that caused the template to be
> deleted.
>
> > heat resource-list STACK_NAME_HERE -n 50
> >
> > +---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
> > | resource_name       | physical_resource_id | resource_type           | resource_status | updated_time         | stack_name                     |
> > +---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
> > | potato              | RESOURCE_ID_HERE     | OS::Heat::ResourceGroup | CREATE_FAILED   | 2020-07-18T19:52:10Z | nested_stack_1_STACK_NAME_HERE |
> > | potato_server_group | RESOURCE_ID_HERE     | OS::Nova::ServerGroup   | CREATE_COMPLETE | 2020-07-21T19:52:10Z | nested_stack_1_STACK_NAME_HERE |
> > | 0                   |                      | potato1.yaml            | CREATE_FAILED   | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |
> > | 1                   |                      | potato1.yaml            | INIT_COMPLETE   | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |
> > +---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
> >
> >
> > The template itself is pretty simple and attempts to create a
> > ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that
> > the creation of one of those machines fails and Heat gets a little kooky
> > and returns an error that might not be the actual root cause. I would
> > have expected the VM to show up in the resource list but I just see the
> > source "yaml".
>
> It's clear from the above output that the scaled unit of the resource
> group is in fact a template (not an OS::Nova::Server), and the error is
> occurring trying to create a stack from that template (potato1.yaml) -
> before Heat even has a chance to start creating the server.
>
> > Has anyone seen something similar in the past?
>
> Nope.
>
> cheers,
> Zane.
>
> [1] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L367-L384
> [2] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L335-L342
>
>