<div dir="ltr">Hey Zane,<div><br></div><div>Thank you so much for the details - super interesting. We've worked with the vendor to try to reproduce the issue while we had our Heat logs turned up to DEBUG. Unfortunately, all of the creations they have attempted since have worked. It first failed 4 times out of 5 and has since worked...</div><div><br></div><div>It's one of those problems! We'll keep trying to reproduce. Just to be sure, the actual YAML is stored in the DB and then accessed to create the actual Heat resources? </div><div><br></div><div>Thanks!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 22, 2020 at 3:46 PM Zane Bitter <<a href="mailto:zbitter@redhat.com">zbitter@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 21/07/20 8:03 pm, Laurent Dumont wrote:<br>
> Hi!<br>
> <br>
> We are currently troubleshooting a Heat stack issue where one of the <br>
> stacks (one of 25 or so) is failing to be created properly (seemingly <br>
> randomly).<br>
> <br>
> The actual error returned by Heat is quite strange and Google has been <br>
> quite sparse in terms of references.<br>
> <br>
> The actual error looks like the following (I've sanitized some of the <br>
> names):<br>
> <br>
> Resource CREATE failed: resources.potato: Resource CREATE failed: <br>
> resources[0]: raw template with id 22273 not found<br>
<br>
When creating a nested stack, rather than just calling the RPC method to <br>
create a new stack, Heat stores the template in the database first and <br>
passes the ID in the RPC message.[1] (It turns out that by doing it this <br>
way we can save massive amounts of memory when processing a large tree <br>
of nested stacks.) My best guess is that this message indicates that the <br>
template row has been deleted by the time the other engine goes to look <br>
at it.<br>
<br>
I don't see how you could have got an ID like 22273 without the template <br>
having been successfully stored at some point.<br>
<br>
The template is only supposed to be deleted if the RPC call returns with <br>
an error.[2] The only way I can think of for that to happen before an <br>
attempt to create the child stack is if the RPC call times out, but the <br>
original message is eventually picked up by an engine. I would check <br>
your logs for RPC timeouts and consider increasing them.<br>
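<br>
To make that timeout race concrete, here is a toy simulation of the sequence I have in mind. It is only a sketch and not Heat's actual code: TemplateStore, create_nested_stack and engine_consume are hypothetical names, the in-memory dict stands in for the raw template table, and the list stands in for the message queue.<br>

```python
"""Toy model of the RPC-timeout race described above; not Heat's real code."""


class RawTemplateNotFound(Exception):
    """Raised when a template ID no longer has a stored row."""


class TemplateStore:
    """Hypothetical stand-in for the database table of raw templates."""

    def __init__(self, first_id=1):
        self._rows = {}
        self._next_id = first_id

    def store(self, body):
        tmpl_id = self._next_id
        self._next_id += 1
        self._rows[tmpl_id] = body
        return tmpl_id

    def delete(self, tmpl_id):
        self._rows.pop(tmpl_id, None)

    def get(self, tmpl_id):
        try:
            return self._rows[tmpl_id]
        except KeyError:
            raise RawTemplateNotFound(
                "raw template with id %d not found" % tmpl_id) from None


def create_nested_stack(store, queue, body, reply_arrives_in_time):
    """Parent side: store the template, send only its ID, clean up on error."""
    tmpl_id = store.store(body)
    queue.append(tmpl_id)          # the message carries the ID, not the body
    if not reply_arrives_in_time:  # the caller gives up waiting (RPC timeout)
        store.delete(tmpl_id)      # the error path removes the stored template
        return "timeout"
    return "ok"


def engine_consume(store, queue):
    """Child side: some engine eventually picks the queued message up."""
    tmpl_id = queue.pop(0)
    return store.get(tmpl_id)


if __name__ == "__main__":
    store, queue = TemplateStore(first_id=22273), []
    create_nested_stack(store, queue, "potato1.yaml contents",
                        reply_arrives_in_time=False)
    try:
        engine_consume(store, queue)
    except RawTemplateNotFound as exc:
        print(exc)  # raw template with id 22273 not found
```

The point of the sketch is only the ordering: the ID is valid when the message is sent, and the row is gone by the time the message is consumed. If the logs do show timeouts, the setting to look at in services built on oslo.messaging is normally rpc_response_timeout, but check the configuration reference for your release.<br>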
<br>
What does the status_reason look like at one level above in the tree? <br>
That should indicate the first error that caused the template to be deleted.<br>
<br>
> heat resource-list STACK_NAME_HERE -n 50<br>
> +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+<br>
> | resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |<br>
> +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+<br>
> | potato | RESOURCE_ID_HERE | OS::Heat::ResourceGroup | CREATE_FAILED | 2020-07-18T19:52:10Z | nested_stack_1_STACK_NAME_HERE |<br>
> | potato_server_group | RESOURCE_ID_HERE | OS::Nova::ServerGroup | CREATE_COMPLETE | 2020-07-21T19:52:10Z | nested_stack_1_STACK_NAME_HERE |<br>
> | 0 | | potato1.yaml | CREATE_FAILED | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |<br>
> | 1 | | potato1.yaml | INIT_COMPLETE | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |<br>
> +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+<br>
> <br>
> <br>
> The template itself is pretty simple and attempts to create a <br>
> ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that <br>
> the creation of one of those machines fails and Heat gets a little kooky <br>
> and returns an error that might not be the actual root cause. I would <br>
> have expected the VM to show up in the resource list but I just see the <br>
> source "yaml".<br>
<br>
It's clear from the above output that the scaled unit of the resource <br>
group is in fact a template (not an OS::Nova::Server), and the error is <br>
occurring trying to create a stack from that template (potato1.yaml) - <br>
before Heat even has a chance to start creating the server.<br>
<br>
> Has anyone seen something similar in the past?<br>
<br>
Nope.<br>
<br>
cheers,<br>
Zane.<br>
<br>
[1] <br>
<a href="https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L367-L384" rel="noreferrer" target="_blank">https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L367-L384</a><br>
[2] <br>
<a href="https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L335-L342" rel="noreferrer" target="_blank">https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L335-L342</a><br>
<br>
<br>
</blockquote></div>