[Ocata][Heat] Strange error returned after stack creation failure - raw template with id xxx not found
Hi!

We are currently troubleshooting a Heat stack issue where one of the stacks (one of 25 or so) is failing to be created properly, seemingly at random. The error returned by Heat is quite strange, and Google has turned up very few references to it. The actual error looks like the following (I've sanitized some of the names):

Resource CREATE failed: resources.potato: Resource CREATE failed: resources[0]: raw template with id 22273 not found

heat resource-list STACK_NAME_HERE -n 50
+---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
| resource_name       | physical_resource_id | resource_type           | resource_status | updated_time         | stack_name                     |
+---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
| potato              | RESOURCE_ID_HERE     | OS::Heat::ResourceGroup | CREATE_FAILED   | 2020-07-18T19:52:10Z | nested_stack_1_STACK_NAME_HERE |
| potato_server_group | RESOURCE_ID_HERE     | OS::Nova::ServerGroup   | CREATE_COMPLETE | 2020-07-21T19:52:10Z | nested_stack_1_STACK_NAME_HERE |
| 0                   |                      | potato1.yaml            | CREATE_FAILED   | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |
| 1                   |                      | potato1.yaml            | INIT_COMPLETE   | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |
+---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
The template itself is pretty simple and attempts to create a ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that the creation of one of those machines fails, Heat gets a little kooky, and it returns an error that might not be the actual root cause. I would have expected the VM to show up in the resource list, but I just see the source YAML (potato1.yaml).

Has anyone seen something similar in the past?

Thanks!
On 21/07/20 8:03 pm, Laurent Dumont wrote:
Hi!
We are currently troubleshooting a Heat stack issue where one of the stacks (one of 25 or so) is failing to be created properly (seemingly randomly).
The actual error returned by Heat is quite strange and Google has been quite sparse in terms of references.
The actual error looks like the following (I've sanitized some of the names):
Resource CREATE failed: resources.potato: Resource CREATE failed: resources[0]: raw template with id 22273 not found
When creating a nested stack, rather than just calling the RPC method to create a new stack, Heat stores the template in the database first and passes the ID in the RPC message.[1] (It turns out that by doing it this way we can save massive amounts of memory when processing a large tree of nested stacks.) My best guess is that this message indicates that the template row has been deleted by the time the other engine goes to look at it.

I don't see how you could have got an ID like 22273 without the template having been successfully stored at some point.

The template is only supposed to be deleted if the RPC call returns with an error.[2] The only way I can think of for that to happen before an attempt to create the child stack is if the RPC call times out, but the original message is eventually picked up by an engine. I would check your logs for RPC timeouts and consider increasing them.

What does the status_reason look like at one level above in the tree? That should indicate the first error that caused the template to be deleted.
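The sequence described above can be modelled as a toy Python sketch. Everything in it is hypothetical (`FakeDB`, `create_nested_stack`, `engine_handle_create` and the message fields are not Heat's real names); it only assumes what the explanation states: the template is stored first, only its ID travels in the RPC message, and the row is deleted when the RPC call returns an error such as a timeout. The ID counter starts at 22273 purely to mirror the error in this thread.

```python
"""Toy model (not Heat's code) of the store-template-then-RPC pattern."""
import itertools


class TemplateNotFound(Exception):
    pass


class FakeDB:
    """In-memory stand-in for the stored raw templates (hypothetical)."""

    def __init__(self, first_id=22273):
        self._rows = {}
        self._ids = itertools.count(first_id)

    def store(self, template):
        tmpl_id = next(self._ids)
        self._rows[tmpl_id] = template
        return tmpl_id

    def load(self, tmpl_id):
        try:
            return self._rows[tmpl_id]
        except KeyError:
            raise TemplateNotFound(f"raw template with id {tmpl_id} not found") from None

    def delete(self, tmpl_id):
        self._rows.pop(tmpl_id, None)


def create_nested_stack(db, queue, template, rpc_times_out):
    """Parent side: store the template, send only its ID, clean up on error."""
    tmpl_id = db.store(template)
    queue.append({"method": "create_stack", "template_id": tmpl_id})
    if rpc_times_out:
        # The caller sees an error (here, a simulated RPC timeout) and deletes
        # the row, but the message it already sent is still in the queue.
        db.delete(tmpl_id)
        return "CREATE_FAILED (RPC timeout)"
    return "CREATE_IN_PROGRESS"


def engine_handle_create(db, message):
    """Child side: a (possibly slow) engine eventually consumes the message."""
    try:
        db.load(message["template_id"])
        return "CREATE_IN_PROGRESS"
    except TemplateNotFound as exc:
        return f"CREATE_FAILED: {exc}"


if __name__ == "__main__":
    db, queue = FakeDB(), []
    print(create_nested_stack(db, queue, "potato1.yaml contents", rpc_times_out=True))
    print(engine_handle_create(db, queue.pop(0)))
```

Run as a script, it prints the parent's timeout result followed by the late engine's `raw template with id 22273 not found` failure. In this model the error can only appear when the call fails after the message has already been queued, which is consistent with the suggestion to look for RPC timeouts.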
heat resource-list STACK_NAME_HERE -n 50
+---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
| resource_name       | physical_resource_id | resource_type           | resource_status | updated_time         | stack_name                     |
+---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
| potato              | RESOURCE_ID_HERE     | OS::Heat::ResourceGroup | CREATE_FAILED   | 2020-07-18T19:52:10Z | nested_stack_1_STACK_NAME_HERE |
| potato_server_group | RESOURCE_ID_HERE     | OS::Nova::ServerGroup   | CREATE_COMPLETE | 2020-07-21T19:52:10Z | nested_stack_1_STACK_NAME_HERE |
| 0                   |                      | potato1.yaml            | CREATE_FAILED   | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |
| 1                   |                      | potato1.yaml            | INIT_COMPLETE   | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE |
+---------------------+----------------------+-------------------------+-----------------+----------------------+--------------------------------+
The template itself is pretty simple and attempts to create a ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that the creation of one of those machines fails, Heat gets a little kooky, and it returns an error that might not be the actual root cause. I would have expected the VM to show up in the resource list, but I just see the source YAML (potato1.yaml).
It's clear from the above output that the scaled unit of the resource group is in fact a template (not an OS::Nova::Server), and the error is occurring trying to create a stack from that template (potato1.yaml) - before Heat even has a chance to start creating the server.
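The nesting in the error string can also be read mechanically. A minimal sketch, assuming the status_reason follows the `Resource <ACTION> failed: <path>: ...` pattern seen in this thread (other Heat errors may be formatted differently); `split_failure_chain` is a hypothetical helper, not part of Heat or its client:

```python
import re

# Assumed prefix format, based only on the error quoted in this thread.
PREFIX = re.compile(r"Resource \w+ failed: ")


def split_failure_chain(status_reason):
    """Split a nested status_reason into (resource_path, message) pairs, outermost first."""
    levels = []
    for chunk in PREFIX.split(status_reason):
        if not chunk.strip():
            continue  # skip blank chunks, e.g. the empty text before the first prefix
        path, _, rest = chunk.partition(": ")
        levels.append((path.strip(), rest.strip()))
    return levels


if __name__ == "__main__":
    reason = ("Resource CREATE failed: resources.potato: Resource CREATE failed: "
              "resources[0]: raw template with id 22273 not found")
    for depth, (path, msg) in enumerate(split_failure_chain(reason)):
        print("  " * depth + path + (": " + msg if msg else ""))
```

For the error above this prints `resources.potato`, with `resources[0]: raw template with id 22273 not found` indented beneath it: the innermost failure belongs to member 0 of the `potato` ResourceGroup (the nested stack built from potato1.yaml), not to a server.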
Has anyone seen something similar in the past?
Nope.

cheers,
Zane.

[1] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/s...
[2] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/s...
Hey Zane,

Thank you so much for the details, super interesting. We've worked with the vendor to try to reproduce the issue while we had our Heat logs turned up to DEBUG. Unfortunately, all of the creations they have attempted since have worked. It first failed 4 times out of 5 and has worked ever since... It's one of those problems! We'll keep trying to reproduce it.

Just to be sure, the actual YAML is stored in the DB and then accessed to create the actual Heat resources?

Thanks!

On Wed, Jul 22, 2020 at 3:46 PM Zane Bitter <zbitter@redhat.com> wrote:
On 24/07/20 10:59 am, Laurent Dumont wrote:
Hey Zane,
Thank you so much for the details, super interesting. We've worked with the vendor to try to reproduce the issue while we had our Heat logs turned up to DEBUG. Unfortunately, all of the creations they have attempted since have worked. It first failed 4 times out of 5 and has worked ever since...
Interesting - sounds like a timing issue, but I haven't spotted any code that looks like it could fail by going too fast.
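Since this looks timing-related, the earlier advice to check the logs for RPC timeouts can be scripted. A minimal sketch, assuming the engine logs contain either the oslo.messaging `MessagingTimeout` exception name or its usual `Timed out waiting for a reply` text; adjust the patterns to what your logs actually show. The script name and log path in the usage line are hypothetical and vary by deployment.

```python
import re
import sys
from collections import Counter

# Assumed log patterns: an oslo.messaging RPC timeout, and the error from this thread.
PATTERNS = {
    "rpc_timeout": re.compile(r"MessagingTimeout|Timed out waiting for a reply"),
    "template_missing": re.compile(r"raw template with id \d+ not found"),
}


def scan(lines):
    """Return per-pattern counts and the matching lines, in file order."""
    counts = Counter()
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append((name, line.rstrip("\n")))
    return counts, hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            counts, hits = scan(handle)
        print(f"{path}: {dict(counts)}")
        for name, line in hits:
            print(f"  [{name}] {line}")
```

Usage (hypothetical path): `python scan_heat_logs.py /var/log/heat/heat-engine.log`. If timeouts cluster around the failed creations, that would support the theory; the knob to raise is then most likely oslo.messaging's `rpc_response_timeout`, but check the configuration reference for your release before changing it.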
It's one of those problems! We'll keep trying to reproduce it. Just to be sure, the actual YAML is stored in the DB and then accessed to create the actual Heat resources?
Yep, correct. It's stored and the ID is passed in the RPC message here:

https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/s...
https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/s...
https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/s...

and then when the other engine receives the create_stack RPC message, it uses the stored template instead of one passed in the message like you would get from a create call initiated via the ReST API:

https://opendev.org/openstack/heat/src/branch/master/heat/engine/service.py#...
https://opendev.org/openstack/heat/src/branch/master/heat/engine/service.py#...

- ZB
Thanks!
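The two entry points described for the receiving engine can be sketched as a small dispatch. This is a toy model, not Heat's `service.py`: `resolve_template` and the `template_id`/`template` message keys are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional


def resolve_template(message: Dict[str, Any],
                     raw_templates: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the template a create_stack handler would work from (toy model)."""
    template_id: Optional[int] = message.get("template_id")
    if template_id is not None:
        # Nested-stack path: only the ID travelled over RPC, so the stored
        # row must still exist when this engine picks the message up.
        if template_id not in raw_templates:
            raise LookupError(f"raw template with id {template_id} not found")
        return raw_templates[template_id]
    # API path: the full template body arrives in the request itself.
    return message["template"]


if __name__ == "__main__":
    stored = {22273: {"resources": {}}}
    print(resolve_template({"template_id": 22273}, stored))
    print(resolve_template({"template": {"resources": {"a": {}}}}, stored))
```

In this model only the first branch depends on database state, which is why a template row deleted in the meantime surfaces as the error discussed in this thread, while a create request carrying the full template cannot fail this way.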
participants (2)
- Laurent Dumont
- Zane Bitter