Open Stack

Sun Dec 16 23:24:42 UTC 2018

On 14/12/18 4:06 AM, Ignazio Cassano wrote:
> Hi All,
> I installed Queens on CentOS 7.
> Heat seems to work fine with templates I wrote, but when I create a Magnum
> cluster I often see a DB deadlock in the heat-engine log:

The stack trace below comes from a stack delete. Do you mean that the
problem occurs only when deleting a Magnum cluster, or can it occur at any
time with a Magnum cluster?

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource Traceback (most recent call last):
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 918, in _action_recorder
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     yield
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2035, in delete
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     *action_args)
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346, in wrapper
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     step = next(subtask)
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 977, in action_handler_task
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     done = check(handler_data)
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 587, in check_delete_complete
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     return self._check_status_complete(self.DELETE)
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 454, in _check_status_complete
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     action=action)
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource ResourceFailure: resources[0]: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') (Background on this error at: http://sqlalche.me/e/2j85)
> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
> 2018-12-13 11:48:46.030 89597 INFO heat.engine.stack [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] Stack DELETE FAILED (swarm-clustergp27-ebhsalhb4bop-swarm_primary_master-tzgkh3ncmymw): Resource DELETE failed: resources[0]: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') (Background on this error at: http://sqlalche.me/e/2j85)
> 2018-12-13 11:48:46.844 89595 INFO heat.engine.resource [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] DELETE: ResourceGroup "swarm_primary_master" [6984db3c-3ac1-4afc-901f-21b3e7f230a7] Stack "swarm-clustergp27-ebhsalhb4bop" [b43256fa-52e2-4613-ac15-63fd9340a8be]
> 
> 
> I read that this issue was solved in Heat version 10, but it seems not.

The patch for bug 1732969, which is in Queens, is probably the fix 
you're referring to: https://review.openstack.org/#/c/521170/1

The traceback in that bug included the SQL statement that was failing.
I'm not sure whether it's missing from your traceback because SQLAlchemy
didn't report it or because that traceback actually comes from the
parent stack. If you have a traceback from the original failure in the
child stack, that would be useful. If there's a way to turn on more
detailed reporting of errors in SQLAlchemy, that would also be useful.
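On the question of more detailed reporting: SQLAlchemy can log every
statement it sends (with its parameters) through the standard `logging`
module, which would at least show the SQL that was running when the
deadlock hit. A minimal sketch, assuming you can run this inside the
heat-engine process before its database connections are checked out
(SQLAlchemy checks the log level when a connection is taken from the
pool). In a real deployment logging goes through oslo.log, and oslo.db's
`[database] connection_debug` option (0 = none, 100 = everything) is
probably the more practical knob. Either way it is very verbose.

```python
import logging

# Print records to stderr; the root logger stays at WARNING so only the
# loggers we explicitly lower below become chatty.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# SQLAlchemy logs each SQL statement and its parameters on the
# "sqlalchemy.engine" logger at INFO; DEBUG also logs result rows.
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```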

Since this is a delete, it's possible that we also need a retry-on-deadlock
on the resource_delete() function (though resource_update() is actually
used more during a delete, to update the status).
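For anyone following along, a retry-on-deadlock just re-runs the whole
transaction: when InnoDB detects a deadlock it rolls back the victim
transaction and returns error 1213, so the caller can safely try again.
oslo.db provides this as `oslo_db.api.wrap_db_retry(retry_on_deadlock=True)`.
The following is only a stand-alone sketch of the idea, not Heat's code;
`DeadlockError` and `fake_resource_delete` are hypothetical stand-ins for
the driver exception and a DB API function.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver deadlock error (MySQL error 1213)."""


def retry_on_deadlock(max_retries=3, interval=0.1):
    """Re-run the wrapped function if it raises DeadlockError.

    The wrapped function must perform a complete transaction, because the
    transaction that received the deadlock error has already been rolled back.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries:
                        raise  # out of retries: surface the deadlock
                    # Exponential backoff so the competing transaction
                    # has time to commit before we try again.
                    time.sleep(interval * (2 ** attempt))
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_deadlock(max_retries=3, interval=0.01)
def fake_resource_delete(resource_id):
    """Hypothetical DB operation that deadlocks twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError(1213, "Deadlock found when trying to get lock; "
                                  "try restarting transaction")
    return "deleted %s" % resource_id


print(fake_resource_delete("abc"))  # deleted abc (after two retries)
```

Retrying at this level only helps if the decorated function owns the
entire transaction; retrying one statement inside a larger transaction
that has already been rolled back would not.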

- ZB
