<div dir="auto">Hello Zane, it also happens when I create a Magnum cluster.<div dir="auto">My MariaDB cluster is behind HAProxy.</div><div dir="auto">Do you need any other info?</div><div dir="auto">Thanks</div><div dir="auto">Ignazio</div><div dir="auto"><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, 17 Dec 2018 at 00:28, Zane Bitter <<a href="mailto:zbitter@redhat.com">zbitter@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 14/12/18 4:06 AM, Ignazio Cassano wrote:<br>

> Hi All,<br>

> I installed Queens on CentOS 7.<br>

> Heat seems to work fine with the templates I wrote, but when I create a Magnum <br>

> cluster<br>

> I often hit a DB deadlock in the heat-engine log:<br>

<br>

The stacktrace below is in stack delete, so do you mean that the problem <br>

occurs when deleting a Magnum cluster, or can it occur any time with a <br>

magnum cluster?<br>

<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource Traceback (most <br>

> recent call last):<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File <br>

> "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 918, in <br>

> _action_recorder<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     yield<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File <br>

> "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2035, <br>

> in delete<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     *action_args)<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File <br>

> "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346, <br>

> in wrapper<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     step = <br>

> next(subtask)<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File <br>

> "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 977, in <br>

> action_handler_task<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     done = <br>

> check(handler_data)<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File <br>

> "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", <br>

> line 587, in check_delete_complete<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     return <br>

> self._check_status_complete(self.DELETE)<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File <br>

> "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", <br>

> line 454, in _check_status_complete<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     action=action)<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource <br>

> ResourceFailure: resources[0]: (pymysql.err.InternalError) (1213, <br>

> u'Deadlock found when trying to get lock; try restarting transaction') <br>

> (Background on this error at: <a href="http://sqlalche.me/e/2j85" rel="noreferrer noreferrer" target="_blank">http://sqlalche.me/e/2j85</a>)<br>

> 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource<br>

> 2018-12-13 11:48:46.030 89597 INFO heat.engine.stack <br>

> [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] <br>

> Stack DELETE FAILED <br>

> (swarm-clustergp27-ebhsalhb4bop-swarm_primary_master-tzgkh3ncmymw): <br>

> Resource DELETE failed: resources[0]: (pymysql.err.InternalError) (1213, <br>

> u'Deadlock found when trying to get lock; try restarting transaction') <br>

> (Background on this error at: <a href="http://sqlalche.me/e/2j85" rel="noreferrer noreferrer" target="_blank">http://sqlalche.me/e/2j85</a>)<br>

> 2018-12-13 11:48:46.844 89595 INFO heat.engine.resource <br>

> [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] <br>

> DELETE: ResourceGroup "swarm_primary_master" <br>

> [6984db3c-3ac1-4afc-901f-21b3e7f230a7] Stack <br>

> "swarm-clustergp27-ebhsalhb4bop" [b43256fa-52e2-4613-ac15-63fd9340a8be]<br>

> <br>

> <br>

> I read that this issue was solved in Heat version 10, but it seems it was not.

<br>

The patch for bug 1732969, which is in Queens, is probably the fix <br>

you're referring to: <a href="https://review.openstack.org/#/c/521170/1" rel="noreferrer noreferrer" target="_blank">https://review.openstack.org/#/c/521170/1</a><br>

<br>

The traceback in that bug included the SQL statement that was failing. <br>

I'm not sure whether it is missing from your traceback because SQLAlchemy <br>

didn't report it, or if it's because that traceback is actually from the <br>

parent stack. If you have a traceback from the original failure in the <br>

child stack that would be useful. If there's a way to turn on more <br>

detailed reporting of errors in SQLAlchemy that would also be useful.<br>
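<br>

On the SQLAlchemy side, the engine reports the statements it emits through the <br>
standard Python logger named sqlalchemy.engine (INFO for statements, DEBUG to <br>
also include result rows). A minimal standalone sketch, assuming plain Python <br>
logging; enable_sql_logging is a hypothetical helper name, and in a running <br>
service the level would more likely be set through the service's own logging <br>
configuration than in code:<br>

```python
import logging


def enable_sql_logging(level=logging.INFO):
    """Raise the verbosity of SQLAlchemy's engine logger.

    Hypothetical helper: INFO logs each SQL statement, DEBUG also logs rows.
    """
    # Make sure at least one handler exists so the records are actually emitted.
    logging.basicConfig()
    logger = logging.getLogger("sqlalchemy.engine")
    logger.setLevel(level)
    return logger


sql_logger = enable_sql_logging()
```

With this in place, the failing statement should show up in the log just before <br>
the deadlock error, provided the service's logging setup does not override the level.<br>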

<br>

Since this is a delete, it's possible that we need a retry-on-deadlock <br>

on the resource_delete() function also (though resource_update() is <br>

actually used more during a delete to update the status).<br>
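<br>

To illustrate what retry-on-deadlock means here: oslo.db provides a <br>
wrap_db_retry decorator with a retry_on_deadlock flag for this purpose. The <br>
stdlib-only sketch below only shows the pattern. DeadlockError, the decorator <br>
parameters, and this resource_delete() are hypothetical stand-ins, not Heat's <br>
actual code:<br>

```python
import functools
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock error (MySQL code 1213)."""


def retry_on_deadlock(max_retries=3, base_delay=0.01):
    """Re-run the wrapped function when it raises DeadlockError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries:
                        # Out of retries: let the caller see the deadlock.
                        raise
                    # Exponential backoff with jitter so competing
                    # transactions are less likely to collide again.
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay * random.uniform(0.5, 1.5))
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=3)
def resource_delete(resource_id):
    """Hypothetical DB call that deadlocks twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "deleted %s" % resource_id


result = resource_delete("r1")
```

The whole transaction has to be restarted on each attempt, which is why the <br>
retry wraps the entire DB API function rather than a single statement.<br>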

<br>

- ZB<br>

<br>

</blockquote></div>