queens heat db deadlock

Ignazio Cassano ignaziocassano at gmail.com
Mon Dec 17 05:44:36 UTC 2018


Hello Zane, it also happens when I create a Magnum cluster.
My MariaDB cluster is behind HAProxy.
Do you need any other info?
Thanks
Ignazio


On Mon 17 Dec 2018 at 00:28, Zane Bitter <zbitter at redhat.com> wrote:

> On 14/12/18 4:06 AM, Ignazio Cassano wrote:
> > Hi All,
> > I installed Queens on CentOS 7.
> > Heat seems to work fine with the templates I wrote, but when I create a
> > Magnum cluster I often hit a db deadlock in the heat-engine log:
>
> The stacktrace below is in stack delete, so do you mean that the problem
> occurs when deleting a Magnum cluster, or can it occur any time with a
> magnum cluster?
>
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource Traceback (most
> > recent call last):
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> > "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 918, in
> > _action_recorder
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     yield
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> > "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2035,
> > in delete
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     *action_args)
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> > "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346,
> > in wrapper
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     step =
> > next(subtask)
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> > "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 977, in
> > action_handler_task
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     done =
> > check(handler_data)
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> > "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py",
> > line 587, in check_delete_complete
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     return
> > self._check_status_complete(self.DELETE)
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> > "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py",
> > line 454, in _check_status_complete
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     action=action)
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
> > ResourceFailure: resources[0]: (pymysql.err.InternalError) (1213,
> > u'Deadlock found when trying to get lock; try restarting transaction')
> > (Background on this error at: http://sqlalche.me/e/2j85)
> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
> > 2018-12-13 11:48:46.030 89597 INFO heat.engine.stack
> > [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default]
> > Stack DELETE FAILED
> > (swarm-clustergp27-ebhsalhb4bop-swarm_primary_master-tzgkh3ncmymw):
> > Resource DELETE failed: resources[0]: (pymysql.err.InternalError) (1213,
> > u'Deadlock found when trying to get lock; try restarting transaction')
> > (Background on this error at: http://sqlalche.me/e/2j85)
> > 2018-12-13 11:48:46.844 89595 INFO heat.engine.resource
> > [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default]
> > DELETE: ResourceGroup "swarm_primary_master"
> > [6984db3c-3ac1-4afc-901f-21b3e7f230a7] Stack
> > "swarm-clustergp27-ebhsalhb4bop" [b43256fa-52e2-4613-ac15-63fd9340a8be]
> >
> >
> > I read that this issue was solved in Heat version 10, but it seems not.
>
> The patch for bug 1732969, which is in Queens, is probably the fix
> you're referring to: https://review.openstack.org/#/c/521170/1
>
> The traceback in that bug included the SQL statement that was failing.
> I'm not sure if that isn't included in your traceback because SQLAlchemy
> didn't report it, or if it's because that traceback is actually from the
> parent stack. If you have a traceback from the original failure in the
> child stack that would be useful. If there's a way to turn on more
> detailed reporting of errors in SQLAlchemy that would also be useful.
>
> Since this is a delete, it's possible that we need a retry-on-deadlock
> on the resource_delete() function also (though resource_update() is
> actually used more during a delete to update the status).
>
> - ZB
>
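On the question of more detailed error reporting from SQLAlchemy: SQLAlchemy writes the statements it executes to the standard Python logger named "sqlalchemy.engine" at INFO level. The snippet below is only a generic sketch of raising that logger's verbosity in a Python process. The helper name enable_sql_statement_logging is hypothetical, and how this should be wired into the heat-engine logging configuration is an assumption that would need checking against the deployment.

```python
import logging


def enable_sql_statement_logging(level=logging.INFO):
    """Raise the verbosity of SQLAlchemy's engine logger.

    At INFO, SQLAlchemy logs each SQL statement it emits, which may help
    identify the statement involved in a deadlock.
    """
    # Make sure at least one handler exists so the records are shown.
    logging.basicConfig()
    logger = logging.getLogger("sqlalchemy.engine")
    logger.setLevel(level)
    return logger


sql_logger = enable_sql_statement_logging()
```

This produces a lot of output on a busy service, so it is probably best enabled only while reproducing the failure.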
>
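To illustrate the retry-on-deadlock idea Zane mentions for resource_delete(), here is a minimal, stdlib-only sketch. It is not Heat's code: DeadlockError, retry_on_deadlock and resource_delete_sketch are hypothetical names invented for the example. In OpenStack projects this behaviour is normally obtained from oslo.db's wrap_db_retry decorator with retry_on_deadlock=True rather than written by hand.

```python
import functools
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the DB deadlock error (MySQL error 1213)."""


def retry_on_deadlock(max_retries=3, base_delay=0.05, sleep=time.sleep):
    """Retry the wrapped function when it raises DeadlockError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries:
                        raise  # out of retries: surface the deadlock
                    # Exponential backoff with jitter so competing
                    # transactions are less likely to collide again.
                    sleep(base_delay * (2 ** attempt) * (1 + random.random()))
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_deadlock(max_retries=3, sleep=lambda _seconds: None)
def resource_delete_sketch(resource_id):
    """Hypothetical delete that deadlocks twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "deleted %s" % resource_id


result = resource_delete_sketch("r1")
```

The whole transaction has to be re-run on each attempt, which is why the retry wraps the complete function rather than a single statement.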


More information about the openstack-discuss mailing list