queens heat db deadlock

Ignazio Cassano ignaziocassano at gmail.com
Tue Dec 18 05:28:51 UTC 2018


Yes, I tried it yesterday and this workaround solved the problem.
Thanks
Ignazio

On Mon, Dec 17, 2018 at 20:38, Joe Topjian <joe at topjian.net> wrote:

> Hi Ignazio,
>
> Do you currently have HAProxy configured to route requests to multiple
> MariaDB nodes? If so, as a workaround, try an active/backup configuration
> where all but one node is configured as an HAProxy "backup".
>
> Thanks,
> Joe
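
For reference, the active/backup arrangement Joe describes corresponds to an
HAProxy listen section in which every MariaDB server line except one carries
the "backup" keyword, so all traffic reaches a single node at a time. A
minimal sketch, with placeholder names and addresses:

    listen mariadb
        bind 0.0.0.0:3306
        mode tcp
        option tcpka
        server db1 10.0.0.11:3306 check
        server db2 10.0.0.12:3306 check backup
        server db3 10.0.0.13:3306 check backup

The usual motivation is that sending every write to a single node of a
multi-master (typically Galera) MariaDB cluster avoids the certification
conflicts that surface to clients as MySQL error 1213 ("Deadlock found when
trying to get lock").
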
>
>
>
> On Sun, Dec 16, 2018 at 10:46 PM Ignazio Cassano <ignaziocassano at gmail.com>
> wrote:
>
>> Hello Zane, it also happens when I create a Magnum cluster.
>> My MariaDB cluster is behind HAProxy.
>> Do you need any other info?
>> Thanks
>> Ignazio
>>
>>
>> On Mon, Dec 17, 2018 at 00:28, Zane Bitter <zbitter at redhat.com> wrote:
>>
>>> On 14/12/18 4:06 AM, Ignazio Cassano wrote:
>>> > Hi All,
>>> > I installed queens on centos 7.
>>> > Heat seems to work fine with templates I wrote, but when I create a
>>> > magnum cluster I often face a db deadlock in the heat-engine log:
>>>
>>> The stack trace below is from a stack delete, so do you mean that the
>>> problem occurs when deleting a Magnum cluster, or can it occur at any
>>> time with a Magnum cluster?
>>>
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource Traceback (most recent call last):
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 918, in _action_recorder
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     yield
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2035, in delete
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     *action_args)
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346, in wrapper
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     step = next(subtask)
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 977, in action_handler_task
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     done = check(handler_data)
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 587, in check_delete_complete
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     return self._check_status_complete(self.DELETE)
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 454, in _check_status_complete
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     action=action)
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource ResourceFailure: resources[0]: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') (Background on this error at: http://sqlalche.me/e/2j85)
>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
>>> > 2018-12-13 11:48:46.030 89597 INFO heat.engine.stack [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] Stack DELETE FAILED (swarm-clustergp27-ebhsalhb4bop-swarm_primary_master-tzgkh3ncmymw): Resource DELETE failed: resources[0]: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') (Background on this error at: http://sqlalche.me/e/2j85)
>>> > 2018-12-13 11:48:46.844 89595 INFO heat.engine.resource [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] DELETE: ResourceGroup "swarm_primary_master" [6984db3c-3ac1-4afc-901f-21b3e7f230a7] Stack "swarm-clustergp27-ebhsalhb4bop" [b43256fa-52e2-4613-ac15-63fd9340a8be]
>>> >
>>> >
>>> > I read that this issue was solved in heat version 10, but it seems not.
>>>
>>> The patch for bug 1732969, which is in Queens, is probably the fix
>>> you're referring to: https://review.openstack.org/#/c/521170/1
>>>
>>> The traceback in that bug included the SQL statement that was failing.
>>> I'm not sure whether it's missing from your traceback because SQLAlchemy
>>> didn't report it, or because that traceback is actually from the parent
>>> stack. If you have a traceback from the original failure in the child
>>> stack, that would be useful. If there's a way to turn on more detailed
>>> reporting of errors in SQLAlchemy, that would also be useful.
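
One way to get more SQL detail into the logs, assuming the standard oslo.db
options that Heat reads from the [database] section of heat.conf, is to raise
connection_debug and enable connection_trace:

    [database]
    # 0 = nothing (default), 50 = informational, 100 = log every SQL statement
    connection_debug = 100
    # add the originating Python call stack to each statement as an SQL comment
    connection_trace = true

That should at least show which statement was in flight when MariaDB reported
the deadlock.
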
>>>
>>> Since this is a delete, it's possible that we need a retry-on-deadlock
>>> on the resource_delete() function also (though resource_update() is
>>> actually used more during a delete to update the status).
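
A retry-on-deadlock here would typically mean wrapping the DB API call in
oslo.db's wrap_db_retry decorator. A minimal sketch, assuming oslo.db and
SQLAlchemy are available; the resource_delete() body and table name below are
illustrative only, not Heat's actual DB API code:

    import sqlalchemy as sa
    from oslo_db import api as oslo_db_api

    @oslo_db_api.wrap_db_retry(max_retries=3, retry_on_deadlock=True,
                               retry_interval=0.5, inc_retry_interval=True)
    def resource_delete(session, resource_id):
        # If the body raises oslo.db's DBDeadlock (MariaDB error 1213), the
        # decorator re-invokes the whole function, restarting the transaction
        # instead of letting the exception bubble up and fail the stack DELETE.
        with session.begin():
            session.execute(sa.text("DELETE FROM resource WHERE id = :id"),
                            {"id": resource_id})
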
>>>
>>> - ZB
>>>
>>>