queens heat db deadlock

Joe Topjian joe at topjian.net
Mon Dec 17 19:37:53 UTC 2018


Hi Ignazio,

Do you currently have HAProxy configured to route requests to multiple
MariaDB nodes? If so, as a workaround, try an active/backup configuration
where all but one node is configured as an HAProxy "backup" server.
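
For illustration, a rough haproxy.cfg sketch of that layout might look like
the following (the listener name, ports, and addresses are placeholders for
your actual Galera nodes):

# All MariaDB traffic goes to db1; db2 and db3 are marked "backup" and only
# take over if db1 fails its health check.
listen mariadb-galera
    bind 0.0.0.0:3306
    mode tcp
    option tcpka
    timeout client 90m
    timeout server 90m
    server db1 10.0.0.11:3306 check
    server db2 10.0.0.12:3306 check backup
    server db3 10.0.0.13:3306 check backup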

Thanks,
Joe



On Sun, Dec 16, 2018 at 10:46 PM Ignazio Cassano <ignaziocassano at gmail.com>
wrote:

> Hello Zane, it also happens when I create a Magnum cluster.
> My MariaDB cluster is behind HAProxy.
> Do you need any other info?
> Thanks
> Ignazio
>
>
> On Mon, Dec 17, 2018 at 00:28, Zane Bitter <zbitter at redhat.com> wrote:
>
>> On 14/12/18 4:06 AM, Ignazio Cassano wrote:
>> > Hi All,
>> > I installed Queens on CentOS 7.
>> > Heat seems to work fine with the templates I wrote, but when I create a
>> > Magnum cluster I often run into a DB deadlock in the heat-engine log:
>>
>> The stacktrace below is in stack delete, so do you mean that the problem
>> occurs when deleting a Magnum cluster, or can it occur any time with a
>> magnum cluster?
>>
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource Traceback (most recent call last):
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 918, in _action_recorder
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     yield
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2035, in delete
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     *action_args)
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346, in wrapper
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     step = next(subtask)
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 977, in action_handler_task
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     done = check(handler_data)
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 587, in check_delete_complete
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     return self._check_status_complete(self.DELETE)
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 454, in _check_status_complete
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     action=action)
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource ResourceFailure: resources[0]: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') (Background on this error at: http://sqlalche.me/e/2j85)
>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
>> > 2018-12-13 11:48:46.030 89597 INFO heat.engine.stack [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] Stack DELETE FAILED (swarm-clustergp27-ebhsalhb4bop-swarm_primary_master-tzgkh3ncmymw): Resource DELETE failed: resources[0]: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') (Background on this error at: http://sqlalche.me/e/2j85)
>> > 2018-12-13 11:48:46.844 89595 INFO heat.engine.resource [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default] DELETE: ResourceGroup "swarm_primary_master" [6984db3c-3ac1-4afc-901f-21b3e7f230a7] Stack "swarm-clustergp27-ebhsalhb4bop" [b43256fa-52e2-4613-ac15-63fd9340a8be]
>> >
>> >
>> > I read this issue was solved in Heat version 10, but it seems it was not.
>>
>> The patch for bug 1732969, which is in Queens, is probably the fix
>> you're referring to: https://review.openstack.org/#/c/521170/1
>>
>> The traceback in that bug included the SQL statement that was failing.
>> I'm not sure whether it's missing from your traceback because SQLAlchemy
>> didn't report it, or because that traceback is actually from the parent
>> stack. If you have a traceback from the original failure in the child
>> stack, that would be useful. If there's a way to turn on more detailed
>> reporting of errors in SQLAlchemy, that would also be useful.
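
One way to get more detail here, assuming the deployment uses a stock Queens
Heat (whose DB layer goes through oslo.db), is the connection_debug option in
the [database] section of heat.conf; setting it to 100 should make oslo.db
log everything SQLAlchemy does, including the statements that fail. The value
below is illustrative, for troubleshooting only:

[database]
# oslo.db option: 0 = no SQL debug logging, 100 = log everything,
# including each SQL statement executed.
connection_debug = 100
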
>>
>> Since this is a delete, it's possible that we need a retry-on-deadlock
>> on the resource_delete() function also (though resource_update() is
>> actually used more during a delete to update the status).
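
As a rough, self-contained sketch of that retry-on-deadlock idea, using
oslo.db's wrap_db_retry decorator (the Resource model, engine, and the body
of resource_delete() below are simplified placeholders for illustration, not
Heat's actual code):

# Minimal sketch of a retry-on-deadlock wrapper.  wrap_db_retry and its
# options are real oslo.db API; the model, engine and resource_delete()
# body are placeholders, not Heat's implementation.
import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base

from oslo_db import api as oslo_db_api

Base = declarative_base()


class Resource(Base):
    __tablename__ = 'resource'
    id = sa.Column(sa.Integer, primary_key=True)
    status = sa.Column(sa.String(255))


engine = sa.create_engine('sqlite://')  # stand-in for the MariaDB cluster
Base.metadata.create_all(engine)
Session = orm.sessionmaker(bind=engine)


# retry_on_deadlock=True makes oslo.db catch DBDeadlock (MySQL error 1213,
# "try restarting transaction") and re-run the whole function, roughly
# doubling the wait between attempts since inc_retry_interval=True.
# In a real service the session would come from oslo.db's enginefacade, so
# that driver errors are translated into DBDeadlock in the first place.
@oslo_db_api.wrap_db_retry(max_retries=3, retry_on_deadlock=True,
                           retry_interval=0.5, inc_retry_interval=True)
def resource_delete(resource_id):
    session = Session()
    try:
        res = session.query(Resource).get(resource_id)
        if res is not None:
            session.delete(res)
        session.commit()
    except Exception:
        session.rollback()
        raise


resource_delete(42)  # no row with id 42 here, so this is a no-op
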
>>
>> - ZB
>>
>>