queens heat db deadlock

Ignazio Cassano ignaziocassano at gmail.com
Tue Dec 18 18:23:20 UTC 2018


Hello,  seems deadlock happens when heat craates network objects.
I am not sure, but if I remember some db deadlock also in cinder.
Any case with the workaround heat never fails for db deadlock and Now stack
deleting do not stop.
Ignazio

Il giorno Mar 18 Dic 2018 17:06 Mike Bayer <mike_mp at zzzcomputing.com> ha
scritto:

> On Tue, Dec 18, 2018 at 12:36 AM Ignazio Cassano
> <ignaziocassano at gmail.com> wrote:
> >
> > Yes, I  tried on yesterday and this workaround solved.
> > Thanks
> > Ignazio
>
> OK, so that means this "deadlock" is not really a deadlock but it is a
> write-conflict between two Galera masters.      I have a long term
> goal to being relaxing this common requirement that Openstack apps
> only refer to one Galera master at a time.    If this is a particular
> hotspot for Heat (no pun intended) can we pursue adding a transaction
> retry decorator for this operation?  This is the standard approach for
> other applications that are subject to galera multi-master writeset
> conflicts such as Neutron.
>
>
>
>
> >
> > Il giorno Lun 17 Dic 2018 20:38 Joe Topjian <joe at topjian.net> ha
> scritto:
> >>
> >> Hi Ignazio,
> >>
> >> Do you currently have HAProxy configured to route requests to multiple
> MariaDB nodes? If so, as a workaround, try doing an active/backup
> configuration where all but 1 node is configured as an HAProxy "backup".
> >>
> >> Thanks,
> >> Joe
> >>
> >>
> >>
> >> On Sun, Dec 16, 2018 at 10:46 PM Ignazio Cassano <
> ignaziocassano at gmail.com> wrote:
> >>>
> >>> Hello Zane, it happens also when I create a magnum cluster .
> >>> My mariadb cluster is behind haproxy.
> >>> Do you need any other info?
> >>> Thanks
> >>> Ignazio
> >>>
> >>>
> >>> Il giorno Lun 17 Dic 2018 00:28 Zane Bitter <zbitter at redhat.com> ha
> scritto:
> >>>>
> >>>> On 14/12/18 4:06 AM, Ignazio Cassano wrote:
> >>>> > Hi All,
> >>>> > I installed queens on centos 7.
> >>>> > Heat seems to work fine with templates I wrote but when I create
> magnum
> >>>> > cluster
> >>>> > I often face with db deadlock in heat-engine log:
> >>>>
> >>>> The stacktrace below is in stack delete, so do you mean that the
> problem
> >>>> occurs when deleting a Magnum cluster, or can it occur any time with a
> >>>> magnum cluster?
> >>>>
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource Traceback
> (most
> >>>> > recent call last):
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> >>>> > "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line
> 918, in
> >>>> > _action_recorder
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     yield
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> >>>> > "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line
> 2035,
> >>>> > in delete
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
>  *action_args)
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> >>>> > "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line
> 346,
> >>>> > in wrapper
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     step =
> >>>> > next(subtask)
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> >>>> > "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line
> 977, in
> >>>> > action_handler_task
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     done =
> >>>> > check(handler_data)
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> >>>> >
> "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py",
> >>>> > line 587, in check_delete_complete
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource     return
> >>>> > self._check_status_complete(self.DELETE)
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource   File
> >>>> >
> "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py",
> >>>> > line 454, in _check_status_complete
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
>  action=action)
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
> >>>> > ResourceFailure: resources[0]: (pymysql.err.InternalError) (1213,
> >>>> > u'Deadlock found when trying to get lock; try restarting
> transaction')
> >>>> > (Background on this error at: http://sqlalche.me/e/2j85)
> >>>> > 2018-12-13 11:48:46.016 89597 ERROR heat.engine.resource
> >>>> > 2018-12-13 11:48:46.030 89597 INFO heat.engine.stack
> >>>> > [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default]
> >>>> > Stack DELETE FAILED
> >>>> > (swarm-clustergp27-ebhsalhb4bop-swarm_primary_master-tzgkh3ncmymw):
> >>>> > Resource DELETE failed: resources[0]: (pymysql.err.InternalError)
> (1213,
> >>>> > u'Deadlock found when trying to get lock; try restarting
> transaction')
> >>>> > (Background on this error at: http://sqlalche.me/e/2j85)
> >>>> > 2018-12-13 11:48:46.844 89595 INFO heat.engine.resource
> >>>> > [req-9a43184f-77d8-4fad-ab9e-2b9826c10b70 - admin - default default]
> >>>> > DELETE: ResourceGroup "swarm_primary_master"
> >>>> > [6984db3c-3ac1-4afc-901f-21b3e7f230a7] Stack
> >>>> > "swarm-clustergp27-ebhsalhb4bop"
> [b43256fa-52e2-4613-ac15-63fd9340a8be]
> >>>> >
> >>>> >
> >>>> > I read this issue was solved in heat version 10 but seems not.
> >>>>
> >>>> The patch for bug 1732969, which is in Queens, is probably the fix
> >>>> you're referring to: https://review.openstack.org/#/c/521170/1
> >>>>
> >>>> The traceback in that bug included the SQL statement that was failing.
> >>>> I'm not sure if that isn't included in your traceback because
> SQLAlchemy
> >>>> didn't report it, or if it's because that traceback is actually from
> the
> >>>> parent stack. If you have a traceback from the original failure in the
> >>>> child stack that would be useful. If there's a way to turn on more
> >>>> detailed reporting of errors in SQLAlchemy that would also be useful.
> >>>>
> >>>> Since this is a delete, it's possible that we need a retry-on-deadlock
> >>>> on the resource_delete() function also (though resource_update() is
> >>>> actually used more during a delete to update the status).
> >>>>
> >>>> - ZB
> >>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20181218/5e64d33d/attachment.html>


More information about the openstack-discuss mailing list