queens heat db deadlock

Ignazio Cassano ignaziocassano at gmail.com
Wed Jan 2 11:27:13 UTC 2019


Hello Zane, we applied the patch and modified our haproxy; unfortunately
it does not solve the db deadlock issue.
Ignazio & Gianpiero


On Wed, 2 Jan 2019 at 07:28, Zane Bitter <zbitter at redhat.com> wrote:

> On 21/12/18 2:07 AM, Jay Pipes wrote:
> > On 12/20/2018 02:01 AM, Zane Bitter wrote:
> >> On 19/12/18 6:49 AM, Jay Pipes wrote:
> >>> On 12/18/2018 11:06 AM, Mike Bayer wrote:
> >>>> On Tue, Dec 18, 2018 at 12:36 AM Ignazio Cassano
> >>>> <ignaziocassano at gmail.com> wrote:
> >>>>>
> >>>>> Yes, I tried it yesterday and this workaround solved the issue.
> >>>>> Thanks
> >>>>> Ignazio
> >>>>
> >>>> OK, so that means this "deadlock" is not really a deadlock but a
> >>>> write conflict between two Galera masters. I have a long-term goal
> >>>> of relaxing the common requirement that OpenStack apps only refer
> >>>> to one Galera master at a time. If this is a particular hotspot
> >>>> for Heat (no pun intended), can we pursue adding a transaction
> >>>> retry decorator for this operation? This is the standard approach
> >>>> for other applications that are subject to Galera multi-master
> >>>> writeset conflicts, such as Neutron.
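
A minimal sketch of the kind of retry decorator Mike describes, using
oslo.db's wrap_db_retry; the helper function and its arguments here are
illustrative, not Heat's or Neutron's actual code:

from oslo_db import api as oslo_db_api
from heat.db.sqlalchemy import models  # model name used for illustration

# Galera writeset certification failures surface through oslo.db as
# DBDeadlock, so retry_on_deadlock=True makes the decorated operation be
# re-attempted instead of bubbling the error up to the caller.
@oslo_db_api.wrap_db_retry(max_retries=3, retry_on_deadlock=True,
                           retry_interval=0.5, inc_retry_interval=True)
def update_resource(session, resource_id, values):
    # Hypothetical helper; the body must be safe to re-run from scratch
    # if the decorator retries it after a conflict.
    return session.query(models.Resource).filter_by(
        id=resource_id).update(values)
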
> >>
> >> The weird thing about this issue is that we actually have a retry
> >> decorator on the operation that I assume is the problem. It was added
> >> in Queens and largely fixed this issue in the gate:
> >>
> >> https://review.openstack.org/#/c/521170/1/heat/db/sqlalchemy/api.py
> >>
> >>> Correct.
> >>>
> >>> Heat doesn't use SELECT .. FOR UPDATE does it? That's also a big
> >>> cause of the aforementioned "deadlocks".
> >>
> >> AFAIK, no. In fact we were quite careful to design stuff that is
> >> expected to be subject to write contention to use UPDATE ... WHERE
> >> (by doing query().filter_by().update() in SQLAlchemy), but it turned
> >> out to be those very statements that were most prone to causing
> >> deadlocks in the gate (i.e. we added retry decorators in those two
> >> places and the failures went away), according to the commit message
> >> I wrote for that patch: https://review.openstack.org/521170
> >>
> >> Are we Doing It Wrong(TM)?
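
A sketch of the UPDATE ... WHERE pattern described above, expressed
through SQLAlchemy's Query API; the model and column names (Resource,
atomic_key) are illustrative rather than Heat's exact schema:

from heat.db.sqlalchemy import models  # model name used for illustration

def atomic_update(session, resource_id, expected_key, values):
    # Emits a single UPDATE ... WHERE id = :id AND atomic_key = :key
    # statement; no row is read and locked first, so there is no
    # SELECT ... FOR UPDATE involved.
    values = dict(values, atomic_key=expected_key + 1)
    rows = (session.query(models.Resource)
                   .filter_by(id=resource_id, atomic_key=expected_key)
                   .update(values))
    # rows == 0 means another writer got there first; on a multi-master
    # Galera setup the same race can instead surface as a "deadlock"
    # error when the writeset fails certification, which is what the
    # retry decorator in the patch linked above handles.
    return rows
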
> >
> > No, it looks to me like you're doing things correctly. The OP mentioned
> > that this only happens when deleting a Magnum cluster -- and that it
> > doesn't occur in normal Heat template usage.
> >
> > I wonder (as I really don't know anything about Magnum, unfortunately),
> > is there something different about the Magnum cluster resource handling
> > in Heat that might be causing the wonkiness?
>
> There's no special-casing for Magnum within Heat. It's likely just
> that there are a lot of resources in a Magnum cluster - or more
> specifically, a lot of edges in the resource graph, which leads to more
> write contention (and, in a multi-master setup, more write conflicts).
> I'd assume that any similarly-complex template would have the same
> issues, and that Ignazio just didn't have anything else that complex to
> hand.
>
> That gives me an idea, though. I wonder if this would help:
>
> https://review.openstack.org/627914
>
> Ignazio, could you possibly test with that ^ patch in multi-master mode
> to see if it resolves the issue?
>
> cheers,
> Zane.
>
>