queens heat db deadlock
Jay Pipes
jaypipes at gmail.com
Thu Dec 20 13:07:34 UTC 2018
On 12/20/2018 02:01 AM, Zane Bitter wrote:
> On 19/12/18 6:49 AM, Jay Pipes wrote:
>> On 12/18/2018 11:06 AM, Mike Bayer wrote:
>>> On Tue, Dec 18, 2018 at 12:36 AM Ignazio Cassano
>>> <ignaziocassano at gmail.com> wrote:
>>>>
>>>> Yes, I tried it yesterday and this workaround solved the problem.
>>>> Thanks
>>>> Ignazio
>>>
>>> OK, so that means this "deadlock" is not really a deadlock but a
>>> write conflict between two Galera masters. I have a long-term
>>> goal to begin relaxing this common requirement that OpenStack apps
>>> only refer to one Galera master at a time. If this is a particular
>>> hotspot for Heat (no pun intended) can we pursue adding a transaction
>>> retry decorator for this operation? This is the standard approach for
>>> other applications that are subject to galera multi-master writeset
>>> conflicts such as Neutron.
>
> The weird thing about this issue is that we actually have a retry
> decorator on the operation that I assume is the problem. It was added in
> Queens and largely fixed this issue in the gate:
>
> https://review.openstack.org/#/c/521170/1/heat/db/sqlalchemy/api.py
>
>> Correct.
>>
>> Heat doesn't use SELECT .. FOR UPDATE does it? That's also a big cause
>> of the aforementioned "deadlocks".
>
> AFAIK, no. In fact we were quite careful to design stuff that is
> expected to be subject to write contention to use UPDATE ... WHERE (by
> doing query().filter_by().update() in sqlalchemy), but it turned out to
> be those very statements that were most prone to causing deadlocks in
> the gate (i.e. we added retry decorators in those two places and the
> failures went away), according to me in the commit message for that
> patch: https://review.openstack.org/521170
>
> Are we Doing It Wrong(TM)?

No, it looks to me like you're doing things correctly. The OP mentioned
that this only happens when deleting a Magnum cluster -- and that it
doesn't occur in normal Heat template usage.
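
For anyone following along, the pattern discussed above (a conditional
UPDATE ... WHERE wrapped in a retry decorator) can be sketched roughly as
below. This is not Heat's code. The DeadlockError class, the
retry_on_deadlock decorator, the try_update_status function and the
"stack" table are all hypothetical names made up for illustration, and
sqlite3 stands in for the real database only so the sketch runs on its
own. OpenStack projects typically get the retry behaviour from oslo.db
rather than a hand-rolled decorator like this one.

```python
import functools
import sqlite3
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a deadlock / writeset-conflict error."""


def retry_on_deadlock(max_retries=3, delay=0.0):
    """Re-run the wrapped function when it raises DeadlockError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    # Give up once the retry budget is exhausted.
                    if attempt == max_retries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


@retry_on_deadlock(max_retries=3)
def try_update_status(conn, stack_id, expected, new):
    """Compare-and-swap: only update if the row still has `expected`."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE stack SET status = ? WHERE id = ? AND status = ?",
            (new, stack_id, expected),
        )
    # rowcount is 0 if another writer changed the status first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO stack (id, status) VALUES (1, 'IN_PROGRESS')")
conn.commit()

first = try_update_status(conn, 1, "IN_PROGRESS", "COMPLETE")
second = try_update_status(conn, 1, "IN_PROGRESS", "FAILED")
print(first, second)
```

The second call returns False because the WHERE clause no longer matches,
so no lock is held between a read and a write. The retry wrapper only
matters when the database itself rejects the transaction, which is the
Galera case Mike describes.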

I wonder (as I really don't know anything about Magnum, unfortunately),
is there something different about the Magnum cluster resource handling
in Heat that might be causing the wonkiness?

Best,
-jay