[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Mike Bayer mbayer at redhat.com
Thu Feb 5 16:35:44 UTC 2015



Attila Fazekas <afazekas at redhat.com> wrote:

> I have a question related to deadlock handling as well.
> 
> Why is the DBDeadlock exception not caught generally for all api/rpc requests?
> 
> The MySQL recommendation regarding deadlocks [1]:
> "Normally, you must write your applications so that they are always 
> prepared to re-issue a transaction if it gets rolled back because of a deadlock."
> 
> Now the services are just handling the DBDeadlock in several places.
> We have some logstash hits for other places even without galera.
> 
> Instead of throwing 503 to the end user, the request could be repeated `silently`.
> 
> The user would be able to repeat the request themselves,
> so an automated repeat should not cause unexpected new problems.
> 
> The retry limit might be configurable; the exception needs to be watched for
> before anything is sent to the db on behalf of the transaction or request.
> 
> Considering all request handlers as potential deadlock throwers seems much
> easier than deciding case by case.

Typically, deadlocks in “normal” applications are very unusual, except in
well-known “hot spots” where they are known to occur. Deadlock retry can
be applied to all methods as a whole, but this generally adds a lot more
weight to the app, in that every method needs to be written with the
assumption that it may be re-run. It also raises the question of what happens
when one method that is already wrapped in a retry calls another method that
is also wrapped: should the wrappers organize themselves into a single “wrap”
for the whole thing? It’s not a bad idea, but it does have potential
implications.

Part of the promise of enginefacade [1] is that, if applications used the
decorator version (which unfortunately not all apps seem to want to use), we
could build this “smart retry” functionality right into the decorator, and we
would in fact gain the ability to do this pretty easily.

[1] https://review.openstack.org/#/c/125181/




> [1] http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html
> 
> ----- Original Message -----
>> From: "Matthew Booth" <mbooth at redhat.com>
>> To: openstack-dev at lists.openstack.org
>> Sent: Thursday, February 5, 2015 10:36:55 AM
>> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
>> 
>> On 04/02/15 17:05, Sahid Orentino Ferdjaoui wrote:
>>>> * Commit will fail if there is a replication conflict
>>>> 
>>>> foo is a table with a single field, which is its primary key.
>>>> 
>>>> A: start transaction;
>>>> B: start transaction;
>>>> A: insert into foo values(1);
>>>> B: insert into foo values(1); <-- 'regular' DB would block here, and
>>>>                                  report an error on A's commit
>>>> A: commit; <-- success
>>>> B: commit; <-- KABOOM
>>>> 
>>>> Confusingly, Galera will report a 'deadlock' to node B, despite this not
>>>> being a deadlock by any definition I'm familiar with.
>>> 
>>> Yes! If I can add more information, and I hope I am not making a
>>> mistake, I think it's a known issue which comes from MySQL. That is why
>>> we have a decorator to do a retry and so handle this case here:
>>> 
>>>  http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n177
>> 
>> Right, and that remains a significant source of confusion and
>> obfuscation in the db api. Our db code is littered with races and
>> potential actual deadlocks, but only some functions are decorated. Are
>> they decorated because of real deadlocks, or because of Galera lock
>> contention? The solutions to those 2 problems are very different! Also,
>> hunting deadlocks is hard enough work. Adding the possibility that they
>> might not even be there is just evil.
>> 
>> Incidentally, we're currently looking to replace this stuff with some
>> new code in oslo.db, which is why I'm looking at it.
>> 
>> Matt
>> --
>> Matthew Booth
>> Red Hat Engineering, Virtualisation Team
>> 
>> Phone: +442070094448 (UK)
>> GPG ID:  D33C3490
>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>> 
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


