[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Attila Fazekas afazekas at redhat.com
Thu Feb 5 12:54:04 UTC 2015





----- Original Message -----
> From: "Matthew Booth" <mbooth at redhat.com>
> To: openstack-dev at lists.openstack.org
> Sent: Thursday, February 5, 2015 12:32:33 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
> 
> On 05/02/15 11:01, Attila Fazekas wrote:
> > I have a question related to deadlock handling as well.
> > 
> > Why is the DBDeadlock exception not caught generally for all api/rpc
> > requests?
> > 
> > The MySQL recommendation regarding deadlocks [1]:
> > "Normally, you must write your applications so that they are always
> >  prepared to re-issue a transaction if it gets rolled back because of a
> >  deadlock."
> 
> This is evil imho, although it may well be pragmatic. A deadlock (a real
> deadlock, that is) occurs because of a preventable bug in code. It
> occurs because 2 transactions have attempted to take multiple locks in a
> different order. Getting this right is hard, but it is achievable. The
> solution to real deadlocks is to fix the bugs.
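
To illustrate the lock-ordering point, here is a minimal sketch in plain
Python. It is only an analogy: it uses threading locks rather than database
row locks, and every name in it (acquire_in_order, release_all, worker) is
made up for the example. Two workers ask for the same pair of locks in
opposite order, and a helper imposes one global order so they cannot
deadlock.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def acquire_in_order(*locks):
    """Acquire locks sorted by id() so every caller uses the same order."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release in reverse acquisition order."""
    for lock in reversed(ordered):
        lock.release()


counter = {"value": 0}


def worker(first, second, rounds):
    # The caller's argument order does not matter: acquire_in_order
    # always takes the locks in the same global order.
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            counter["value"] += 1
        finally:
            release_all(held)


# The two threads name the locks in opposite order, which would risk a
# deadlock if each simply acquired them in the order given.
t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 1000))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()
```

In a database the same idea would mean touching rows or tables in a
consistent order in every transaction, which is much harder to enforce.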
>
> 
> Galera 'deadlocks' on the other hand are not deadlocks, despite being
> reported as such (sounds as though this is due to an implementation
> quirk?). They don't involve 2 transactions holding mutual locks, and
> there is never any doubt about how to proceed. They involve 2
> transactions holding the same lock, and 1 of them committed first. In a
> real deadlock they wouldn't get as far as commit. This isn't any kind of
> bug: it's normal behaviour in this environment and you just have to
> handle it.
>
> > Currently the services handle DBDeadlock only in several places.
> > We have some logstash hits for other places, even without Galera.
> 
> I haven't had much success with logstash. Could you post a query which
> would return these? This would be extremely interesting.

Just use this:
message: "DBDeadlock"

If you would like to exclude the lock wait timeout ones:
message: "Deadlock found when trying to get lock"


> > Instead of throwing 503 to the end user, the request could be repeated
> > `silently`.
> > 
> > The users would be able to repeat the request themselves,
> > so the automated repeat should not cause unexpected new problems.
> 
> Good point: we could argue 'no worse than now', even if it's buggy.
> 
> > The retry limit might be configurable. The exception handler needs to be
> > in place before anything is sent to the db on behalf of the transaction
> > or request.
> > 
> > Treating every request handler as a potential deadlock thrower seems much
> > easier than deciding case by case.
> 
> Well this happens at the transaction level, and we don't quite have a
> 1:1 request:transaction relationship. We're moving towards it, but
> potentially long running requests will always have to use multiple
> transactions.
> 
> However, I take your point. I think retry on transaction failure is
> something which would benefit from standard handling in a library.
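
A minimal sketch of what such standard handling could look like, assuming a
decorator-based approach. The DBDeadlock class here is a local stand-in for
the real exception, and retry_on_deadlock, max_retries and base_delay are
hypothetical names, not an existing library API. It only works if the
wrapped function covers the whole transaction, so that nothing has been sent
to the db on its behalf before the handler is in place.

```python
import functools
import random
import time


class DBDeadlock(Exception):
    """Local stand-in for the database layer's deadlock exception."""


def retry_on_deadlock(max_retries=3, base_delay=0.0):
    """Re-run the wrapped function when it raises DBDeadlock.

    The wrapped function must contain the whole transaction, so that a
    retry starts again from a clean state.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    attempt += 1
                    if attempt > max_retries:
                        # Retry limit reached: let the caller see the error.
                        raise
                    # Randomized, growing delay so that competing
                    # transactions do not collide again immediately.
                    time.sleep(base_delay * attempt * random.random())
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=3)
def update_row():
    # Simulated transaction: fails twice with a deadlock, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return "committed"


result = update_row()
```

The retry limit is a parameter here, so it could be wired to a config
option. Once the limit is exceeded the exception propagates as it does now.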
> 
> Matt
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
> 
> Phone: +442070094448 (UK)
> GPG ID:  D33C3490
> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
> 


