Open Stack

Thu Feb 5 11:32:33 UTC 2015

On 05/02/15 11:01, Attila Fazekas wrote:
> I have a question related to deadlock handling as well.
> 
> Why isn't the DBDeadlock exception caught generally for all api/rpc requests?
> 
> The MySQL recommendation regarding deadlocks [1]:
> "Normally, you must write your applications so that they are always 
>  prepared to re-issue a transaction if it gets rolled back because of a deadlock."

This is evil imho, although it may well be pragmatic. A deadlock (a real
deadlock, that is) occurs because of a preventable bug in code. It
occurs because 2 transactions have attempted to take the same locks in
different orders. Getting this right is hard, but it is achievable. The
solution to real deadlocks is to fix the bugs.
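The usual fix for a real deadlock is to make every code path take its locks in one agreed order. The sketch below is only an analogy: it assumes in-process `threading.Lock` objects stand in for database row locks, and the helpers `acquire_in_order` and `release_all` are hypothetical names. Two threads name the same two locks in opposite orders, but because both acquire them sorted by `id()`, neither can hold one lock while waiting on the other.

```python
import threading

# Two shared resources, each protected by its own lock (analogy for row locks).
lock_a = threading.Lock()
lock_b = threading.Lock()
counter = {"value": 0}


def acquire_in_order(*locks):
    """Hypothetical helper: acquire locks in one global order (by id()),
    so no two threads can ever hold them in opposite orders."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release locks in the reverse of the order they were acquired."""
    for lock in reversed(ordered):
        lock.release()


def worker(first, second, iterations):
    for _ in range(iterations):
        # Callers may name the locks in any order; acquire_in_order
        # normalises it, which rules out the lock-ordering deadlock.
        held = acquire_in_order(first, second)
        try:
            counter["value"] += 1
        finally:
            release_all(held)


t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 10000))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, 10000))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter["value"])  # 20000
```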

Galera 'deadlocks', on the other hand, are not deadlocks, despite being
reported as such (sounds as though this is due to an implementation
quirk?). They don't involve 2 transactions holding mutual locks, and
there is never any doubt about how to proceed. They involve 2
transactions holding the same lock, and 1 of them committed first. In a
real deadlock they wouldn't get as far as commit. This isn't any kind of
bug: it's normal behaviour in this environment and you just have to
handle it.
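To make the distinction concrete, here is a toy model of first-committer-wins conflict detection. It is a simplification under stated assumptions, not Galera's actual certification code: `ToyCluster`, `ToyTransaction` and `CertificationFailure` are hypothetical names. Neither transaction blocks the other while writing the same row; the conflict is only detected when the second one tries to commit, and it is rolled back with an error which, in Galera's case, gets reported as a deadlock.

```python
class CertificationFailure(Exception):
    """Hypothetical stand-in for the error Galera reports as a 'deadlock'."""


class ToyCluster:
    """Toy model of first-committer-wins behaviour, not Galera's algorithm."""

    def __init__(self):
        self.rows = {}      # key -> committed value
        self.versions = {}  # key -> number of committed writes

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, cluster):
        self.cluster = cluster
        self.writes = {}
        self.seen_versions = {}

    def write(self, key, value):
        # Remember which committed version of the row we started from.
        # Nothing blocks here: both transactions may "hold" the same row.
        self.seen_versions.setdefault(key, self.cluster.versions.get(key, 0))
        self.writes[key] = value

    def commit(self):
        # Conflicts are only checked at commit: if someone else committed
        # a write to the same row since we touched it, we lose.
        for key, seen in self.seen_versions.items():
            if self.cluster.versions.get(key, 0) != seen:
                raise CertificationFailure(f"conflict on {key!r}")
        for key, value in self.writes.items():
            self.cluster.rows[key] = value
            self.cluster.versions[key] = self.cluster.versions.get(key, 0) + 1


cluster = ToyCluster()
tx1 = cluster.begin()  # e.g. on node 1
tx2 = cluster.begin()  # e.g. on node 2
tx1.write("instance-1", "ACTIVE")
tx2.write("instance-1", "DELETED")
tx1.commit()  # first committer wins

outcome = "tx2 committed"
try:
    tx2.commit()
except CertificationFailure as exc:
    outcome = f"tx2 rolled back: {exc}"

print(outcome)       # tx2 rolled back: conflict on 'instance-1'
print(cluster.rows)  # {'instance-1': 'ACTIVE'}
```

Nothing in the losing transaction is a bug; the only sensible response is to run it again against the new state.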

> Currently the services handle DBDeadlock only in certain places.
> We have some logstash hits for other places too, even without Galera.

I haven't had much success with logstash. Could you post a query which
would return these? This would be extremely interesting.

> Instead of returning a 503 to the end user, the request could be retried silently.
> 
> Users can already repeat the request themselves,
> so an automatic retry should not cause any unexpected new problems.

Good point: we could argue 'no worse than now', even if it's buggy.

> The retry limit could be configurable; the exception needs to be caught at a point
> before anything is sent to the db on behalf of the transaction or request.
> 
> Treating every request handler as a potential deadlock thrower seems much easier than
> deciding case by case.
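A minimal sketch of the proposed generic handling, assuming a hypothetical `DBDeadlock` exception class and a hypothetical `retry_on_deadlock` decorator (neither is oslo.db's actual API). The whole handler is re-run from the start, up to a configurable limit, and the error still propagates once the limit is reached. This is only safe if a failed attempt leaves nothing behind, i.e. its transaction was rolled back before any of its work became visible.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for the driver's deadlock error."""


def retry_on_deadlock(max_retries=3, interval=0.0):
    """Hypothetical decorator: re-run the handler if it raises DBDeadlock,
    up to max_retries extra attempts, then let the error propagate
    (where it would surface to the user as before, e.g. as a 503)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt >= max_retries:
                        raise
                    attempt += 1
                    time.sleep(interval)
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_deadlock(max_retries=3)
def handle_request():
    # Simulated handler: the first two attempts lose to a deadlock.
    calls["n"] += 1
    if calls["n"] < 3:
        raise DBDeadlock()
    return "200 OK"


print(handle_request(), calls["n"])  # 200 OK 3
```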

Well, this happens at the transaction level, and we don't quite have a
1:1 request:transaction relationship. We're moving towards it, but
potentially long-running requests will always have to use multiple
transactions.

However, I take your point. I think retry on transaction failure is
something which would benefit from standard handling in a library.
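Because a long-running request may span several transactions, a library helper would more naturally retry each transaction on its own. The sketch below assumes a hypothetical `run_transaction` helper and hypothetical `reserve_quota`/`create_instance` steps, and assumes each attempt runs in a fresh transaction that is rolled back on failure. Only the transaction that hit the conflict is re-executed; retrying the whole request instead would run the already-committed first transaction a second time.

```python
class DBDeadlock(Exception):
    """Hypothetical stand-in for the driver's deadlock error."""


def run_transaction(txn_fn, max_retries=3):
    """Hypothetical library helper: retry one transaction, not the request.
    Assumes each call to txn_fn runs in a fresh, rolled-back-on-error
    transaction, so re-running it is safe."""
    for attempt in range(max_retries + 1):
        try:
            return txn_fn()
        except DBDeadlock:
            if attempt == max_retries:
                raise


executions = {"reserve_quota": 0, "create_instance": 0}


def reserve_quota():
    executions["reserve_quota"] += 1
    return "quota reserved"


def create_instance():
    executions["create_instance"] += 1
    if executions["create_instance"] == 1:
        raise DBDeadlock()  # first attempt loses to a conflicting commit
    return "instance created"


def long_running_request():
    # Two separate transactions; only the one that failed is re-run,
    # so the committed quota reservation is not repeated.
    first = run_transaction(reserve_quota)
    second = run_transaction(create_instance)
    return first, second


print(long_running_request())
print(executions)  # {'reserve_quota': 1, 'create_instance': 2}
```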

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
