[openstack-dev] [nova][neutron][mysql] IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE

Jay Pipes jaypipes at gmail.com
Mon May 19 17:30:14 UTC 2014


Stackers,

On Friday in Atlanta, I had the pleasure of moderating the database 
session at the Ops Meetup track. We had lots of good discussions and 
heard important feedback from operators on DB topics.

For the record, I would not bring this point up so publicly unless I 
believed it was a serious problem affecting a large segment of users. 
When doing an informal survey of the users/operators in the room at the 
start of the session, out of approximately 200 people in the room, only 
a single person was using PostgreSQL, about a dozen were using standard 
MySQL master/slave replication, and the rest were using MySQL Galera 
clustering. So, this is a real issue for a large segment of the 
operators -- or at least the ones at the session. :)

Peter Boros, from Percona, was able to provide some insight on MySQL 
Galera topics, and one issue came up that is likely the cause of a lot 
of heartache for operators who use MySQL Galera (or Percona XtraDB Cluster).

We were discussing whether people had seen deadlock issues [1] when 
using MySQL Galera in their deployment, and were brainstorming on why 
deadlocks might be seen. I had suggested that perhaps Nova's use of 
autoincrementing primary keys may have been the cause. Peter pretty 
quickly dispatched that notion, saying that Galera automatically handles 
autoincrementing keys using managed innodb_autoincrement_increment and 
innodb_autoincrement_offset config options.

I think at that point I mentioned that there were a number of places 
that were using the SELECT ... FOR UPDATE construct in Nova (in 
SQLAlchemy, it's the with_lockmode('update') modification of the query 
object). Peter promptly said that was a problem. MySQL Galera does not 
support SELECT ... FOR UPDATE, since it has no concept of cross-node 
locking of records and results are non-deterministic.

So... what to do?

For starters, some information on the use of with_lockmode() in Nova and 
Neutron...

Within Nova, there are actually only a few places where 
with_lockmode('update') is used. Unfortunately, the use of 
with_lockmode('update') is in the quota code, which tends to wrap 
largish blocks of code within the Nova compute execution code.

Within Neutron, however, the use of with_lockmode('update') is all over 
the place. There are 44 separate uses of it in 11 different files.

We have a number of options:

1) Stop using MySQL Galera for databases of projects that contain 
with_lockmode('update')

2) Put a big old warning in the docs somewhere about the problem of 
potential deadlocks or odd behaviour with Galera in these projects

3) For Nova and Neutron, remove the use of with_lockmode('update') and 
instead use a coarse-grained file lock or a distributed lock manager for 
those areas where we need deterministic reads or quiescence.

4) For the Nova db quota driver, refactor the driver to either use a 
non-locking method for reservation and quota queries or move the driver 
out into its own projects (or use something like Climate and make sure 
that Climate uses a non-blocking algorithm for those queries...)

Thoughts?

-jay

[1] http://lists.openstack.org/pipermail/openstack/2014-May/007202.html



More information about the OpenStack-dev mailing list