[openstack-dev] [nova][neutron][mysql] IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE
jaypipes at gmail.com
Mon May 19 17:30:14 UTC 2014
On Friday in Atlanta, I had the pleasure of moderating the database
session at the Ops Meetup track. We had lots of good discussions and
heard important feedback from operators on DB topics.
For the record, I would not bring this point up so publicly unless I
believed it was a serious problem affecting a large segment of users.
When doing an informal survey of the users/operators in the room at the
start of the session, out of approximately 200 people in the room, only
a single person was using PostgreSQL, about a dozen were using standard
MySQL master/slave replication, and the rest were using MySQL Galera
clustering. So, this is a real issue for a large segment of the
operators -- or at least the ones at the session. :)
Peter Boros, from Percona, was able to provide some insight on MySQL
Galera topics, and one issue came up that is likely the cause of a lot
of heartache for operators who use MySQL Galera (or Percona XtraDB Cluster).
We were discussing whether people had seen deadlock issues  when
using MySQL Galera in their deployment, and were brainstorming on why
deadlocks might be seen. I had suggested that perhaps Nova's use of
autoincrementing primary keys may have been the cause. Peter pretty
quickly dispatched that notion, saying that Galera automatically handles
autoincrementing keys using managed innodb_autoincrement_increment and
innodb_autoincrement_offset config options.
I think at that point I mentioned that there were a number of places
that were using the SELECT ... FOR UPDATE construct in Nova (in
SQLAlchemy, it's the with_lockmode('update') modification of the query
object). Peter promptly said that was a problem. MySQL Galera does not
support SELECT ... FOR UPDATE, since it has no concept of cross-node
locking of records and results are non-deterministic.
So... what to do?
For starters, some information on the use of with_lockmode() in Nova and
Within Nova, there are actually only a few places where
with_lockmode('update') is used. Unfortunately, the use of
with_lockmode('update') is in the quota code, which tends to wrap
largish blocks of code within the Nova compute execution code.
Within Neutron, however, the use of with_lockmode('update') is all over
the place. There are 44 separate uses of it in 11 different files.
We have a number of options:
1) Stop using MySQL Galera for databases of projects that contain
2) Put a big old warning in the docs somewhere about the problem of
potential deadlocks or odd behaviour with Galera in these projects
3) For Nova and Neutron, remove the use of with_lockmode('update') and
instead use a coarse-grained file lock or a distributed lock manager for
those areas where we need deterministic reads or quiescence.
4) For the Nova db quota driver, refactor the driver to either use a
non-locking method for reservation and quota queries or move the driver
out into its own projects (or use something like Climate and make sure
that Climate uses a non-blocking algorithm for those queries...)
More information about the OpenStack-dev