[openstack-dev] [nova][neutron][mysql] IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE

Salvatore Orlando sorlando at nicira.com
Mon May 19 22:26:10 UTC 2014


Some comments inline.

Salvatore


On 19 May 2014 20:32, sridhar basam <sridhar.basam at gmail.com> wrote:

>
>
>
> On Mon, May 19, 2014 at 1:30 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>
>> Stackers,
>>
>> On Friday in Atlanta, I had the pleasure of moderating the database
>> session at the Ops Meetup track. We had lots of good discussions and heard
>> important feedback from operators on DB topics.
>>
>> For the record, I would not bring this point up so publicly unless I
>> believed it was a serious problem affecting a large segment of users. When
>> doing an informal survey of the users/operators in the room at the start of
>> the session, out of approximately 200 people in the room, only a single
>> person was using PostgreSQL, about a dozen were using standard MySQL
>> master/slave replication, and the rest were using MySQL Galera clustering.
>> So, this is a real issue for a large segment of the operators -- or at
>> least the ones at the session. :)
>>
>>
> ​We are one of those operators that use Galera for replicating our mysql
> databases. We used to  see issues with deadlocks when having multiple mysql
> writers in our mysql cluster. As a workaround we have our haproxy
> configuration in an active-standby configuration for our mysql VIP.
>
> I seem to recall we had a lot of the deadlocks happen through Neutron.
> When we go through our Icehouse testing, we will redo our multimaster mysql
> setup and provide feedback on the issues we see.
>

The SELECT... FOR UPDATE issue is going to be a non trivial one for neutron
as well. Some components, like IPAM, heavily rely on it.
However, Neutron is a lot more susceptible to deadlock problems than nova
because it does not implement at the moment a retry mechanism.
This is something which should be added during the Juno release cycle
regardless of all the other enhancement currently being planned, such as
task oriented operations.

>
> thanks,
>          Sridhar
>
>
>
>> Peter Boros, from Percona, was able to provide some insight on MySQL
>> Galera topics, and one issue came up that is likely the cause of a lot of
>> heartache for operators who use MySQL Galera (or Percona XtraDB Cluster).
>>
>> We were discussing whether people had seen deadlock issues [1] when using
>> MySQL Galera in their deployment, and were brainstorming on why deadlocks
>> might be seen. I had suggested that perhaps Nova's use of autoincrementing
>> primary keys may have been the cause. Peter pretty quickly dispatched that
>> notion, saying that Galera automatically handles autoincrementing keys
>> using managed innodb_autoincrement_increment and
>> innodb_autoincrement_offset config options.
>>
>> I think at that point I mentioned that there were a number of places that
>> were using the SELECT ... FOR UPDATE construct in Nova (in SQLAlchemy, it's
>> the with_lockmode('update') modification of the query object). Peter
>> promptly said that was a problem. MySQL Galera does not support SELECT ...
>> FOR UPDATE, since it has no concept of cross-node locking of records and
>> results are non-deterministic.
>>
>> So... what to do?
>>
>> For starters, some information on the use of with_lockmode() in Nova and
>> Neutron...
>>
>> Within Nova, there are actually only a few places where
>> with_lockmode('update') is used. Unfortunately, the use of
>> with_lockmode('update') is in the quota code, which tends to wrap largish
>> blocks of code within the Nova compute execution code.
>>
>> Within Neutron, however, the use of with_lockmode('update') is all over
>> the place. There are 44 separate uses of it in 11 different files.
>>
>>
I will report on a separate thread on this, so that we can have an
assessment of where locking statements are used and why.


> We have a number of options:
>>
>
I thin option 0 should be to rework/redesign the code, where possible, to
avoid DB-level locking at all.


>
>> 1) Stop using MySQL Galera for databases of projects that contain
>> with_lockmode('update')
>>
>
This looks hideous, but I am afraid this is what all people wishing to
deploy Icehouse should consider doing.


>
>> 2) Put a big old warning in the docs somewhere about the problem of
>> potential deadlocks or odd behaviour with Galera in these projects
>>
>> 3) For Nova and Neutron, remove the use of with_lockmode('update') and
>> instead use a coarse-grained file lock or a distributed lock manager for
>> those areas where we need deterministic reads or quiescence.
>>
>
We had an attempt at implementing a sort of distributed lock for neutron:
https://review.openstack.org/#/c/34695/
Beyond the implementation reservations on this patch, one thing that should
be noticed, probably needless to say, is that distributed coordination is
something that should never be taken in a light-hearted way.
Once all the non-locking solution have been ruled out, distributed
coordination among processes could be considered. In that case I think it
might be better to use some OTS software rather than working out some home
grown solution (I surely do not see space for a new project here)
On a side note, I'm rather ignorant on python frameworks for distributed
coordination... concoord? Is zookeper something that should be ruled out
because of language restrictions?



>
>> 4) For the Nova db quota driver, refactor the driver to either use a
>> non-locking method for reservation and quota queries or move the driver out
>> into its own projects (or use something like Climate and make sure that
>> Climate uses a non-blocking algorithm for those queries...)
>>
>> Thoughts?
>>
>> -jay
>>
>> [1] http://lists.openstack.org/pipermail/openstack/2014-May/007202.html
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140520/f63dda12/attachment.html>


More information about the OpenStack-dev mailing list