[openstack-dev] [nova][neutron][mysql] IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE

Rossella Sblendido rsblendido at suse.com
Tue May 20 15:13:44 UTC 2014


Please see inline.

cheers,

Rossella

On 05/20/2014 12:26 AM, Salvatore Orlando wrote:
> Some comments inline.
>
> Salvatore
>
>
> On 19 May 2014 20:32, sridhar basam <sridhar.basam at gmail.com
> <mailto:sridhar.basam at gmail.com>> wrote:
>
>
>
>
>     On Mon, May 19, 2014 at 1:30 PM, Jay Pipes <jaypipes at gmail.com
>     <mailto:jaypipes at gmail.com>> wrote:
>
>         Stackers,
>
>         On Friday in Atlanta, I had the pleasure of moderating the
>         database session at the Ops Meetup track. We had lots of good
>         discussions and heard important feedback from operators on DB
>         topics.
>
>         For the record, I would not bring this point up so publicly
>         unless I believed it was a serious problem affecting a large
>         segment of users. When doing an informal survey of the
>         users/operators in the room at the start of the session, out
>         of approximately 200 people in the room, only a single person
>         was using PostgreSQL, about a dozen were using standard MySQL
>         master/slave replication, and the rest were using MySQL Galera
>         clustering. So, this is a real issue for a large segment of
>         the operators -- or at least the ones at the session. :)
>
>
>     We are one of those operators that use Galera for replicating our
>     mysql databases. We used to  see issues with deadlocks when having
>     multiple mysql writers in our mysql cluster. As a workaround we
>     have our haproxy configuration in an active-standby configuration
>     for our mysql VIP. 
>
>     I seem to recall we had a lot of the deadlocks happen through
>     Neutron. When we go through our Icehouse testing, we will redo our
>     multimaster mysql setup and provide feedback on the issues we see.
>
>
> The SELECT... FOR UPDATE issue is going to be a non trivial one for
> neutron as well. Some components, like IPAM, heavily rely on it.
> However, Neutron is a lot more susceptible to deadlock problems than
> nova because it does not implement at the moment a retry mechanism.
> This is something which should be added during the Juno release cycle
> regardless of all the other enhancement currently being planned, such
> as task oriented operations. 
>
>
>     thanks,
>              Sridhar
>
>      
>
>         Peter Boros, from Percona, was able to provide some insight on
>         MySQL Galera topics, and one issue came up that is likely the
>         cause of a lot of heartache for operators who use MySQL Galera
>         (or Percona XtraDB Cluster).
>
>         We were discussing whether people had seen deadlock issues [1]
>         when using MySQL Galera in their deployment, and were
>         brainstorming on why deadlocks might be seen. I had suggested
>         that perhaps Nova's use of autoincrementing primary keys may
>         have been the cause. Peter pretty quickly dispatched that
>         notion, saying that Galera automatically handles
>         autoincrementing keys using managed
>         innodb_autoincrement_increment and innodb_autoincrement_offset
>         config options.
>
>         I think at that point I mentioned that there were a number of
>         places that were using the SELECT ... FOR UPDATE construct in
>         Nova (in SQLAlchemy, it's the with_lockmode('update')
>         modification of the query object). Peter promptly said that
>         was a problem. MySQL Galera does not support SELECT ... FOR
>         UPDATE, since it has no concept of cross-node locking of
>         records and results are non-deterministic.
>
>         So... what to do?
>
>         For starters, some information on the use of with_lockmode()
>         in Nova and Neutron...
>
>         Within Nova, there are actually only a few places where
>         with_lockmode('update') is used. Unfortunately, the use of
>         with_lockmode('update') is in the quota code, which tends to
>         wrap largish blocks of code within the Nova compute execution
>         code.
>
>         Within Neutron, however, the use of with_lockmode('update') is
>         all over the place. There are 44 separate uses of it in 11
>         different files.
>
>
> I will report on a separate thread on this, so that we can have an
> assessment of where locking statements are used and why.
>  
>
>         We have a number of options:
>
>
> I thin option 0 should be to rework/redesign the code, where possible,
> to avoid DB-level locking at all.

I totally agree. Is anybody already coordinating this rework? I'd like
to help. After redesigning, it is gonna be easier to make a decision
regarding a distributed lock manager.

>  
>
>
>         1) Stop using MySQL Galera for databases of projects that
>         contain with_lockmode('update')
>
>
> This looks hideous, but I am afraid this is what all people wishing to
> deploy Icehouse should consider doing.
>  
>
>
>         2) Put a big old warning in the docs somewhere about the
>         problem of potential deadlocks or odd behaviour with Galera in
>         these projects
>
>         3) For Nova and Neutron, remove the use of
>         with_lockmode('update') and instead use a coarse-grained file
>         lock or a distributed lock manager for those areas where we
>         need deterministic reads or quiescence.
>
>
> We had an attempt at implementing a sort of distributed lock for
> neutron: https://review.openstack.org/#/c/34695/
> Beyond the implementation reservations on this patch, one thing that
> should be noticed, probably needless to say, is that distributed
> coordination is something that should never be taken in a
> light-hearted way.
> Once all the non-locking solution have been ruled out, distributed
> coordination among processes could be considered. In that case I think
> it might be better to use some OTS software rather than working out
> some home grown solution (I surely do not see space for a new project
> here)
> On a side note, I'm rather ignorant on python frameworks for
> distributed coordination... concoord? Is zookeper something that
> should be ruled out because of language restrictions?
>
>  
>
>
>         4) For the Nova db quota driver, refactor the driver to either
>         use a non-locking method for reservation and quota queries or
>         move the driver out into its own projects (or use something
>         like Climate and make sure that Climate uses a non-blocking
>         algorithm for those queries...)
>
>         Thoughts?
>
>         -jay
>
>         [1]
>         http://lists.openstack.org/pipermail/openstack/2014-May/007202.html
>
>         _______________________________________________
>         OpenStack-dev mailing list
>         OpenStack-dev at lists.openstack.org
>         <mailto:OpenStack-dev at lists.openstack.org>
>         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>     _______________________________________________
>     OpenStack-dev mailing list
>     OpenStack-dev at lists.openstack.org
>     <mailto:OpenStack-dev at lists.openstack.org>
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140520/06b7e52e/attachment.html>


More information about the OpenStack-dev mailing list