[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Attila Fazekas afazekas at redhat.com
Tue Feb 10 10:46:53 UTC 2015





----- Original Message -----
> From: "Jay Pipes" <jaypipes at gmail.com>
> To: openstack-dev at lists.openstack.org
> Sent: Monday, February 9, 2015 9:36:45 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
> 
> On 02/09/2015 03:10 PM, Clint Byrum wrote:
> > Excerpts from Jay Pipes's message of 2015-02-09 10:15:10 -0800:
> >> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> >>> I do not see why not to use `FOR UPDATE` even with multi-writer or
> >>> Is the retry/swap way really solves anything here.
> >> <snip>
> >>> Am I missed something ?
> >>
> >> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> >> that are needed to support SELECT FOR UPDATE statements across multiple
> >> cluster nodes.
> >>
> >> https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
> >
> > Attila acknowledged that. What Attila was saying was that by using it
> > with Galera, the box that is doing the FOR UPDATE locks will simply fail
> > upon commit because a conflicting commit has already happened and arrived
> > from the node that accepted the write. Further what Attila is saying is
> > that this means there is not such an obvious advantage to the CAS method,
> > since the rollback and the # updated rows == 0 are effectively equivalent
> > at this point, seeing as the prior commit has already arrived and thus
> > will not need to wait to fail certification and be rolled back.
> 
> No, that is not correct. In the case of the CAS technique, the frequency
> of rollbacks due to certification failure is demonstrably less than when
> using SELECT FOR UPDATE and relying on the certification timeout error
> to signal a deadlock.
> 
> > I am not entirely certain that is true though, as I think what will
> > happen in sequential order is:
> >
> > writer1: UPDATE books SET genre = 'Scifi' WHERE genre = 'sciencefiction';
> > writer1: --> send in-progress update to cluster
> > writer2: SELECT FOR UPDATE books WHERE id=3;
> > writer1: COMMIT
> > writer1: --> try to certify commit in cluster
> > ** Here is where I stop knowing for sure what happens **
> > writer2: certifies writer1's transaction or blocks?
> 
> It will certify writer1's transaction. It will only block another thread
> hitting writer2 requesting write locks or write-intent read locks on the
> same records.
> 
> > writer2: UPDATE books SET genre = 'sciencefiction' WHERE id=3;
> > writer2: COMMIT --> One of them is rolled back.
> >

The other transaction can be rolled back before you do an actual commit:
writer1: BEGIN
writer2: BEGIN
writer1: update test set val=42 where id=1;
writer2: update test set val=42 where id=1;
writer1: COMMIT
writer2: show variables;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

As you can see 2th transaction failed without issuing a COMMIT after the 1th one committed.
You could write anything to mysql on writer2 at this point,
 even invalid things returns with `Deadlock`.

> > So, at that point where I'm not sure (please some Galera expert tell
> > me):
> >
> > If what happens is as I suggest, writer1's transaction is certified,
> > then that just means the lock sticks around blocking stuff on writer2,
> > but that the data is updated and it is certain that writer2's commit will
> > be rolled back. However, if it blocks waiting on the lock to resolve,
> > then I'm at a loss to determine which transaction would be rolled back,
> > but I am thinking that it makes sense that the transaction from writer2
> > would be rolled back, because the commit is later.
> 
> That is correct. writer2's transaction would be rolled back. The
> difference is that the CAS method would NOT trigger a ROLLBACK. It would
> instead return 0 rows affected, because the UPDATE statement would
> instead look like this:
> 
> UPDATE books SET genre = 'sciencefiction' WHERE id = 3 AND genre = 'SciFi';
> 
> And the return of 0 rows affected would trigger a simple retry of the
> read and then update attempt on writer2 instead of dealing with ROLLBACK
> semantics on the transaction.
> 
> Note that in the CAS method, the SELECT statement and the UPDATE are in
> completely different transactions. This is a very important thing to
> keep in mind.
> 
> > All this to say that usually the reason for SELECT FOR UPDATE is not
> > to only do an update (the transactional semantics handle that), but
> > also to prevent the old row from being seen again, which, as Jay says,
> > it cannot do.  So I believe you are both correct:
> >
> > * Attila, yes I think you're right that CAS is not any more efficient
> > at replacing SELECT FOR UPDATE from a blocking standpoint.
> 
> It is more efficient because there are far fewer ROLLBACKs of
> transactions occurring in the system.
> 
> If you look at a slow query log (with a 0 slow query time) for a MySQL
> Galera server in a multi-write cluster during a run of Tempest or Rally,
> you will notice that the number of ROLLBACK statements is extraordinary.
> AFAICR, when Peter Boros and I benchmarked a Rally launch and delete 10K
> VM run, we saw nearly 11% of *total* queries executed against the server
> were ROLLBACKs. This, in my opinion, is the main reason that the CAS
> method will show as more efficient.

You need to finish even the read-only transaction somehow before letting other
task to get the connection. You need to issue either ROLLBACK or COMMIT;
Basically it will just release your snapshot view. 

Usually connection managers/pools issuing ROLLBACKs to ensure connection
`cleanness`.

The CAS method requires to use an additional ROLLBACK.

writer1: BEGIN
writer2: BEGIN
writer1: SELECT id FROM test WHERE val is NULL limit 1; # returns with id=2
writer2: SELECT id FROM test WHERE val is NULL limit 1; # returns with id=2
writer2: UPDATE test SET val=42 WHERE id = 2 AND val is NULL; # (Rows matched: 1  Changed: 1  Warnings: 0)
writer2: COMMIT
writer1: UPDATE test SET val=42 WHERE id = 2 AND val is NULL; # (Rows matched: 0  Changed: 0  Warnings: 0)
# changed is 0, lets try to find another id

writer1: 
SELECT id FROM test WHERE val is NULL limit 1; # returns 2

Because the reads are repeatable, I get back the already used id=2.

How to get different id  with the above query (without FOR UPDATE) ?
writer1: ROLLBACK
writer1: SELECT id FROM test WHERE val is NULL limit 1; # returns id=3.

> 
> > * Jay, yes I think you're right that SELECT FOR UPDATE is not the right
> > thing to use to do such reads, because one is relying on locks that are
> > meaningless on a Galera cluster.
> >
> > Where I think the CAS ends up being the preferred method for this sort
> > of thing is where one consideres that it won't hold a meaningless lock
> > while the transaction is completed and then rolled back.
> 
> CAS is preferred because it is measurably faster and more
> obstruction-free than SELECT FOR UPDATE. A colleague of mine is almost
> ready to publish documentation showing a benchmark of this that shows
> nearly a 100% decrease in total amount of lock/wait time using CAS
> versus waiting for the coarser-level certification timeout to retry the
> transactions. As mentioned above, I believe this is due to the dramatic
> decrease in ROLLBACKs.

I tried different methods for doing similar ops,
The FOR UPDATE was slower when you forget to define the index like
https://bugs.launchpad.net/neutron/+bug/1412348

After the index was in place the FOR UPDATE was one of fastest way,
the CAS way was comparable fast only when I used strange queries like bellow,
 with 'read committed' isolation level.:

'SELECT id FROM (SELECT id FROM test WHERE used=0 LIMIT 10) AS T ORDER BY RAND() LIMIT 1'

anyway, this kind of allocations  many-many times faster compared 
to how many vm's nova can schedule per sec.
 

> Best,
> -jay
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 



More information about the OpenStack-dev mailing list