[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
Attila Fazekas
afazekas at redhat.com
Tue Feb 10 10:46:53 UTC 2015
----- Original Message -----
> From: "Jay Pipes" <jaypipes at gmail.com>
> To: openstack-dev at lists.openstack.org
> Sent: Monday, February 9, 2015 9:36:45 PM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
>
> On 02/09/2015 03:10 PM, Clint Byrum wrote:
> > Excerpts from Jay Pipes's message of 2015-02-09 10:15:10 -0800:
> >> On 02/09/2015 01:02 PM, Attila Fazekas wrote:
> >>> I do not see why not to use `FOR UPDATE` even with multi-writer or
> >>> Is the retry/swap way really solves anything here.
> >> <snip>
> >>> Am I missed something ?
> >>
> >> Yes. Galera does not replicate the (internal to InnnoDB) row-level locks
> >> that are needed to support SELECT FOR UPDATE statements across multiple
> >> cluster nodes.
> >>
> >> https://groups.google.com/forum/#!msg/codership-team/Au1jVFKQv8o/QYV_Z_t5YAEJ
> >
> > Attila acknowledged that. What Attila was saying was that by using it
> > with Galera, the box that is doing the FOR UPDATE locks will simply fail
> > upon commit because a conflicting commit has already happened and arrived
> > from the node that accepted the write. Further what Attila is saying is
> > that this means there is not such an obvious advantage to the CAS method,
> > since the rollback and the # updated rows == 0 are effectively equivalent
> > at this point, seeing as the prior commit has already arrived and thus
> > will not need to wait to fail certification and be rolled back.
>
> No, that is not correct. In the case of the CAS technique, the frequency
> of rollbacks due to certification failure is demonstrably less than when
> using SELECT FOR UPDATE and relying on the certification timeout error
> to signal a deadlock.
>
> > I am not entirely certain that is true though, as I think what will
> > happen in sequential order is:
> >
> > writer1: UPDATE books SET genre = 'Scifi' WHERE genre = 'sciencefiction';
> > writer1: --> send in-progress update to cluster
> > writer2: SELECT FOR UPDATE books WHERE id=3;
> > writer1: COMMIT
> > writer1: --> try to certify commit in cluster
> > ** Here is where I stop knowing for sure what happens **
> > writer2: certifies writer1's transaction or blocks?
>
> It will certify writer1's transaction. It will only block another thread
> hitting writer2 requesting write locks or write-intent read locks on the
> same records.
>
> > writer2: UPDATE books SET genre = 'sciencefiction' WHERE id=3;
> > writer2: COMMIT --> One of them is rolled back.
> >
The other transaction can be rolled back before you do an actual commit:
writer1: BEGIN
writer2: BEGIN
writer1: update test set val=42 where id=1;
writer2: update test set val=42 where id=1;
writer1: COMMIT
writer2: show variables;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
As you can see 2th transaction failed without issuing a COMMIT after the 1th one committed.
You could write anything to mysql on writer2 at this point,
even invalid things returns with `Deadlock`.
> > So, at that point where I'm not sure (please some Galera expert tell
> > me):
> >
> > If what happens is as I suggest, writer1's transaction is certified,
> > then that just means the lock sticks around blocking stuff on writer2,
> > but that the data is updated and it is certain that writer2's commit will
> > be rolled back. However, if it blocks waiting on the lock to resolve,
> > then I'm at a loss to determine which transaction would be rolled back,
> > but I am thinking that it makes sense that the transaction from writer2
> > would be rolled back, because the commit is later.
>
> That is correct. writer2's transaction would be rolled back. The
> difference is that the CAS method would NOT trigger a ROLLBACK. It would
> instead return 0 rows affected, because the UPDATE statement would
> instead look like this:
>
> UPDATE books SET genre = 'sciencefiction' WHERE id = 3 AND genre = 'SciFi';
>
> And the return of 0 rows affected would trigger a simple retry of the
> read and then update attempt on writer2 instead of dealing with ROLLBACK
> semantics on the transaction.
>
> Note that in the CAS method, the SELECT statement and the UPDATE are in
> completely different transactions. This is a very important thing to
> keep in mind.
>
> > All this to say that usually the reason for SELECT FOR UPDATE is not
> > to only do an update (the transactional semantics handle that), but
> > also to prevent the old row from being seen again, which, as Jay says,
> > it cannot do. So I believe you are both correct:
> >
> > * Attila, yes I think you're right that CAS is not any more efficient
> > at replacing SELECT FOR UPDATE from a blocking standpoint.
>
> It is more efficient because there are far fewer ROLLBACKs of
> transactions occurring in the system.
>
> If you look at a slow query log (with a 0 slow query time) for a MySQL
> Galera server in a multi-write cluster during a run of Tempest or Rally,
> you will notice that the number of ROLLBACK statements is extraordinary.
> AFAICR, when Peter Boros and I benchmarked a Rally launch and delete 10K
> VM run, we saw nearly 11% of *total* queries executed against the server
> were ROLLBACKs. This, in my opinion, is the main reason that the CAS
> method will show as more efficient.
You need to finish even the read-only transaction somehow before letting other
task to get the connection. You need to issue either ROLLBACK or COMMIT;
Basically it will just release your snapshot view.
Usually connection managers/pools issuing ROLLBACKs to ensure connection
`cleanness`.
The CAS method requires to use an additional ROLLBACK.
writer1: BEGIN
writer2: BEGIN
writer1: SELECT id FROM test WHERE val is NULL limit 1; # returns with id=2
writer2: SELECT id FROM test WHERE val is NULL limit 1; # returns with id=2
writer2: UPDATE test SET val=42 WHERE id = 2 AND val is NULL; # (Rows matched: 1 Changed: 1 Warnings: 0)
writer2: COMMIT
writer1: UPDATE test SET val=42 WHERE id = 2 AND val is NULL; # (Rows matched: 0 Changed: 0 Warnings: 0)
# changed is 0, lets try to find another id
writer1:
SELECT id FROM test WHERE val is NULL limit 1; # returns 2
Because the reads are repeatable, I get back the already used id=2.
How to get different id with the above query (without FOR UPDATE) ?
writer1: ROLLBACK
writer1: SELECT id FROM test WHERE val is NULL limit 1; # returns id=3.
>
> > * Jay, yes I think you're right that SELECT FOR UPDATE is not the right
> > thing to use to do such reads, because one is relying on locks that are
> > meaningless on a Galera cluster.
> >
> > Where I think the CAS ends up being the preferred method for this sort
> > of thing is where one consideres that it won't hold a meaningless lock
> > while the transaction is completed and then rolled back.
>
> CAS is preferred because it is measurably faster and more
> obstruction-free than SELECT FOR UPDATE. A colleague of mine is almost
> ready to publish documentation showing a benchmark of this that shows
> nearly a 100% decrease in total amount of lock/wait time using CAS
> versus waiting for the coarser-level certification timeout to retry the
> transactions. As mentioned above, I believe this is due to the dramatic
> decrease in ROLLBACKs.
I tried different methods for doing similar ops,
The FOR UPDATE was slower when you forget to define the index like
https://bugs.launchpad.net/neutron/+bug/1412348
After the index was in place the FOR UPDATE was one of fastest way,
the CAS way was comparable fast only when I used strange queries like bellow,
with 'read committed' isolation level.:
'SELECT id FROM (SELECT id FROM test WHERE used=0 LIMIT 10) AS T ORDER BY RAND() LIMIT 1'
anyway, this kind of allocations many-many times faster compared
to how many vm's nova can schedule per sec.
> Best,
> -jay
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
More information about the OpenStack-dev
mailing list