[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Mike Bayer mbayer at redhat.com
Wed Feb 4 21:09:14 UTC 2015



Matthew Booth <mbooth at redhat.com> wrote:

> A: start transaction;
> A: insert into foo values(1)
> A: commit;
> B: select * from foo; <-- May not contain the value we inserted above[3]

I’ve confirmed in my own testing that this is accurate. The
wsrep_causal_reads flag does resolve it, and it is settable on a
per-session basis.  The attached script, adapted from the one
given in the blog post, illustrates this.
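
For instance, in the scenario above, B can opt in before reading (the newer
variable is wsrep_sync_wait; on older versions it is wsrep_causal_reads):

B: set session wsrep_causal_reads = 1;
B: select * from foo; <-- now blocks until replication catches up, so it sees A's insert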


> 
> Galera exposes a session variable which will fix this: wsrep_sync_wait
> (or wsrep_causal_reads on older mysql). However, this isn't the default.
> It presumably has a performance cost, but I don't know what it is, or
> how it scales with various workloads.

Well, consider our application is doing some @writer, then later it does
some @reader. @reader has the contract that reads must be synchronous with
any writes. Easy enough: @reader ensures that the connection it uses runs
“SET wsrep_causal_reads = 1”. The attached test case confirms this is feasible
on a per-session (that is, a connection attached to the database) basis, so
that the setting will not impact the cluster as a whole, and we can
forgo using it on those @async_reader calls where we don’t need it.
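
To make that concrete, here is a minimal sketch with SQLAlchemy (the URL,
credentials, and driver are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://scott:tiger@node1/test")

# a connection for synchronous @reader work: opt in to causal reads.
# SET defaults to session scope, so only this connection is affected.
sync_conn = engine.connect()
sync_conn.execute(text("SET wsrep_causal_reads = 1"))

# a connection for @async_reader work: leave the flag off and accept
# potentially lagging reads in exchange for not waiting on replication.
async_conn = engine.connect()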

> Because these are semantic issues, they aren't things which can be
> easily guarded with an if statement. We can't say:
> 
> if galera:
>  try:
>    commit
>  except:
>    rewind time
> 
> If we are to support this DB at all, we have to structure code in the
> first place to allow for its semantics.

I think the above example is referring to the “deadlock” issue, which we have
solved with the “only write to one master” strategy.

But overall, as you’re aware, we will no longer have the words “begin” or
“commit” in our code. That all takes place within enginefacade. With this
pattern, we permanently end the need for repeated special patterns or
boilerplate that occurs per-transaction on a backend-configurable basis. The
enginefacade is where any such special patterns can live, and for extended
patterns such as setting up wsrep_causal_reads on @reader nodes or similar,
we can implement a rudimentary plugin system so that a “galera” backend can
set up what’s needed.
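
A rough sketch of the shape this takes, using the decorator names from the
enginefacade work (the Foo model and the comments about a galera plugin are
illustrative, not the final API):

from oslo_db.sqlalchemy import enginefacade
from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    val = Column(Integer)

@enginefacade.transaction_context_provider
class Context(object):
    """Carries transaction state between @writer/@reader calls."""

@enginefacade.writer
def set_foo(context, id_, val):
    # no begin/commit here; the facade owns transaction demarcation
    context.session.add(Foo(id=id_, val=val))

@enginefacade.reader
def get_foo(context, id_):
    # a "galera" backend could emit SET wsrep_causal_reads = 1 on this
    # session before the query runs; an @async_reader variant would skip it
    return context.session.query(Foo).get(id_)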

The attached script does essentially what the one associated with
http://www.percona.com/blog/2013/03/03/investigating-replication-latency-in-percona-xtradb-cluster/
does. It’s valid because without wsrep_causal_reads turned on for the
connection, I get plenty of reads that lag behind the writes, so I’ve
confirmed the lag is easily reproducible, and that with causal_reads turned
on, it vanishes. The script demonstrates that a single application can set up
“wsrep_causal_reads” on a per-session basis (remember, by “session” we mean
“a MySQL session”), where it takes effect for that connection alone, not
affecting the performance of other concurrent connections even in the same
application. With the flag turned on, the script never reads a stale row.
The script alternates randomly between calls on the causal-reads connection
and the non-causal-reads connection; a rough reconstruction of it follows
the sample output below. I’m running it against a cluster of two virtual
nodes on a laptop, so performance is very slow, but some sample output:

2015-02-04 15:49:27,131 100 runs
2015-02-04 15:49:27,754 w/ non-causal reads, got row 763 val is 9499, retries 0
2015-02-04 15:49:27,760 w/ non-causal reads, got row 763 val is 9499, retries 1
2015-02-04 15:49:27,764 w/ non-causal reads, got row 763 val is 9499, retries 2
2015-02-04 15:49:27,772 w/ non-causal reads, got row 763 val is 9499, retries 3
2015-02-04 15:49:27,777 w/ non-causal reads, got row 763 val is 9499, retries 4
2015-02-04 15:49:30,985 200 runs
2015-02-04 15:49:37,579 300 runs
2015-02-04 15:49:42,396 400 runs
2015-02-04 15:49:48,240 w/ non-causal reads, got row 6544 val is 6766, retries 0
2015-02-04 15:49:48,255 w/ non-causal reads, got row 6544 val is 6766, retries 1
2015-02-04 15:49:48,276 w/ non-causal reads, got row 6544 val is 6766, retries 2
2015-02-04 15:49:49,336 500 runs
2015-02-04 15:49:56,433 600 runs
2015-02-04 15:50:05,801 700 runs
2015-02-04 15:50:08,802 w/ non-causal reads, got row 533 val is 834, retries 0
2015-02-04 15:50:10,849 800 runs
2015-02-04 15:50:14,834 900 runs
2015-02-04 15:50:15,445 w/ non-causal reads, got row 124 val is 3850, retries 0
2015-02-04 15:50:15,448 w/ non-causal reads, got row 124 val is 3850, retries 1
2015-02-04 15:50:18,515 1000 runs
2015-02-04 15:50:22,130 1100 runs
2015-02-04 15:50:26,301 1200 runs
2015-02-04 15:50:28,898 w/ non-causal reads, got row 1493 val is 8358, retries 0
2015-02-04 15:50:29,988 1300 runs
2015-02-04 15:50:33,736 1400 runs
2015-02-04 15:50:34,219 w/ non-causal reads, got row 9661 val is 2877, retries 0
2015-02-04 15:50:38,796 1500 runs
2015-02-04 15:50:42,844 1600 runs
2015-02-04 15:50:46,838 1700 runs
2015-02-04 15:50:51,049 1800 runs
2015-02-04 15:50:55,139 1900 runs
2015-02-04 15:50:59,632 2000 runs
2015-02-04 15:51:04,721 2100 runs
2015-02-04 15:51:10,670 2200 runs
2015-02-04 15:51:15,848 2300 runs
2015-02-04 15:51:20,960 2400 runs
2015-02-04 15:51:25,629 2500 runs
2015-02-04 15:51:30,747 2600 runs
2015-02-04 15:51:36,229 2700 runs
2015-02-04 15:51:39,865 w/ non-causal reads, got row 7378 val is 1571, retries 0
2015-02-04 15:51:39,869 w/ non-causal reads, got row 7378 val is 1571, retries 1
2015-02-04 15:51:39,874 w/ non-causal reads, got row 7378 val is 1571, retries 2
2015-02-04 15:51:39,880 w/ non-causal reads, got row 7378 val is 1571, retries 3
2015-02-04 15:51:39,887 w/ non-causal reads, got row 7378 val is 1571, retries 4
2015-02-04 15:51:39,892 w/ non-causal reads, got row 7378 val is 1571, retries 5
2015-02-04 15:51:40,640 2800 runs
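
Since the archive scrubs attachments, here is the rough reconstruction
promised above (host names, credentials, and the PyMySQL driver are my
assumptions; the original script may differ in detail):

import logging
import random

import pymysql  # assumed driver

logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)
log = logging.getLogger(__name__)

def connect(host, causal_reads=False):
    # placeholder credentials / host names
    conn = pymysql.connect(host=host, user="scott", password="tiger",
                           database="test", autocommit=True)
    if causal_reads:
        with conn.cursor() as cur:
            # session-scoped: affects only this connection
            cur.execute("SET wsrep_causal_reads = 1")
    return conn

writer = connect("node1")
readers = [
    ("causal", connect("node2", causal_reads=True)),
    ("non-causal", connect("node2")),
]

with writer.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS foo "
                "(id INTEGER PRIMARY KEY, val INTEGER)")

for run in range(1, 3001):
    id_, val = random.randint(1, 10000), random.randint(1, 10000)
    with writer.cursor() as cur:
        cur.execute("REPLACE INTO foo (id, val) VALUES (%s, %s)",
                    (id_, val))

    # randomly alternate between the two reader sessions
    label, conn = random.choice(readers)
    retries = 0
    while True:
        with conn.cursor() as cur:
            cur.execute("SELECT val FROM foo WHERE id = %s", (id_,))
            row = cur.fetchone()
        if row is not None and row[0] == val:
            break
        # the read lags behind the write; in practice this only ever
        # fires on the non-causal connection
        log.info("w/ %s reads, got row %s val is %s, retries %s",
                 label, id_, val, retries)
        retries += 1

    if run % 100 == 0:
        log.info("%s runs", run)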



-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_galera.py
Type: text/x-python-script
Size: 2058 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20150204/7c6d5e03/attachment.bin>