[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera
Joshua Harlow
harlowja at outlook.com
Wed Feb 4 21:24:20 UTC 2015
How interesting,
Why are people using Galera if it behaves like this? :-/
Do the people who are using it know that this can happen? :-/
Scary....
Mike Bayer wrote:
>
> Matthew Booth <mbooth at redhat.com> wrote:
>
>> A: start transaction;
>> A: insert into foo values(1)
>> A: commit;
>> B: select * from foo; <-- May not contain the value we inserted above[3]
>
> I’ve confirmed in my own testing that this is accurate. The
> wsrep_causal_reads flag does resolve this, and it is settable on a
> per-session basis. The attached script, adapted from the script
> given in the blog post, illustrates this.
>
>
>> Galera exposes a session variable which will fix this: wsrep_sync_wait
>> (or wsrep_causal_reads on older mysql). However, this isn't the default.
>> It presumably has a performance cost, but I don't know what it is, or
>> how it scales with various workloads.
>
> Well, consider that our application does some @writer, then later does
> some @reader. @reader has the contract that reads must be synchronous with
> any writes. Easy enough: @reader ensures that the connection it uses runs
> “set wsrep_causal_reads=1”. The attached test case confirms this is feasible
> on a per-session (that is, a connection attached to the database) basis, so
> that the setting will not impact the cluster as a whole, and we can
> forgo using it on those @async_reader calls where we don’t need it.
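
A minimal sketch of the per-session approach described above, assuming any
PEP 249 (DB-API) style connection to a Galera node. The helper name
enable_causal_reads is hypothetical and is not part of oslo.db or of the
attached script; the two variable names are the ones mentioned in this thread.

```python
# Hypothetical helper: switch on synchronous ("causal") reads for ONE session.
# Assumes `conn` is a DB-API (PEP 249) connection to a Galera node.

# Session variables named in this thread, with the value that enables
# read-your-writes checks for plain reads.
_SYNC_READ_SETTINGS = {
    "wsrep_causal_reads": 1,  # older name
    "wsrep_sync_wait": 1,     # newer name
}


def enable_causal_reads(conn, variable="wsrep_causal_reads"):
    """Issue SET SESSION on this connection only; other sessions are untouched."""
    if variable not in _SYNC_READ_SETTINGS:
        # Only whitelisted names are interpolated into the SQL text.
        raise ValueError("unsupported session variable: %r" % (variable,))
    cursor = conn.cursor()
    try:
        cursor.execute(
            "SET SESSION %s = %d" % (variable, _SYNC_READ_SETTINGS[variable])
        )
    finally:
        cursor.close()
    return conn
```

Because the statement is SET SESSION rather than SET GLOBAL, a connection used
for @async_reader work can simply skip the call.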
>
>> Because these are semantic issues, they aren't things which can be
>> easily guarded with an if statement. We can't say:
>>
>> if galera:
>>     try:
>>         commit
>>     except:
>>         rewind time
>>
>> If we are to support this DB at all, we have to structure code in the
>> first place to allow for its semantics.
>
> I think the above example is referring to the “deadlock” issue, which we have
> solved with the “only write to one master” strategy.
>
> But overall, as you’re aware, we will no longer have the words “begin” or
> “commit” in our code. This takes place all within enginefacade. With this
> pattern, we will permanently end the need for any kind of repeated special
> patterns or boilerplate which occurs per-transaction on a
> backend-configurable basis. The enginefacade is where any such special
> patterns can take place, and for extended patterns such as setting up
> wsrep_causal_reads on @reader nodes or similar, we can implement a
> rudimentary plugin system for it such that we can have a “galera” backend to
> set up what’s needed.
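
To make the plugin idea above concrete, here is a rough sketch under stated
assumptions. None of these names (reader, GaleraBackend, DefaultBackend,
prepare_reader) come from oslo.db; they are hypothetical and only illustrate
how a facade could keep backend-specific session setup out of calling code.

```python
import functools


class DefaultBackend:
    """Hypothetical backend hook: nothing special needed before a read."""

    def prepare_reader(self, conn):
        pass


class GaleraBackend(DefaultBackend):
    """Hypothetical "galera" backend: make reads wait for replicated writes."""

    def prepare_reader(self, conn):
        cursor = conn.cursor()
        try:
            cursor.execute("SET SESSION wsrep_causal_reads = 1")
        finally:
            cursor.close()


def reader(connect, backend):
    """Decorator factory: open a connection, let the backend prepare it,
    pass it to the wrapped function as its first argument, then close it."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            conn = connect()
            try:
                backend.prepare_reader(conn)
                return fn(conn, *args, **kwargs)
            finally:
                conn.close()

        return wrapper

    return decorate
```

In this sketch an async-style reader would just be reader(connect,
DefaultBackend()), so the decorated functions never mention Galera at all.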
>
> The attached script does essentially what the one associated with
> http://www.percona.com/blog/2013/03/03/investigating-replication-latency-in-percona-xtradb-cluster/
> does. It’s valid because without wsrep_causal_reads turned on for the
> connection, I get plenty of reads that lag behind the writes, so I’ve
> confirmed this is easily reproducible, and that with causal reads turned on,
> it vanishes. The script demonstrates that a single application can set up
> “wsrep_causal_reads” on a per-session basis (remember, by “session” we mean
> “a mysql session”), where it takes effect for that connection alone, not
> affecting the performance of other concurrent connections even in the same
> application. With the flag turned on, the script never reads a stale row.
> The script makes calls on both the causal-reads connection and the
> non-causal-reads connection in a randomly alternating fashion. I’m running it
> against a cluster of two virtual nodes on a laptop, so performance is very
> slow, but here is some sample output:
>
> 2015-02-04 15:49:27,131 100 runs
> 2015-02-04 15:49:27,754 w/ non-causal reads, got row 763 val is 9499, retries 0
> 2015-02-04 15:49:27,760 w/ non-causal reads, got row 763 val is 9499, retries 1
> 2015-02-04 15:49:27,764 w/ non-causal reads, got row 763 val is 9499, retries 2
> 2015-02-04 15:49:27,772 w/ non-causal reads, got row 763 val is 9499, retries 3
> 2015-02-04 15:49:27,777 w/ non-causal reads, got row 763 val is 9499, retries 4
> 2015-02-04 15:49:30,985 200 runs
> 2015-02-04 15:49:37,579 300 runs
> 2015-02-04 15:49:42,396 400 runs
> 2015-02-04 15:49:48,240 w/ non-causal reads, got row 6544 val is 6766, retries 0
> 2015-02-04 15:49:48,255 w/ non-causal reads, got row 6544 val is 6766, retries 1
> 2015-02-04 15:49:48,276 w/ non-causal reads, got row 6544 val is 6766, retries 2
> 2015-02-04 15:49:49,336 500 runs
> 2015-02-04 15:49:56,433 600 runs
> 2015-02-04 15:50:05,801 700 runs
> 2015-02-04 15:50:08,802 w/ non-causal reads, got row 533 val is 834, retries 0
> 2015-02-04 15:50:10,849 800 runs
> 2015-02-04 15:50:14,834 900 runs
> 2015-02-04 15:50:15,445 w/ non-causal reads, got row 124 val is 3850, retries 0
> 2015-02-04 15:50:15,448 w/ non-causal reads, got row 124 val is 3850, retries 1
> 2015-02-04 15:50:18,515 1000 runs
> 2015-02-04 15:50:22,130 1100 runs
> 2015-02-04 15:50:26,301 1200 runs
> 2015-02-04 15:50:28,898 w/ non-causal reads, got row 1493 val is 8358, retries 0
> 2015-02-04 15:50:29,988 1300 runs
> 2015-02-04 15:50:33,736 1400 runs
> 2015-02-04 15:50:34,219 w/ non-causal reads, got row 9661 val is 2877, retries 0
> 2015-02-04 15:50:38,796 1500 runs
> 2015-02-04 15:50:42,844 1600 runs
> 2015-02-04 15:50:46,838 1700 runs
> 2015-02-04 15:50:51,049 1800 runs
> 2015-02-04 15:50:55,139 1900 runs
> 2015-02-04 15:50:59,632 2000 runs
> 2015-02-04 15:51:04,721 2100 runs
> 2015-02-04 15:51:10,670 2200 runs
> 2015-02-04 15:51:15,848 2300 runs
> 2015-02-04 15:51:20,960 2400 runs
> 2015-02-04 15:51:25,629 2500 runs
> 2015-02-04 15:51:30,747 2600 runs
> 2015-02-04 15:51:36,229 2700 runs
> 2015-02-04 15:51:39,865 w/ non-causal reads, got row 7378 val is 1571, retries 0
> 2015-02-04 15:51:39,869 w/ non-causal reads, got row 7378 val is 1571, retries 1
> 2015-02-04 15:51:39,874 w/ non-causal reads, got row 7378 val is 1571, retries 2
> 2015-02-04 15:51:39,880 w/ non-causal reads, got row 7378 val is 1571, retries 3
> 2015-02-04 15:51:39,887 w/ non-causal reads, got row 7378 val is 1571, retries 4
> 2015-02-04 15:51:39,892 w/ non-causal reads, got row 7378 val is 1571, retries 5
> 2015-02-04 15:51:40,640 2800 runs
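
The attached script itself is not reproduced in this archive. As a purely
hypothetical sketch of the kind of check that could count stale reads like
the "retries N" lines above, the following polls a read until the freshly
written value shows up. read_value is assumed to be any zero-argument
callable that performs the SELECT on the reading connection; the function
name and its parameters are made up for illustration.

```python
import time


def read_until_fresh(read_value, expected, max_retries=10, delay=0.0, log=print):
    """Re-read until `expected` is seen; return how many stale reads occurred.

    Raises RuntimeError if the value is still stale after `max_retries`
    re-reads.
    """
    retries = 0
    while True:
        value = read_value()
        if value == expected:
            return retries
        # Stale read: the write has not been applied on this node yet.
        log("stale read, val is %r, retries %d" % (value, retries))
        if retries >= max_retries:
            raise RuntimeError("value still stale after %d retries" % retries)
        retries += 1
        time.sleep(delay)
```

With synchronous reads enabled on the reading session, one would expect this
to return 0 every time, which matches the observation that the flag makes the
stale reads vanish.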
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev