[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Jay Pipes jaypipes at gmail.com
Thu Feb 5 03:56:36 UTC 2015



On 02/04/2015 07:59 PM, Angus Lees wrote:
> On Thu Feb 05 2015 at 9:02:49 AM Robert Collins
> <robertc at robertcollins.net> wrote:
>
>     On 5 February 2015 at 10:24, Joshua Harlow <harlowja at outlook.com>
>     wrote:
>      > How interesting,
>      >
>      > Why are people using galera if it behaves like this? :-/
>
>     Because it's actually fairly normal. In fact it's an instance of point 7
>     on https://wiki.openstack.org/wiki/BasicDesignTenets - one of our
>     oldest wiki pages :).
>
>     In more detail, consider what happens in full isolation when you have
>     the A and B example given, but B starts its transaction before A.
>
>     B BEGIN
>     A BEGIN
>     A INSERT foo
>     A COMMIT
>     B SELECT foo -> NULL
>
>
> Note that this still makes sense from each of A and B's individual view
> of the world.
>
> If I understood correctly, the big change with Galera that Matthew is
> highlighting is that read-after-write may not be consistent from the pov
> of a single thread.

No, this is not correct. There is nothing different about Galera here 
versus any asynchronously replicated database. A single thread, issuing 
statements in two entirely *separate sessions*, load-balanced across an 
entire set of database cluster nodes, may indeed see older data if the 
second session gets balanced to a slave node.

Nothing has changed about this with Galera. The exact same patterns that 
you would use to ensure that you are able to read the data that you 
previously wrote can be used with Galera. Just have the thread start a 
transactional session and ensure all queries are executed in the context 
of that session. Done. Nothing about Galera changes anything here.
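
A minimal sketch of that pattern, using Python's sqlite3 module as a
stand-in for a single cluster node. The table name and the
write_then_read() helper are made up for illustration; against Galera
the same shape would apply with a MySQL driver, as long as the write and
the read go through one connection and one transaction:

```python
import sqlite3


def write_then_read(conn, name):
    """Insert a row and read it back on the same connection, in one transaction."""
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute("INSERT INTO foo (name) VALUES (?)", (name,))
        # Same session, same transaction: the row just written is visible here.
        row = conn.execute(
            "SELECT name FROM foo WHERE name = ?", (name,)
        ).fetchone()
    return row


# Hypothetical setup: an in-memory database stands in for one database node.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT)")
result = write_then_read(conn, "bar")
```

The point is only that the read never leaves the session that did the
write, so there is no second node whose view could lag behind.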

> Not having read-after-write is *really* hard to code to (see for example
> x86 SMP cache coherency, C++ threading semantics, etc which all provide
> read-after-write for this reason).  This is particularly true when the
> affected operations are hidden behind an ORM - it isn't clear what might
> involve a database call and sequencers (or logical clocks, etc) aren't
> made explicit in the API.
>
> I strongly suggest just enabling wsrep_causal_reads on all galera
> sessions, unless you can guarantee that the high-level task is purely
> read-only, and then moving on to something else ;)  If we choose
> performance over correctness here then we're just signing up for lots of
> debugging of hard to reproduce race conditions, and the fixes are going
> to look like what wsrep_causal_reads does anyway.
>
> (Mind you, exposing sequencers at every API interaction would be
> awesome, and I look forward to a future framework and toolchain that
> makes that easy to do correctly)

IMHO, you all are reading WAY too much into this. The behaviour that 
Matthew is describing is the kind of thing that has been around for 
decades now with asynchronous slave replication. Applications have 
traditionally handled it by sending reads that can tolerate slave lag to 
a slave machine, and reads that cannot to the same machine that was 
written to.
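
As a rough sketch of that traditional routing decision (the ReadRouter
class and the node names are hypothetical, not part of oslo.db or any
existing library), the application only has to say whether a given read
can tolerate lag:

```python
import random


class ReadRouter:
    """Hypothetical helper that picks a node name for each read."""

    def __init__(self, writer, replicas):
        self.writer = writer            # node that receives all writes
        self.replicas = list(replicas)  # nodes that may lag behind the writer

    def node_for_read(self, tolerates_lag):
        # Lag-tolerant reads can go to any replica; everything else goes
        # back to the node that was written to.
        if tolerates_lag and self.replicas:
            return random.choice(self.replicas)
        return self.writer


# Assumed node names, for illustration only.
router = ReadRouter("db1", ["db2", "db3"])
```

Here router.node_for_read(False) always returns "db1", while
router.node_for_read(True) returns one of the replicas.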

Galera doesn't change anything here. I'm really not sure what the fuss 
is about, frankly.

I don't recommend mucking with wsrep_causal_reads if we don't have to. 
And, IMO, we don't have to muck with it at all.

Best,
-jay


