[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Clint Byrum clint at fewbar.com
Thu Feb 5 17:22:58 UTC 2015


Excerpts from Angus Lees's message of 2015-02-04 16:59:31 -0800:
> On Thu Feb 05 2015 at 9:02:49 AM Robert Collins <robertc at robertcollins.net>
> wrote:
> 
> > On 5 February 2015 at 10:24, Joshua Harlow <harlowja at outlook.com> wrote:
> > > How interesting,
> > >
> > > Why are people using galera if it behaves like this? :-/
> >
> > Because it's actually fairly normal. In fact it's an instance of point 7
> > on https://wiki.openstack.org/wiki/BasicDesignTenets - one of our
> > oldest wiki pages :).
> >
> > In more detail, consider what happens in full isolation when you have
> > the A and B example given, but B starts its transaction before A.
> >
> > B BEGIN
> > A BEGIN
> > A INSERT foo
> > A COMMIT
> > B SELECT foo -> NULL
> >
> 
> Note that this still makes sense from each of A and B's individual view of
> the world.
> 
> If I understood correctly, the big change with Galera that Matthew is
> highlighting is that read-after-write may not be consistent from the pov of
> a single thread.
> 

No, that's not the complete picture.

What Matthew is highlighting is that after a commit, a new transaction
may not see the write if that transaction runs on a different node in
the cluster.

In a single thread, using a single database session, a read after a
successful commit is guaranteed to see a version of the database
that existed after that commit. What it may not reflect is
subsequent writes committed on other servers after that commit,
unless you use the sync wait (wsrep_sync_wait).
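
To make the distinction concrete, here is a toy in-memory model of the
behaviour described above. It is only a sketch: the Node and Cluster
classes are hypothetical, nothing here talks to a real Galera cluster,
and it assumes a writeset is visible on the committing node at once but
sits in a queue on every other node until applied.

```python
class Node:
    """One cluster member: applied data plus writesets not yet applied."""

    def __init__(self):
        self.data = {}
        self.pending = []  # (key, value) writesets received but unapplied

    def apply_pending(self):
        # Apply queued writesets in the order they were received.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

    def read(self, key, sync_wait=False):
        # With sync_wait the node catches up before answering the read.
        if sync_wait:
            self.apply_pending()
        return self.data.get(key)


class Cluster:
    def __init__(self, size):
        self.nodes = [Node() for _ in range(size)]

    def commit(self, node_index, key, value):
        for i, node in enumerate(self.nodes):
            if i == node_index:
                # The local commit is ordered after earlier writesets.
                node.apply_pending()
                node.data[key] = value
            else:
                node.pending.append((key, value))


cluster = Cluster(2)
cluster.commit(0, "foo", "bar")
same_node = cluster.nodes[0].read("foo")
other_node_stale = cluster.nodes[1].read("foo")
other_node_synced = cluster.nodes[1].read("foo", sync_wait=True)
```

In this model same_node sees the write, other_node_stale does not, and
other_node_synced does because the read waited for the node to catch up.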

> Not having read-after-write is *really* hard to code to (see for example x86
> SMP cache coherency, C++ threading semantics, etc which all provide
> read-after-write for this reason).  This is particularly true when the
> affected operations are hidden behind an ORM - it isn't clear what might
> involve a database call and sequencers (or logical clocks, etc) aren't made
> explicit in the API.
> 
> I strongly suggest just enabling wsrep_causal_reads on all Galera sessions,
> unless you can guarantee that the high-level task is purely read-only, and
> then moving on to something else ;)  If we choose performance over
> correctness here then we're just signing up for lots of debugging of hard
> to reproduce race conditions, and the fixes are going to look like what
> wsrep_causal_reads does anyway.
> 
> (Mind you, exposing sequencers at every API interaction would be awesome,
> and I look forward to a future framework and toolchain that makes that easy
> to do correctly)
> 

I'd like to see actual examples where that will matter. Meanwhile making
all selects wait for the cluster will basically just ruin responsiveness
and waste tons of time, so we should be careful to think this through
before making any blanket policy.
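
One middle ground between "always wait" and "never wait" might be to
turn the wait on only around the reads that need it. The sketch below
assumes a DB-API style cursor connected to a Galera node and uses the
wsrep_sync_wait session variable (value 1 covers reads). The helper
name causal_reads is hypothetical, and resetting to 0 afterwards
assumes that 0 was the session's previous value.

```python
from contextlib import contextmanager

SYNC_WAIT_READS = 1  # wsrep_sync_wait bit for read statements


@contextmanager
def causal_reads(cursor, mask=SYNC_WAIT_READS):
    """Enable wsrep_sync_wait for the statements run inside the block."""
    cursor.execute("SET SESSION wsrep_sync_wait = %d" % int(mask))
    try:
        yield cursor
    finally:
        # Assumption: the session default is 0 (no waiting).
        cursor.execute("SET SESSION wsrep_sync_wait = 0")
```

A caller would wrap only the critical read in "with causal_reads(cursor):"
and leave every other SELECT at normal latency.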

I'd also like to see consideration given to systems that handle
distributed consistency in a more active manner. etcd and Zookeeper are
both such systems, and might serve as efficient guards for critical
sections without raising latency.
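
As a rough illustration of guarding a critical section, the sketch
below wraps a check-then-write in a lock. The function allocate_once
and the dict standing in for the database are hypothetical. A local
threading.Lock is used so the example runs anywhere; the assumption is
that a distributed lock exposing the same "with" interface (for
instance the Lock recipe in the kazoo Zookeeper client) could be passed
in its place.

```python
import threading


def allocate_once(lock, store, key, value):
    """Write key only if it is absent, holding the lock for check and write."""
    with lock:
        if key in store:
            return False  # someone else already allocated it
        store[key] = value
        return True


store = {}
lock = threading.Lock()  # stand-in for a distributed lock
first = allocate_once(lock, store, "hostname-1", "owner-a")
second = allocate_once(lock, store, "hostname-1", "owner-b")
```

Here first is True and second is False, and only the guarded section
pays the coordination cost rather than every SELECT.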


