[openstack-dev] [all][oslo.db][nova] TL; DR Things everybody should know about Galera

Mike Bayer mbayer at redhat.com
Thu Feb 5 04:30:45 UTC 2015



Jay Pipes <jaypipes at gmail.com> wrote:

> No, this is not correct. There is nothing different about Galera here versus any asynchronously replicated database. A single thread, issuing statements in two entirely *separate sessions*, load-balanced across an entire set of database cluster nodes, may indeed see older data if the second session gets balanced to a slave node.

That’s what we’re actually talking about. We’re talking about “reader” methods that aren’t enclosed in a “writer” potentially being pointed at the cluster as a whole.

> 
> Nothing has changed about this with Galera. The exact same patterns that you would use to ensure that you are able to read the data that you previously wrote can be used with Galera. Just have the thread start a transactional session and ensure all queries are executed in the context of that session. Done. Nothing about Galera changes anything here.

Right, but what I’m trying to get a handle on is this: how often do we make a series of RPC calls to an OpenStack service, each of which (because they are separate calls) runs in its own transaction, and how many of those calls are “read-only” (and therefore ones we’d like to point at the cluster as a whole) yet dependent on a “writer” RPC call that happened immediately before?
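
To make the scenario concrete, here is a toy model of it. This is only a sketch: the `Node` and `Cluster` classes are hypothetical in-memory stand-ins, not anything from oslo.db, Nova or Galera, and "lag" is modelled simply as replicas not seeing a write until `sync()` is called.

```python
import itertools


class Node:
    """Hypothetical stand-in for one database node (not a real API)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Cluster:
    """Toy cluster: one writer node plus replicas that catch up only on sync()."""

    def __init__(self, replica_count=2):
        self.writer = Node("writer")
        self.replicas = [Node("replica%d" % i) for i in range(replica_count)]
        self._pending = []
        self._round_robin = itertools.cycle(self.replicas)

    def write(self, key, value):
        # Writes always go to the single writer node.
        self.writer.rows[key] = value
        self._pending.append((key, value))

    def read_any(self, key):
        # A "reader" call load-balanced across the replicas; may be stale.
        return next(self._round_robin).rows.get(key)

    def read_writer(self, key):
        # The same read, pinned to the node that accepted the write.
        return self.writer.rows.get(key)

    def sync(self):
        # Simulate the replicas catching up with the writer.
        for key, value in self._pending:
            for replica in self.replicas:
                replica.rows[key] = value
        self._pending.clear()


cluster = Cluster()
cluster.write("instance-1", "ACTIVE")       # the "writer" RPC call
stale = cluster.read_any("instance-1")      # a separate "reader" RPC call
pinned = cluster.read_writer("instance-1")  # same read, sent to the writer
cluster.sync()
fresh = cluster.read_any("instance-1")      # after the replicas catch up
print(stale, pinned, fresh)                 # prints: None ACTIVE ACTIVE
```

The second call is the case in question: it is read-only, it runs in its own transaction, and it returns the wrong answer only because it immediately follows a write it depends on.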

> 
> IMHO, you all are reading WAY too much into this. The behaviour that Matthew is describing is the kind of thing that has been around for decades now with asynchronous slave replication. Applications have traditionally handled it by sending reads that can tolerate slave lag to a slave machine, and reads that cannot to the same machine that was written to.

Can we identify methods in OpenStack, and particularly Nova, that are reads that can tolerate slave lag? Or is the thing architected such that “no, for pretty much 95% of reader calls, we have no idea whether they occur right after a write that they definitely depend on”? Matthew found a small handful in one little corner of Nova, some kind of background thread thing, which make use of the “use_slave” flag. But the rest of it, nope.
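
As a rough illustration of what that classification amounts to, the sketch below routes a call based on a per-call `use_slave` flag. The flag name comes from the Nova code mentioned above, but everything else here (the `choose_target` function, the endpoint labels, the example call names) is made up for illustration and is not how Nova or oslo.db actually implement it.

```python
# Hypothetical endpoint labels; a real deployment would hold connection URLs.
WRITER_NODE = "writer-node"
WHOLE_CLUSTER = "any-node"


def choose_target(is_write, use_slave=False):
    """Pick where a call goes: writes, and reads that cannot tolerate
    lag, are pinned to the writer node; only reads explicitly marked
    use_slave=True are spread across the cluster."""
    if is_write:
        return WRITER_NODE
    return WHOLE_CLUSTER if use_slave else WRITER_NODE


# Made-up call names, each tagged as (is_write, use_slave).
calls = {
    "create_thing": (True, False),
    "show_thing_right_after_create": (False, False),
    "background_audit": (False, True),
}

targets = {name: choose_target(*flags) for name, flags in calls.items()}
for name, target in targets.items():
    print(name, "->", target)
```

The default matters: a read is sent to the whole cluster only when somebody has positively decided that it tolerates lag, which is why the open question is how many reader calls could ever be marked that way.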


> Galera doesn't change anything here. I'm really not sure what the fuss is about, frankly.

Because we’re trying to get Galera to actually work as a load-balanced cluster to some degree, at least for reads.

Otherwise I’m not really sure why we have to bother with Galera at all. If we just want a single MySQL server that has a warm standby for failover, why aren’t we just using that capability straight from MySQL? Then we get “SELECT FOR UPDATE” and everything else back. Galera’s “multi-master” capability is already in the trash for us, and it seems like “multi-slave” is only marginally useful too, since the vast majority of OpenStack has to be 100% pointed at just one node to work correctly.

I’m coming here with the disadvantage that I don’t have a clear picture of the actual use patterns we really need. The picture I have right now is of a Nova / Neutron etc. that receives dozens or hundreds of tiny RPC calls, each of which does some small thing in its own transaction, yet most depend on each other because they are all part of a single larger operation, and the whole thing runs too slowly. But this is the fuzziest picture ever.




