[openstack-dev] [nova] Distributed Database

Clint Byrum clint at fewbar.com
Tue May 3 20:32:47 UTC 2016

Excerpts from Mike Bayer's message of 2016-05-03 09:04:00 -0700:
> On 05/02/2016 01:48 PM, Clint Byrum wrote:
> >
> > FWIW, I agree with you. If you're going to use SQLAlchemy, use it to
> > take advantage of the relational model.
> >
> > However, how is what you describe a win? Whether you use SELECT .. FOR
> > UPDATE, or a stored procedure, the lock is not distributed, and thus, will
> > still suffer rollback failures in Galera. For single DB server setups, you
> > don't have to worry about that, and SELECT .. FOR UPDATE will work fine.
> Well, it's a "win" vs. the lesser approach considered, which also did
> not include a distributed locking system like ZooKeeper.   It is also a
> win even with a ZooKeeper-like system in place, because it allows a SQL
> query to be much smarter about selecting data that involves IP numbers
> and CIDRs, without the need to pull data into memory and process it
> there.  This is the most common mistake in SQL programming: not taking
> advantage of SQL's set-based nature and instead pulling data into
> memory unnecessarily.

Indeed, we use relational databases precisely so we don't have to deal
with lots of data that isn't relevant at the time we ask for it.
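A toy illustration of that point, using a hypothetical allocations table
in SQLite: letting the database do the set-based filtering yields the
same answer as fetching everything into Python, but only the needle
crosses the wire:

```python
import sqlite3

# Toy schema (hypothetical) to contrast set-based SQL with pulling
# the whole haystack into Python and filtering there.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allocations (id INTEGER, used INTEGER)")
conn.executemany("INSERT INTO allocations VALUES (?, ?)",
                 [(i, i % 2) for i in range(1000)])

# Set-based: the database returns only the needle.
free_sql = conn.execute(
    "SELECT id FROM allocations WHERE used = 0").fetchall()

# Row-at-a-time: all 1000 rows cross the wire before filtering.
free_py = [(i,) for (i, used) in
           conn.execute("SELECT id, used FROM allocations")
           if used == 0]

assert sorted(free_sql) == sorted(free_py)  # same answer, far less data moved
```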

> Also, the "federated MySQL" approach of Cells V2 would still be OK with 
> pessimistic locking, since this lock is not "distributed" across the 
> entire dataspace.   Only the usual Galera caveats apply, e.g. point to 
> only one Galera "master" at a time and/or wait for Galera to support 
> "SELECT FOR UPDATE" across the cluster.

Right, of course it would work. It's just a ton of code for not much
improvement in scalability or resilience.
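Whichever way that trade-off falls, the application still has to own the
retry: Galera surfaces a SELECT .. FOR UPDATE conflict as a deadlock
error at certification time, so callers need something like this minimal
sketch (all names here are hypothetical, not any real driver's API):

```python
class DeadlockError(Exception):
    """Stand-in for the deadlock error a Galera node raises when a
    certification conflict rolls a transaction back (hypothetical name;
    real drivers raise their own exception types)."""

def run_with_retry(txn, attempts=3):
    # Re-run the transaction callable when it fails with a "deadlock",
    # which is how Galera reports SELECT .. FOR UPDATE conflicts.
    for attempt in range(attempts):
        try:
            return txn()
        except DeadlockError:
            if attempt == attempts - 1:
                raise

# Simulated transaction that conflicts once before succeeding.
calls = {"n": 0}
def txn():
    calls["n"] += 1
    if calls["n"] < 2:
        raise DeadlockError("certification conflict")
    return "committed"
```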

> >
> > Furthermore, any logic that happens inside the database server is extra
> > load on a much much much harder resource to scale, using code that is
> > much more complicated to update.
> So I was careful to use the term "stored function" and not "stored 
> procedure".   As ironic as it is for me to defend both the ORM 
> business-logic-in-the-application-not-the-database position *and* the 
> let-the-database-do-things-not-the-application position at the same 
> time, using database functions to enable new kinds of math and 
> comparison operations over sets is entirely reasonable.  It should not 
> be confused with the old-school big-business approach of building an 
> entire business logic layer as a huge wall of stored procedures; this 
> is nothing like that.

Indeed, it's a complicated and nuanced position, but I think I
understand where you're going with it. My reluctance to put intelligence
in the database is just that, reluctance, not some hard and fast rule I
can quote.

> PostgreSQL has native INET and CIDR types which include the same 
> overlap logic we are implementing here as a MySQL stored function, so 
> the addition of math functions like these shouldn't be controversial. 
>    The "load" of this function is completely negligible (though I would 
> be glad to assist in load testing it to confirm), especially compared 
> to pulling the same data across the wire, processing it in Python, and 
> then sending just a tiny portion of it back again after we've extracted 
> the needle from the haystack.

It's death by a thousand paper cuts when you talk about scaling. Of
course it will be faster, but CPU cycles on the database server are
still a limited resource, whereas CPU cycles on stateless API/conductor
nodes are virtually limitless and far cheaper to scale elastically.
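For scale, the overlap test under discussion is small either way; in
Python it is a single stdlib call, which is roughly the logic the stored
function (or PostgreSQL's native inet operators) would evaluate per row
inside the query:

```python
import ipaddress

def cidrs_overlap(a: str, b: str) -> bool:
    """CIDR overlap test via the stdlib; equivalent in spirit to what
    a MySQL stored function (or PostgreSQL's inet operators) would
    evaluate per row, server-side, inside the query itself."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
```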

> In pretty much every kind of load-testing scenario we do with 
> OpenStack, the actual "load" on the database barely pushes anything. 
> The only database "resource" issue we have is OpenStack using far more 
> idle connections than it should; it's on my end to work on improvements 
> to the connection pooling system, which does not scale well across 
> OpenStack's tons-of-processes model.

Indeed, pooling is something we should improve upon. But even more, we
need to improve upon error handling and resilience.
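Conceptually the pooling fix is small, even if the real work in
SQLAlchemy is not; a deliberately minimal sketch of the idea (this is
not SQLAlchemy's actual pool API):

```python
import queue

class TinyPool:
    """Deliberately minimal connection-pool sketch (not SQLAlchemy's
    API): a fixed set of connections shared through a thread-safe
    queue, so each process doesn't hold its own idle connection."""

    def __init__(self, factory, size=5):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())

    def acquire(self, timeout=None):
        # Blocks (up to timeout) when the pool is exhausted.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)
```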

> >
> > To be clear, it's not the amount of data, but the size of the failure
> > domain. We're more worried about what will happen to those 40,000 open
> > connections from our 4000 servers when we do have to violently move them.
> That's a really big number, and I will admit I would need to dig into 
> this particular problem domain more deeply to understand the rationale 
> for that kind of scale.   But it does seem like, if you were using SQL 
> databases and the 4000-server system is in fact grouped into hundreds 
> of "silos" that each deal only with a strict segment of the total 
> dataspace, a federated approach would be exactly what you'd want to go 
> with.

I think it would be easier to just make sure the client code handles
connection failovers gracefully than try to figure out sharding.
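That graceful handling can stay very small on the client side; a
sketch, with a hypothetical connect() callable standing in for a real
driver:

```python
def connect_with_failover(endpoints, connect):
    """Try each endpoint in turn, moving to the next when one refuses
    the connection. 'connect' is a hypothetical callable standing in
    for a real driver's connect function."""
    last_error = None
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except ConnectionError as exc:
            last_error = exc
    raise last_error

# Simulated two-node cluster where the first node is down.
def fake_connect(endpoint):
    if endpoint == "db1":
        raise ConnectionError("db1 refused")
    return "session:" + endpoint
```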

> >
> > That particular problem isn't as scary if you have a large
> > Cassandra/MongoDB/Riak/ROME cluster, as the client libraries are
> > generally connecting to all or most of the nodes already, and will
> > simply use a different connection if the initial one fails. However,
> > these other systems also bring a whole host of new problems which the
> > simpler SQL approach doesn't have.
> Regarding ROME, I only seek to make the point that if you're going to 
> switch to NoSQL, you have to switch to NoSQL.   Bolting SQLAlchemy on 
> top of Redis without a mature and widely-proven relational layer in 
> between, down to the level of replicating the actual tables that were 
> built within a relational schema, is a denial of the reality of the 
> problem to be solved.

We so agree on this. :)
