[openstack-dev] [nova] Distributed Database

Clint Byrum clint at fewbar.com
Tue May 3 20:32:47 UTC 2016


Excerpts from Mike Bayer's message of 2016-05-03 09:04:00 -0700:
> 
> On 05/02/2016 01:48 PM, Clint Byrum wrote:
> >
> > FWIW, I agree with you. If you're going to use SQLAlchemy, use it to
> > take advantage of the relational model.
> >
> > However, how is what you describe a win? Whether you use SELECT .. FOR
> > UPDATE or a stored procedure, the lock is not distributed, and thus will
> > still suffer rollback failures in Galera. For single-DB-server setups, you
> > don't have to worry about that, and SELECT .. FOR UPDATE will work fine.
> 
> Well it's a "win" vs. the lesser approach considered, which also did not 
> include a distributed locking system like ZooKeeper.   It is also a win 
> even with a ZooKeeper-like system in place, because it allows a SQL query 
> to be much smarter about selecting data that involves IP addresses and 
> CIDRs, without the need to pull data into memory and process it there. 
> This is the most common mistake in SQL programming: not taking advantage 
> of SQL's set-based nature and instead pulling data into memory 
> unnecessarily.
> 

Indeed, we use relational databases precisely so we don't have to haul
around piles of data that aren't relevant to the question being asked.
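
For anyone skimming, the contrast is roughly the following. This is a
minimal sketch only: the subnets table, its integer first_ip/last_ip
bounds, and the helper are hypothetical, not Nova's actual schema.

    import sqlalchemy as sa

    metadata = sa.MetaData()
    # Hypothetical table: each CIDR stored as an integer address range.
    subnets = sa.Table(
        'subnets', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('first_ip', sa.BigInteger, nullable=False),
        sa.Column('last_ip', sa.BigInteger, nullable=False),
    )

    def find_overlaps(conn, first_ip, last_ip):
        # Two ranges overlap iff each one starts at or before the
        # other's end.  The predicate runs set-wise in the database;
        # only the matching rows ever cross the wire.
        query = sa.select([subnets]).where(sa.and_(
            subnets.c.first_ip <= last_ip,
            subnets.c.last_ip >= first_ip))
        return conn.execute(query).fetchall()

The alternative, SELECTing every row and looping over it in Python,
moves the whole haystack just to find the needle.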

> Also, the "federated MySQL" approach of Cells V2 would still be OK with 
> pessimistic locking, since this lock is not "distributed" across the 
> entire dataspace.   Only the usual Galera caveats apply, e.g. point to 
> only one Galera "master" at a time and/or wait for Galera to support 
> "SELECT FOR UPDATE" across the cluster.
> 

Right, of course it would work. It's just a ton of code for not much
improvement in scalability or resilience.
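
For the archive, the pattern in question looks roughly like this. It is
a sketch, not Nova's code; the Subnet class and retry count are made up,
and the broad except is deliberately lazy for brevity:

    from sqlalchemy import Column, Boolean, Integer, exc
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Subnet(Base):
        # Hypothetical mapped class, standing in for the real model.
        __tablename__ = 'subnets'
        id = Column(Integer, primary_key=True)
        allocated = Column(Boolean, default=False)

    def reserve_subnet(session, subnet_id, max_attempts=3):
        # Pessimistic lock: fine on a single server, or on Galera as
        # long as all writers point at one node.  A certification
        # failure surfaces at commit as a deadlock-style error, so we
        # roll back and retry rather than treat it as fatal.
        for attempt in range(max_attempts):
            try:
                subnet = (session.query(Subnet)
                                 .filter_by(id=subnet_id)
                                 .with_for_update()
                                 .one())
                subnet.allocated = True
                session.commit()
                return subnet
            except exc.DBAPIError:
                session.rollback()
                if attempt == max_attempts - 1:
                    raise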

> >
> > Furthermore, any logic that happens inside the database server is extra
> > load on a much much much harder resource to scale, using code that is
> > much more complicated to update.
> 
> So I was careful to use the term "stored function" and not "stored 
> procedure".   As ironic as it is for me to defend both the ORM 
> business-logic-in-the-application-not-the-database position *and* the 
> let-the-database-do-things-not-the-application position at the same 
> time, using database functions to allow new kinds of math and comparison 
> operations to take place over sets is entirely reasonable. It should not 
> be confused with the old-school big-business approach of building an 
> entire business logic layer as a huge wall of stored procedures; this is 
> nothing like that.
> 

Indeed, it's a complicated and nuanced position, but I think I
understand where you're going with it. My reluctance to put intelligence
in the database is just that, reluctance, not some hard and fast rule I
can quote.
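
To make the distinction concrete, what's being described is something
on this order. A sketch only: the function name, the integer-encoded
CIDR bounds, and the DSN are illustrative, not the actual patch.

    import sqlalchemy as sa

    # A scalar function the server can apply row-by-row inside a
    # set-based query -- not a procedure holding business logic.
    CREATE_OVERLAPS_FN = """
    CREATE FUNCTION cidr_overlaps(a_first BIGINT, a_last BIGINT,
                                  b_first BIGINT, b_last BIGINT)
    RETURNS TINYINT DETERMINISTIC
    RETURN a_first <= b_last AND a_last >= b_first
    """

    engine = sa.create_engine('mysql+pymysql://nova:secret@db/nova')
    engine.execute(CREATE_OVERLAPS_FN)

    # Business logic stays in the application; the function only adds
    # a comparison operator SQL lacks, used in a WHERE clause:
    rows = engine.execute(
        "SELECT id FROM subnets"
        " WHERE cidr_overlaps(first_ip, last_ip, %s, %s)",
        (167772160, 167772415))  # e.g. 10.0.0.0 .. 10.0.0.255

The function is a single deterministic expression, so there's no wall
of procedural code for the optimizer (or a reviewer) to fight with.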

> The PostgreSQL database has native INET and CIDR types which include the 
> same overlap logic we are implementing here as a MySQL stored function, 
> so the addition of math functions like these shouldn't be controversial. 
> The "load" of this function is completely negligible (however, I would 
> be glad to assist in load testing it to confirm), especially compared to 
> pulling the same data across the wire, processing it in Python, then 
> sending just a tiny portion of it back again after we've extracted the 
> needle from the haystack.
> 

It's death by 1000 paper cuts when you talk about scaling. Of course it
will be faster, but the slices of CPU on the database server are still a
limited resource, whereas slices of CPU on stateless API/conductor nodes
are virtually limitless and far cheaper to scale elastically.
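
For reference, on PostgreSQL the same predicate needs no user-defined
function at all. A sketch, with a hypothetical table; the inet &&
("contains or is contained by") operator is built in as of 9.4:

    import sqlalchemy as sa

    engine = sa.create_engine('postgresql://nova:secret@db/nova')

    # The overlap test runs on the native cidr type in the server;
    # only the matching rows come back over the wire.
    rows = engine.execute(
        "SELECT id, cidr FROM subnets WHERE cidr && inet %s",
        ('10.0.0.0/22',)).fetchall()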

> In pretty much every kind of load testing scenario we do with OpenStack, 
> the actual "load" on the database barely pushes anything.   The only 
> database "resource" issue we have is OpenStack using far more idle 
> connections than it should; that one is on my end, and means working on 
> improvements to the connection pooling system, which does not scale well 
> across OpenStack's tons-of-processes model.
> 

Indeed, pooling is something we should improve upon. But even more, we
need to improve upon error handling and resilience.
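
For concreteness, these are the pooling knobs in play per worker
process (values illustrative, DSN made up). Every API/conductor worker
holds its own pool, which is why idle connections multiply across
OpenStack's many processes:

    import sqlalchemy as sa

    engine = sa.create_engine(
        'mysql+pymysql://nova:secret@db/nova',  # hypothetical DSN
        pool_size=5,        # steady-state connections per worker
        max_overflow=10,    # extra connections allowed under burst
        pool_timeout=30,    # seconds to wait for a free connection
        pool_recycle=3600,  # replace connections older than an hour
    )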

> >
> > To be clear, it's not the amount of data, but the size of the failure
> > domain. We're more worried about what will happen to those 40,000 open
> > connections from our 4000 servers when we do have to violently move them.
> 
> That's a really big number, and I will admit I would need to dig into 
> this particular problem domain more deeply to understand the rationale 
> for that kind of scale here.   But it does seem like, if you were using 
> SQL databases, and the 4000-server system is in fact grouped into 
> hundreds of "silos" that only deal with strict segments of the total 
> dataspace, a federated approach would be exactly what you'd want to go 
> with.
> 

I think it would be easier to just make sure the client code handles
connection failovers gracefully than to try to figure out sharding.
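
That is: teach the DB API layer to retry through a failover rather
than teach the schema to shard. A sketch using oslo.db's retry
decorator, with illustrative argument values (names per oslo_db.api
as I recall them):

    from oslo_db import api as oslo_db_api

    # On a dropped connection the call is retried; the reconnect can
    # land on any surviving node behind the load balancer.
    @oslo_db_api.wrap_db_retry(max_retries=5,
                               retry_on_disconnect=True,
                               retry_on_deadlock=True)
    def instance_update(context, instance_uuid, values):
        # ... ordinary single-node SQLAlchemy code here ...
        pass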

> >
> > That particular problem isn't as scary if you have a large
> > Cassandra/MongoDB/Riak/ROME cluster, as the client libraries are
> > generally connecting to all or most of the nodes already, and will
> > simply use a different connection if the initial one fails. However,
> > these other systems also bring a whole host of new problems which the
> > simpler SQL approach doesn't have.
> 
> Regarding ROME, I only seek to make the point that if you're going to 
> switch to NoSQL, you have to switch to NoSQL.   Bolting SQLAlchemy on 
> top of Redis without a mature and widely proven relational layer in 
> between, down to the level of replicating the actual tables that were 
> built within a relational schema, is a denial of the reality of the 
> problem to be solved.
> 

We so agree on this. :)


