[openstack-dev] [nova] Distributed Database

Clint Byrum clint at fewbar.com
Mon May 2 17:48:09 UTC 2016

Excerpts from Mike Bayer's message of 2016-05-02 08:51:58 -0700:
> Well IMO that's actually often a problem.  My goal across Openstack 
> projects in general is to allow them to make use of SQL more effectively 
> than they do right now; for example, in Neutron I am helping them to 
> move a block of code that inefficiently needs to load a block of data 
> into memory, scan it for CIDR overlaps, and then push data back out. 
> This approach prevents it from performing a single UPDATE statement and 
> ushers in the need for pessimistic locking against concurrent 
> transactions.  Instead, I've written for them a simple stored function 
> proof-of-concept [2] that will allow the entire operation to be 
> performed on the database side alone in a single statement.  Wins like 
> these are much less feasible if not impossible when a project decides it 
> wants to split its backend store between dramatically different 
> databases which don't offer such features.
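
For anyone following along, the load-and-scan pattern described above
looks roughly like the sketch below. The function name and the sample
CIDRs are hypothetical, not Neutron's actual code; only the standard
library ipaddress module is assumed.

```python
import ipaddress


def find_overlap(existing_cidrs, candidate):
    """Return the first existing CIDR that overlaps candidate, or None."""
    new_net = ipaddress.ip_network(candidate)
    for cidr in existing_cidrs:
        net = ipaddress.ip_network(cidr)
        # Only compare networks of the same IP version.
        if net.version == new_net.version and net.overlaps(new_net):
            return cidr
    return None


# Every row has to be pulled into application memory before the scan,
# which is the cost the single-statement approach is meant to avoid.
existing = ["10.0.0.0/24", "10.0.1.0/24", "192.168.0.0/16"]
clash = find_overlap(existing, "10.0.1.128/25")
free = find_overlap(existing, "10.0.2.0/24")
```

Between the read and the write-back another transaction can add a
conflicting subnet, which is where the pessimistic locking comes in.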

FWIW, I agree with you. If you're going to use SQLAlchemy, use it to
take advantage of the relational model.

However, how is what you describe a win? Whether you use SELECT .. FOR
UPDATE or a stored procedure, the lock is not distributed and thus will
still suffer rollback failures in Galera. For single DB server setups, you
don't have to worry about that, and SELECT .. FOR UPDATE will work fine.
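
In practice the application ends up retrying the whole transaction when
Galera rolls it back. A minimal sketch of that retry loop follows,
assuming the rollback surfaces to the client as a deadlock-style error.
DeadlockError, retry_on_deadlock and allocate_subnet are hypothetical
names for illustration, not any driver's or library's API.

```python
import random
import time


class DeadlockError(Exception):
    """Stand-in for the driver error raised when a transaction is rolled back."""


def retry_on_deadlock(fn, max_retries=5, base_delay=0.01, sleep=time.sleep):
    """Call fn(), re-running the whole transaction if it reports a deadlock."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except DeadlockError:
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter so competing writers spread out.
            sleep(base_delay * (2 ** attempt) * random.random())


attempts = []


def allocate_subnet():
    # Simulated transaction: rolled back twice, then commits.
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("simulated rollback")
    return "allocated"


result = retry_on_deadlock(allocate_subnet, sleep=lambda _: None)
```

Note that fn has to be safe to run more than once, since everything
inside the transaction is repeated on each retry.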

So to me, this is something where you need a distributed locking system
(à la ZooKeeper) to actually solve the problem for multiple database
servers.

Furthermore, any logic that happens inside the database server is extra
load on a much much much harder resource to scale, using code that is
much more complicated to update. For those reasons I'm generally opposed
to using any kind of stored procedures in large scale systems. It's the
same reason I dislike foreign key enforcement: you're expending a limited
resource to mitigate a problem which _can_ be controlled and addressed
with non-stateful resources that are easier and simpler to scale.

> >
> > Concretely, we think that there are three possible approaches:
> >      1) We can use the SQLAlchemy API as the common denominator between a relational and non-relational implementation of the db.api component. These two implementations could continue to converge by sharing a large amount of code.
> >      2) We create a new non-relational implementation (from scratch) of the db.api component. It would probably require more work.
> >      3) We are also studying a last alternative: writing a SQLAlchemy engine that targets NewSQL databases (scalability + ACID):
> >       - https://github.com/cockroachdb/cockroach
> >       - https://github.com/pingcap/tidb
> Going with a NewSQL backend is by far the best approach here.   That 
> way, very little needs to be reinvented and the application's approach 
> to data doesn't need to dramatically change.
> But also, w.r.t. Cells there seems to be some remaining debate over why 
> exactly a distributed approach is even needed.  As others have posted, a 
> single MySQL database, replicated across Galera or not, scales just fine 
> for far more data than Nova ever needs to store.  So it's not clear why 
> the need for a dramatic rewrite of its datastore is called for.

To be clear, it's not the amount of data, but the size of the failure
domain. We're more worried about what will happen to those 40,000 open
connections from our 4000 servers when we do have to violently move them.

That particular problem isn't as scary if you have a large
Cassandra/MongoDB/Riak/ROME cluster, as the client libraries are
generally connecting to all or most of the nodes already, and will
simply use a different connection if the initial one fails. However,
these other systems also bring a whole host of new problems which the
simpler SQL approach doesn't have.
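
The client-side behaviour I mean is roughly the following. This is a
sketch only: FailoverClient and NodeDown are hypothetical names, and the
nodes are plain callables rather than any real driver's connections.

```python
class NodeDown(Exception):
    """Hypothetical error raised when a node cannot be reached."""


class FailoverClient:
    def __init__(self, nodes):
        # nodes: mapping of node name -> callable(query) that performs the request
        self.nodes = dict(nodes)

    def execute(self, query):
        """Try each known node in order and return the first successful reply."""
        failed = []
        for name, send in self.nodes.items():
            try:
                return name, send(query)
            except NodeDown:
                failed.append(name)
        raise NodeDown("all nodes failed: " + ", ".join(failed))


def dead(query):
    raise NodeDown("connection reset")


def alive(query):
    return "ok: " + query


client = FailoverClient({"node-a": dead, "node-b": alive})
served_by, reply = client.execute("GET instance-1")
```

Losing node-a costs one failed attempt per request instead of a mass
reconnect, which is why the failure domain is smaller.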

So it's worth doing an actual analysis of the failure handling before
jumping to the conclusion that a pile of cells/sharding code or a rewrite
to use a distributed database would be of benefit.
