Open Stack

Sun Apr 24 04:33:03 UTC 2016

On 04/22/2016 04:27 PM, Ed Leafe wrote:
> OK, so I know that Friday afternoons are usually the worst times to
> write a blog post and start an email discussion, and that the Friday
> immediately before a Summit is the absolute worst, but I did it anyway.
>
> http://blog.leafe.com/index.php/2016/04/22/distributed_data_nova/
>
> Summary: we are creating way too much complexity by trying to make Nova
> handle things that are best handled by a distributed database. The
> recent split of the Nova DB into an API database and separate cell
> databases is the glaring example of going down the wrong road.
>
> Anyway, read it on your flight (or, in my case, drive) to Austin, and
> feel free to pull me aside to explain just how wrong I am. ;-)

Distributed databases and SQL databases aren't mutually exclusive.
I am only vaguely familiar with Cells and how it divides up data into
entirely different databases of the same schema, and perhaps it wasn't
executed well. However, a discussion like this would need to separate
the concept of "distributed" from the notion that "that means we need a
database that advertises itself as distributed!".

The general problem Cells is solving strikes me very much as a
traditional horizontal sharding problem. While key stores like to
advertise that cross-database sharding is very easy with plain
key/values, that comes at the expense of the enormous amount of
functionality you give up, including ACID and the relational model.
There's no reason you can't horizontally shard a relational database,
and while Cells seems to have made this approach somewhat rigid, it
doesn't have to be that way. SQLAlchemy has long had a horizontal
sharding extension, and relational databases like PostgreSQL also
include horizontal sharding structures built in (see
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html). If
you shard your data into compartments the way Cells does, you can still
pretty much keep ACID local to one database at a time, or if you want to
distribute a transaction you can use two-phase commit, which MySQL and
PostgreSQL both support.
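
To make the sharding idea concrete, here is a minimal sketch in Python.
It is not how Cells or SQLAlchemy's sharding extension works; the
ShardedStore class, the "instances" table, and the CRC32-based routing
are all made up for illustration, and in-memory SQLite connections stand
in for separate database servers. The point is only that each write is
an ordinary ACID transaction on exactly one shard.

```python
import sqlite3
import zlib

# Hypothetical schema, identical on every shard.
SCHEMA = "CREATE TABLE instances (uuid TEXT PRIMARY KEY, host TEXT NOT NULL)"


class ShardedStore:
    """Routes each row to one of N databases that share a schema."""

    def __init__(self, num_shards):
        # In-memory SQLite stands in for separate database servers.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute(SCHEMA)

    def shard_for(self, uuid):
        # A stable hash, so the same key always maps to the same shard.
        index = zlib.crc32(uuid.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def add_instance(self, uuid, host):
        conn = self.shard_for(uuid)
        # "with conn" commits on success and rolls back on error, so the
        # transaction stays local to a single shard.
        with conn:
            conn.execute(
                "INSERT INTO instances (uuid, host) VALUES (?, ?)",
                (uuid, host),
            )

    def get_host(self, uuid):
        row = self.shard_for(uuid).execute(
            "SELECT host FROM instances WHERE uuid = ?", (uuid,)
        ).fetchone()
        return row[0] if row else None

    def count_all(self):
        # Cross-shard queries fan out and are merged in the application.
        return sum(
            conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
            for conn in self.shards
        )
```

A single-key lookup touches one database; only something like
count_all() has to visit every shard. A write that needed to span
shards atomically is where two-phase commit would come in, which this
sketch does not attempt.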

A key reason the NoSQL movement failed to completely replace relational
databases, as its advocates seemed to think would happen about five
years ago, is that they spent lots of time claiming to solve problems in
SQL that weren't actually problems, such as the idea that "schemaless"
is easier to work with (there's always a schema, NoSQL just has no way
of validating or enforcing it), or that you just couldn't do key/value
transactions nearly as fast with ACID (until PostgreSQL made a few
tweaks; it now successfully beats MongoDB at this task).
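
As a small illustration of the "there's always a schema" point, the
sketch below contrasts a relational table with a plain dict. The dict
is only a stand-in for a schemaless key/value store, not any real NoSQL
product, and the "instances" table and insert_instance() helper are
hypothetical. The relational side rejects rows that violate the
declared schema; the dict accepts whatever it is handed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the schema is declared once and enforced on write.
conn.execute(
    "CREATE TABLE instances ("
    " uuid TEXT PRIMARY KEY,"
    " vcpus INTEGER NOT NULL CHECK (vcpus > 0))"
)


def insert_instance(uuid, vcpus):
    """Return True if the row was accepted, False if the schema rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO instances (uuid, vcpus) VALUES (?, ?)",
                (uuid, vcpus),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A plain dict standing in for a schemaless store: nothing is validated.
kv_store = {}
kv_store["a"] = {"vcpus": 2}
kv_store["b"] = {"vcpu": "two"}  # misspelled key, wrong type, no complaint
```

Here insert_instance("x", 0) and insert_instance("y", None) are both
refused by the database, while the malformed "b" record sits in the
dict until some reader trips over it.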

It may or may not be the case that "Cells didn't do a very good job of
distributing SQL", but that doesn't mean "SQL is not appropriate for
distributing data". Facebook and LinkedIn have built distributed
database systems based on MySQL at profoundly massive scales.
OpenStack's problem, I'm going to guess, isn't as hard as that.

[openstack-dev] [nova] Distributed Database
