[openstack-dev] [nova] Distributed Database
Mike Bayer
mbayer at redhat.com
Sun Apr 24 04:33:03 UTC 2016
On 04/22/2016 04:27 PM, Ed Leafe wrote:
> OK, so I know that Friday afternoons are usually the worst times to
> write a blog post and start an email discussion, and that the Friday
> immediately before a Summit is the absolute worst, but I did it anyway.
>
> http://blog.leafe.com/index.php/2016/04/22/distributed_data_nova/
>
> Summary: we are creating way too much complexity by trying to make Nova
> handle things that are best handled by a distributed database. The
> recent split of the Nova DB into an API database and separate cell
> databases is the glaring example of going down the wrong road.
>
> Anyway, read it on your flight (or, in my case, drive) to Austin, and
> feel free to pull me aside to explain just how wrong I am. ;-)
Distributed databases aren't mutually exclusive with SQL databases.
I am only vaguely familiar with Cells and how it divides up data into
entirely different databases of the same schema, and perhaps it wasn't
executed well. However, a discussion like this would need to separate the
concept of "distributed" from the notion that "that means we need a
database that advertises itself as distributed!".
The general problem Cells is solving strikes me very much as a
traditional horizontal sharding problem. While key stores like to
advertise that cross-database sharding is very easy with plain
key/values, that's at the expense of the enormous amount of
functionality you give up, including ACID and the relational model.
There's no reason you can't horizontally shard a relational database,
and while Cells seems like it's made this approach somewhat rigid, it
doesn't have to be that way. SQLAlchemy has long had a horizontal
sharding extension, and relational databases like PostgreSQL also include
horizontal partitioning structures built in (see
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html). If
you shard your data into compartments the way Cells does, you can still
pretty much keep ACID local to one database at a time, or if you want to
distribute a transaction you can use two-phase commit, which MySQL and
PostgreSQL both support.
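
To make the sharding idea concrete, here is a minimal sketch using only
Python's standard library. It is not how Cells or SQLAlchemy's sharding
extension work; the table, the column names, the shard count and the
helper functions (shard_for, create_instance, get_instance,
count_instances) are all made up for illustration, and in-memory SQLite
databases stand in for separate database servers. The point is only that
each write stays an ordinary ACID transaction local to one shard.

```python
import sqlite3
import zlib

# Hypothetical setup: three in-memory SQLite databases stand in for
# three separate database servers that share one schema.
SHARD_COUNT = 3
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for conn in shards:
    conn.execute(
        "CREATE TABLE instances ("
        " uuid TEXT PRIMARY KEY,"
        " host TEXT NOT NULL,"
        " state TEXT NOT NULL)"
    )


def shard_for(uuid):
    """Pick a shard from a stable hash of the sharding key."""
    return shards[zlib.crc32(uuid.encode("utf-8")) % SHARD_COUNT]


def create_instance(uuid, host):
    conn = shard_for(uuid)
    # "with conn" commits on success and rolls back on an exception,
    # so the write is an ordinary ACID transaction local to one shard.
    with conn:
        conn.execute(
            "INSERT INTO instances (uuid, host, state) VALUES (?, ?, ?)",
            (uuid, host, "building"),
        )


def get_instance(uuid):
    # A lookup by sharding key touches exactly one database.
    return shard_for(uuid).execute(
        "SELECT uuid, host, state FROM instances WHERE uuid = ?", (uuid,)
    ).fetchone()


def count_instances():
    # A query without the sharding key has to fan out to every shard.
    return sum(
        conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
        for conn in shards
    )
```

A write that has to span two shards atomically is where two-phase commit
would come in. SQLite has no equivalent, so that part is left out of the
sketch.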
A key reason the NoSQL movement failed to completely replace relational
databases, as its advocates seemed to think would happen about five years
ago, is that it spent lots of time claiming to solve problems in SQL
that weren't actually problems, such as the idea that "schemaless" is
easier to work with (there's always a schema, NoSQL just has no way of
validating or enforcing it), or that you just couldn't do key/value
transactions nearly as fast with ACID (until PostgreSQL made a few
tweaks and now beats MongoDB at this task).
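
As a small illustration of what "validating or enforcing" a schema
means, the sketch below declares two constraints and lets the database
reject rows that violate them. The table, the columns and the try_insert
helper are hypothetical, and SQLite is used only because it ships with
Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    " uuid TEXT PRIMARY KEY,"
    " vcpus INTEGER NOT NULL CHECK (vcpus > 0))"
)


def try_insert(uuid, vcpus):
    """Return True if the row was accepted, False if the schema rejected it."""
    try:
        # The transaction is rolled back automatically on an exception.
        with conn:
            conn.execute(
                "INSERT INTO instances (uuid, vcpus) VALUES (?, ?)",
                (uuid, vcpus),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for NOT NULL, CHECK and PRIMARY KEY violations.
        return False
```

Here try_insert("a", 2) is accepted, while a missing vcpus value or a
vcpus of 0 is refused by the database itself rather than by application
code.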
It may or may not be the case that "Cells didn't do a very good job of
distributing SQL" but that doesn't mean "SQL is not appropriate for
distributing data". Facebook and LinkedIn have built distributed
database systems based on MySQL at profoundly massive scales.
OpenStack's problem, I'm going to guess, isn't as hard as that.