[openstack-dev] [nova] Distributed Database

Robert Collins robertc at robertcollins.net
Sun Apr 24 20:28:45 UTC 2016

On 24 April 2016 at 12:15, Edward Leafe <ed at leafe.com> wrote:
> On Apr 24, 2016, at 6:41 AM, Michael Bayer <mbayer at redhat.com> wrote:
>> I'm only seeking to counter what appears to be the premise of your blog post, which is that distributed and SQL are mutually exclusive.   When you say, "why don't we just query the database?"  You can make distributed SQL look like that to the application as well, but it might not bring any advantages.
> That’s my fault for the confusion. My problem with SQL was in the modeling of disparate resources, each of which has all sorts of different qualities, into a single relational model. I fully understand that SQL DBs can have a distributed model like that of key-value stores or column family stores.

It is important to remember that there are five really different
challenges we're dealing with here:
 - correctness
 - extensability / reuse within the OpenStack ecosystem. See the
discussion on quota's for a recent thread on that.
 - latency optimisation of APIs. - e.g. get all our APIs to respond
w/in 5ms 95% of the time on a mid-sized cloud.
 - scale optimisation of APIs - keep latency of APIs near-constant as
the number of services comprising a cloud scales up
 - reliability - keeping the cloud basically available even when
hardware or software failure are occuring (short of catastrophic ones

Solving each of those challenges is essentially domain specific
knowledge - data correctness with SQL vs a k:v store vs a column
familty style store vs a document database store are all different in

I believe it is possible to solve all of these with single-node SQL
servers, but you need to bring in layers on top of it - and the cited
big examples that have done this have done exactly that.

It is also possible to solve them using a distributed database -
cassandra/riak/<anythingbutmongodb>/... - and plenty of examples exist
of that being done as well.

There are also things like
https://github.com/apache/incubator-trafodion /
- SQL distributed on Hadoop. I suspect it is at least partially
inspired by http://research.google.com/archive/spanner.html .

Fundamentally though, there are lots of ways to solve for the 5 things
above, and right now we're not systematically doing *any* of them.

In a LEAN sense, it would be good to push back actually deciding which
of the mutually conflicting approaches we might choose until we need
to decide - but I'm not sure how best to structure for that.

Some of the ways that come to mind for addressing some / all of these
issues - some of which we have folk working on bit by bit...

 - we can fix up our existing data store: tune the queries, tune the
schema, tune RPC traffic, such that we can run at (say) 100x our
existing scale, tell folk to depend on Galera for reliability.
 - we can mix into that sharding for scale, presuming we *have* a
scale problem once we're using the DB well
 - we can put robust instrumentation into place to tell us where we
have gotten do w.r.t latency and performance
 - we can split out the best-effort, RPC, and must-deliver message bus
traffic and optimise those separately
 - we could move our data aggregation and reliable notification stuff
to a system like kafka
(https://engineering.linkedin.com/kafka/running-kafka-scale )- note
that that is linkedin, who also run SQL still
 - we could look at a microservice architecture to really move ahead
on splitting out things we've had stall in the past (scheduling,
quotas obviously)

I think its worth noting that correctness gets more complex rapidly as
you discard features from lower down the stack: e.g. SQL gives you
ACID etc. Sharded SQL means we need a BASE approach and much more care
about sequencing of operations. Other databases give varying sets of
transactional capability, and more and more care/BASE tooling becomes

We also end up looking at eventual consistency vs atomic operations
for any strategy other than 'All SQL All The Time' - that doesn't
necessarily bubble up to users, but we'll certainly have code that
needs to understand the difference.

I am excited by the potential around a distributed datastore, but I
fear that our norm of having every project own its own direction means
we're going to have a very hard time agreeing around such a thing: and
I very much don't want just one project running on a distributed
datastore - and I also don't want us to write an abstraction over all
possible datastores!

It's axiomatic that other orgs have had great success with distributed
datastores, but that doesn't mean its going to be a great fit and work
well for OpenStack. So I think we need to bias our discussion towards
assessing that fit.

What about this: we try to make enough hallway time this summit to
figure out the functional 'must be this high' criteria that we as a
community would need to have adopting a distributed database be a net
benefit. Everyone may have different concerns here - but if we union
them together we can see if there is a feasible thing.

For instance, the things I think are essential for a distributed
database based datastore:
 - good single-machine developer story. Must not need a physical
cluster to hack on OpenStack
 - deal gracefully with single node/rack/site failures (when deployed
appropriately) - allow limiting failure domain impact
 - straightforward programming model: wrong uses should be obvious to reviewers
 - low latency performance with big datasets: e.g. nova list as an
admin should be able to get the Nth page as rapidly as the 2nd or 3rd.
 - code to deliver that should be (approximately) no worse than the current code

I realise this isn't easy to judge from a small prototype - so I'd be
proposing that once we've got rough consensus on a set of essential
things, we constitute a new team to put together a sizable POC (or
POCs - one per DB for a few DBs...) - an 80% thing - so folk can
actually assess how it look/feel at both a code and ops perspective.

Then, if the result *has met* those essential things, we should
actually have the ability to do a gap analysis and talk about how a
project to move to such a thing would look like.


Robert Collins <rbtcollins at hpe.com>
Distinguished Technologist
HP Converged Cloud

More information about the OpenStack-dev mailing list