[openstack-dev] [nova] Distributed Database

Clint Byrum clint at fewbar.com
Mon Apr 25 07:10:07 UTC 2016


Excerpts from Andrew Laski's message of 2016-04-22 14:32:59 -0700:
> 
> On Fri, Apr 22, 2016, at 04:27 PM, Ed Leafe wrote:
> > OK, so I know that Friday afternoons are usually the worst times to
> > write a blog post and start an email discussion, and that the Friday
> > immediately before a Summit is the absolute worst, but I did it anyway.
> > 
> > http://blog.leafe.com/index.php/2016/04/22/distributed_data_nova/
> > 
> > Summary: we are creating way too much complexity by trying to make Nova
> > handle things that are best handled by a distributed database. The
> > recent split of the Nova DB into an API database and separate cell
> > databases is the glaring example of going down the wrong road.
> > 
> > Anyway, read it on your flight (or, in my case, drive) to Austin, and
> > feel free to pull me aside to explain just how wrong I am. ;-)
> 
> I agree with a lot of what Monty wrote in his response. And agree that
> given a greenfield there are much better approaches that could be taken
> rather than partitioning the database.
> 
> However I do want to point out that cells v2 is not just about dealing
> with scale in the database. The message queue is another consideration,
> and as far as I know there is no analog for the message queue of the
> "distributed database" option that exists for the persistence layer.
> 

It's not even scale, it's failure domain isolation. I'm pretty
confident I could back 1000 busy compute nodes with a single 32-core,
128GB RabbitMQ server. But doing so is basically pure madness because
of the failover cost. Having ten 8-core RabbitMQ servers, as Cells v2
wants to do, means the disruption caused by losing any one of them
should be containable to 1/10th of the running instances. However,
that assumes the complexity of the implementation won't leak out to
the unaffected servers.

Anyway, for messaging, part of the problem is that until somewhat
recently we thought RPC and notifications were the same thing. They're
VASTLY different. For notifications, you don't need to look beyond
Apache Kafka to see that scale-out solutions exist. Also, if you
actually separate the two, you'll find that a single tiny RabbitMQ
cluster can handle the notifications without breaking a sweat, because
that uses RabbitMQ for what it was actually designed for (lots of
messages, few topics).

RPC being a different animal, we're, frankly, abusing RabbitMQ in
silly ways. There is a _massive_ pile of simpler things just waiting
to be tried:

- 0MQ - There's this fear of change, and a bit of a chicken-and-egg
  problem, preventing this from becoming the default choice for RPC
  any time soon. I for one want to look into it, but keep getting
  sidetracked because RMQ is "good enough" for now, and the default.
  
- Direct HTTP for RPC - I've always wondered why we don't do this.
  Same basic idea as 0MQ, but even more familiar to all of us.
  
- Thrift

- gRPC/protobuf

The basic theme for RPC is simple: just send the messages to the
services directly.
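
As a toy sketch of that idea (not how any existing oslo.messaging
driver works), here's a direct request/reply exchange with pyzmq; the
endpoint and "ping" method are invented for illustration:

    import zmq

    ctx = zmq.Context()

    # Server side: the service binds a REP socket and answers
    # requests itself -- no broker in the middle.
    server = ctx.socket(zmq.REP)
    server.bind("tcp://127.0.0.1:5555")

    # Client side: the caller connects straight to the service.
    client = ctx.socket(zmq.REQ)
    client.connect("tcp://127.0.0.1:5555")
    client.send_json({"method": "ping", "args": {}})

    # The service handles the request and replies in place.
    request = server.recv_json()
    reply = ({"result": "pong"} if request["method"] == "ping"
             else {"error": "unknown method"})
    server.send_json(reply)

    print(client.recv_json())  # {'result': 'pong'}

The same request/reply shape maps directly onto plain HTTP, which is
the appeal of the "Direct HTTP for RPC" option above.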

> Additionally with v1 we found that deployers have enjoyed being able to
> group their hardware with cells. Baremetal goes in this cell, SSD filled
> computes over here, and spinning disks over there. And beyond that
> there's the ability to create a cell, fill it with hardware, and then
> test it without plugging it into the production API. Cells provides an
> entry point for poking at things that isn't available without it.
> 
> I don't want to get too sidetracked on talking about cells. I just
> wanted to point out that cells v2 did not come to fruition due to a fear
> of distributed databases.
> 

I'd love for the feature described to be separate from scaling. It's a
great feature, but it's just a happy accident that it helps with scale,
and would work just as well if we called it "host aggregates" and actually
made host aggregates work well.


