[openstack-dev] [nova] Distributed Database

Clint Byrum clint at fewbar.com
Fri Apr 29 05:53:07 UTC 2016

Excerpts from Mike Bayer's message of 2016-04-28 22:16:54 -0500:
> On 04/28/2016 08:25 PM, Edward Leafe wrote:
> > Your own tests showed that a single RDBMS instance doesn’t even break a sweat
> > under your test loads. I don’t see why we need to shard it in the first
> > place, especially if in doing so we add another layer of complexity and
> > another dependency in order to compensate for that choice. Cells are a useful
> > concept, but this proposed implementation is adding way too much complexity
> > and debt to make it worthwhile.
> now that is a question I have also.  Horizontal sharding is usually for 
> the case where you need to store say, 10B rows, and you'd like to split 
> it up among different silos.  Nothing that I've seen about Nova suggests 
> this is a system with any large data requirements, or even medium size 
> data (a few million rows in relational databases is nothing).    I 
> didn't have the impression that this was the rationale behind Cells, it 
> seems like this is more of some kind of logical separation of some kind 
> that somehow suits some environments (but I don't know how). 
> Certainly, if you're proposing a single large namespace of data across a 
> partition of nonrelational databases, and then the data size itself is 
> not that large, as long as "a single namespace" is appropriate then 
> there's no reason to break out of more than one MySQL database.  There's 
> not much reason to transparently shard unless you are concerned about 
> adding limitless storage capacity.   The Cells sharding seems to be 
> intentionally explicit and non-transparent.

There's a bit more to it than the number of rows. There's also a desire
to limit failure domains. IMO, that is entirely unfounded, as I've run
thousands of servers that depended on a single pair of MySQL servers
using simple DRBD and pacemaker with a floating IP for failover. This
is the main reason MySQL is a thing... it can handle 100,000 concurrent
connections just fine, and the ecosystem around detecting and handling
failure/maintenance is mature.

The whole cells conversation, IMO, stems from the way we use RabbitMQ.
We should just stop doing that. I know as I move forward with our scaling
efforts, I'll be trying several RPC drivers and none of them will go
through RabbitMQ.

More information about the OpenStack-dev mailing list