[openstack-dev] [nova] Distributed Database
andrew at lascii.com
Wed May 4 13:54:20 UTC 2016
On Tue, May 3, 2016, at 08:05 PM, Mark Doffman wrote:
> This thread has been a depressing read.
> I understand that the content is supposed to be distributed databases
> but for me it has become an inquisition of cellsV2.
> Our question has clearly become "Should we continue efforts on
> cellsV2?", which I will address head-on.
> We shouldn't be afraid to abandon CellsV2. If there are designs that are
> proven to be a better solution then our current momentum shouldn't keep
> us from an abrupt change. As someone who is working on this I have an
> attachment to the current design, but it's important for me to keep an
> open mind.
> Here are my *main* reasons for continuing work on CellsV2.
> 1. It provides a proven solution to an immediate message queue problem.
> Yes CellsV2 is different to CellsV1, but the previous solution showed
> that application-level sharding of the message queue can work. CellsV2
> provides this solution with a (moderately) easy upgrade path for
> existing deployments. These deployments may not be comfortable with
> changing MQ technologies or may already be using CellsV1.
> Application-level sharding of the message queue is not pretty, but it will work.
> 2. The 'complexity' of CellsV2 is vastly overstated.
> Sure, there is a lot of *work* to do for CellsV2, but this doesn't imply
> increased complexity: any refactoring requires work. CellsV1 added
> complexity to our codebase; CellsV2 does not. In fact, by clearly
> separating data that is 'owned' by the different services we have, I
> believe that we are improving the modularity and encapsulation present
> in Nova.
> 3. CellsV2 does not prohibit *ANY* of the alternative scaling methods
> mentioned in this thread.
> Really, it doesn't. Both message queue and database switching are
> completely optional. Both in the sense of running a single cell, and
> even when running multiple cells. If anything, the ability to run
> separate message queues and database connections could give us the
> ability to trial these alternative technologies within a real, running,
> Just imagine the ability to set up a cell in your existing cloud that
> runs 0mq rather than rabbit. How about a NewSQL database integrated in
> to an existing cloud? Both of these things may (with some work) be
> I could go on, but I won't. These are my main reasons and I'll stick to
> It's difficult to be proven wrong, but sometimes necessary to get the
> best product that we can. I don't think that the existence of
> alternative message queue and database options is enough to stop cellsV2
> work now. A proven solution, that meets the upgrade constraints that we
> have in Nova, would be a good reason to do so. We should of course
> explore other options; nothing we are doing prevents that. When they
> work out, I'll be super excited.
Thank you for writing this. You have eloquently described the situation
and I completely agree.
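
To make points 1 and 3 concrete, here is a rough sketch of what
application-level sharding of the message queue and database could look
like. It is only an illustration under stated assumptions, not Nova code:
the Cell and CellRouter names, the field names, and the URLs are all
hypothetical. The idea is that each cell carries its own message queue and
database connection, and the application looks up the cell for an instance
before talking to either.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cell:
    """One shard: a cell with its own message queue and database (hypothetical)."""
    name: str
    transport_url: str        # message queue used only by this cell
    database_connection: str  # database used only by this cell


class CellRouter:
    """Application-level lookup from an instance to the cell that owns it."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}
        self._instance_to_cell: Dict[str, str] = {}

    def add_cell(self, cell: Cell) -> None:
        self._cells[cell.name] = cell

    def map_instance(self, instance_uuid: str, cell_name: str) -> None:
        # Refuse mappings to cells that were never registered.
        if cell_name not in self._cells:
            raise KeyError(f"unknown cell: {cell_name}")
        self._instance_to_cell[instance_uuid] = cell_name

    def cell_for(self, instance_uuid: str) -> Cell:
        # Raises KeyError if the instance has not been mapped to a cell.
        return self._cells[self._instance_to_cell[instance_uuid]]


# Hypothetical deployment: one cell on rabbit, one trial cell on 0mq.
router = CellRouter()
router.add_cell(Cell("cell1", "rabbit://mq1.example.org/",
                     "mysql://db1.example.org/nova"))
router.add_cell(Cell("cell2", "zmq://mq2.example.org/",
                     "mysql://db2.example.org/nova"))
router.map_instance("instance-a", "cell1")
router.map_instance("instance-b", "cell2")
```

With a single cell the lookup always returns the same connections, which
is why the switching is optional; with several cells, one of them can point
at a different message queue or database without the others noticing.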
> On 4/29/16 12:53 AM, Clint Byrum wrote:
> > Excerpts from Mike Bayer's message of 2016-04-28 22:16:54 -0500:
> >> On 04/28/2016 08:25 PM, Edward Leafe wrote:
> >>> Your own tests showed that a single RDBMS instance doesn’t even break a sweat
> >>> under your test loads. I don’t see why we need to shard it in the first
> >>> place, especially if in doing so we add another layer of complexity and
> >>> another dependency in order to compensate for that choice. Cells are a useful
> >>> concept, but this proposed implementation is adding way too much complexity
> >>> and debt to make it worthwhile.
> >> Now that is a question I have also. Horizontal sharding is usually for
> >> the case where you need to store say, 10B rows, and you'd like to split
> >> it up among different silos. Nothing that I've seen about Nova suggests
> >> this is a system with any large data requirements, or even medium size
> >> data (a few million rows in relational databases is nothing). I
> >> didn't have the impression that this was the rationale behind Cells; it
> >> seems like this is more of a logical separation that somehow suits
> >> some environments (but I don't know how).
> >> Certainly, if you're proposing a single large namespace of data across a
> >> partition of nonrelational databases, and then the data size itself is
> >> not that large, as long as "a single namespace" is appropriate then
> >> there's no reason to break out of more than one MySQL database. There's
> >> not much reason to transparently shard unless you are concerned about
> >> adding limitless storage capacity. The Cells sharding seems to be
> >> intentionally explicit and non-transparent.
> > There's a bit more to it than the number of rows. There's also a desire
> > to limit failure domains. IMO, that is entirely unfounded, as I've run
> > thousands of servers that depended on a single pair of MySQL servers
> > using simple DRBD and pacemaker with a floating IP for failover. This
> > is the main reason MySQL is a thing... it can handle 100,000 concurrent
> > connections just fine, and the ecosystem around detecting and handling
> > failure/maintenance is mature.
> > The whole cells conversation, IMO, stems from the way we use RabbitMQ.
> > We should just stop doing that. I know as I move forward with our scaling
> > efforts, I'll be trying several RPC drivers and none of them will go
> > through RabbitMQ.
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev