[openstack-dev] [nova] Distributed Database

Andrew Laski andrew at lascii.com
Wed May 4 13:54:20 UTC 2016



On Tue, May 3, 2016, at 08:05 PM, Mark Doffman wrote:
> This thread has been a depressing read.
> 
> I understand that the content is supposed to be about distributed
> databases, but for me it has become an inquisition of CellsV2.
> 
> Our question has clearly become "Should we continue efforts on
> CellsV2?", which I will address head-on.
> 
> We shouldn't be afraid to abandon CellsV2. If there are designs that are
> proven to be a better solution, then our current momentum shouldn't keep
> us from an abrupt change. As someone who is working on this I have an
> attachment to the current design, but it's important for me to keep an
> open mind.
> 
> Here are my *main* reasons for continuing work on CellsV2.
> 
> 1. It provides a proven solution to an immediate message queue problem.
> 
> Yes, CellsV2 is different from CellsV1, but the previous solution showed
> that application-level sharding of the message queue can work. CellsV2
> provides this solution with a (moderately) easy upgrade path for
> existing deployments. These deployments may not be comfortable with
> changing MQ technologies or may already be using CellsV1. Application-level
> sharding of the message queue is not pretty, but it will work.
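> 
> As a rough illustration of what that means in practice (the names, URLs,
> and per-cell dict below are invented for this sketch, not the actual
> CellsV2 code): application-level sharding just means the caller picks a
> per-cell transport before issuing the RPC, e.g. with oslo.messaging:
> 
>     import oslo_messaging as messaging
>     from oslo_config import cfg
> 
>     # Hypothetical per-cell transport URLs; in a real deployment these
>     # would come from the cell mappings, not a hard-coded dict.
>     CELL_TRANSPORT_URLS = {
>         'cell1': 'rabbit://nova:secret@mq-cell1:5672/',
>         'cell2': 'rabbit://nova:secret@mq-cell2:5672/',
>     }
> 
>     def compute_rpc_client_for_cell(cell_name):
>         # Build a transport bound to the cell's own message queue and
>         # return an RPC client that talks only to that broker.
>         transport = messaging.get_transport(
>             cfg.CONF, url=CELL_TRANSPORT_URLS[cell_name])
>         target = messaging.Target(topic='compute')
>         return messaging.RPCClient(transport, target)
> 
> Nothing about the RPC interfaces changes; only the broker a given cell's
> traffic flows through.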
> 
> 2. The 'complexity' of CellsV2 is vastly overstated.
> 
> Sure there is a lot of *work* to do for CellsV2, but this doesn't imply
> increased complexity: any refactoring requires work. CellsV1 added
> complexity to our codebase; CellsV2 does not. In fact, by clearly
> separating data that is 'owned' by the different services we have, I
> believe that we are improving the modularity and encapsulation present
> in Nova.
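> 
> To make that ownership split concrete, here is a rough sketch of the
> direction (the table names are illustrative, not an authoritative list
> of what ends up where):
> 
>     # Data 'owned' by the API layer lives in the API database; data
>     # 'owned' by a cell lives in that cell's own database.
>     OWNERSHIP_SKETCH = {
>         'api_db': ['cell_mappings', 'host_mappings', 'instance_mappings'],
>         'cell_db': ['instances', 'compute_nodes', 'block_device_mapping'],
>     }
> 
> The point is that each piece of data has exactly one home, which is the
> encapsulation improvement described above.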
> 
> 3. CellsV2 does not prohibit *ANY* of the alternative scaling methods
>     mentioned in this thread.
> 
> Really, it doesn't. Both message queue and database switching are 
> completely optional. Both in the sense of running a single cell, and 
> even when running multiple cells. If anything, the ability to run 
> separate message queues and database connections could give us the 
> ability to trial these alternative technologies within a real, running, 
> cloud.
> 
> Just imagine the ability to set up a cell in your existing cloud that
> runs 0mq rather than rabbit. How about a NewSQL database integrated into
> an existing cloud? Both of these things may (with some work) be
> possible.
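> 
> To make that concrete, here is a rough sketch (all values invented) of
> what per-cell connection info looks like: each cell carries its own
> transport URL and database connection, so one cell in an existing cloud
> could point at different technologies than the rest:
> 
>     # Hypothetical cell records; in CellsV2 the real mappings live in
>     # the API database.
>     cells = [
>         {'name': 'cell1',
>          'transport_url': 'rabbit://nova:secret@mq-cell1:5672/',
>          'database_connection':
>              'mysql+pymysql://nova:secret@db-cell1/nova'},
>         {'name': 'cell-experimental',
>          'transport_url': 'zmq://ctl-experimental:9501/',
>          'database_connection':
>              'mysql+pymysql://nova:secret@newsql-proxy/nova'},
>     ]
> 
> Everything above the cells keeps talking to 'cell-experimental' the same
> way it talks to 'cell1'; only the connection strings differ.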
> 
> 
> 
> I could go on, but I won't. These are my main reasons and I'll stick to 
> them.
> 
> It's difficult to be proven wrong, but sometimes necessary to get the
> best product that we can. I don't think that the existence of
> alternative message queue and database options is enough to stop CellsV2
> work now. A proven solution that meets the upgrade constraints we have
> in Nova would be a good reason to do so. We should, of course, explore
> other options; nothing we are doing prevents that. When they work out,
> I'll be super excited.

Thank you for writing this. You have eloquently described the situation
and I completely agree.


> 
> Thanks
> 
> Mark
> 
> On 4/29/16 12:53 AM, Clint Byrum wrote:
> > Excerpts from Mike Bayer's message of 2016-04-28 22:16:54 -0500:
> >>
> >> On 04/28/2016 08:25 PM, Edward Leafe wrote:
> >>
> >>> Your own tests showed that a single RDBMS instance doesn’t even break a sweat
> >>> under your test loads. I don’t see why we need to shard it in the first
> >>> place, especially if in doing so we add another layer of complexity and
> >>> another dependency in order to compensate for that choice. Cells are a useful
> >>> concept, but this proposed implementation is adding way too much complexity
> >>> and debt to make it worthwhile.
> >>
> >> Now that is a question I have also.  Horizontal sharding is usually for
> >> the case where you need to store, say, 10B rows, and you'd like to split
> >> them up among different silos.  Nothing that I've seen about Nova suggests
> >> this is a system with any large data requirements, or even medium-sized
> >> data (a few million rows in relational databases is nothing).  I didn't
> >> have the impression that this was the rationale behind Cells; it seems
> >> more like a logical separation of some kind that suits certain
> >> environments (but I don't know how).
> >> Certainly, if you're proposing a single large namespace of data across a
> >> partition of nonrelational databases, and the data size itself is not
> >> that large, then as long as "a single namespace" is appropriate there's
> >> no reason to spread out over more than one MySQL database.  There's not
> >> much reason to shard transparently unless you are concerned about adding
> >> limitless storage capacity.  The Cells sharding seems to be
> >> intentionally explicit and non-transparent.
> >>
> >
> > There's a bit more to it than the number of rows. There's also a desire
> > to limit failure domains. IMO, that is entirely unfounded, as I've run
> > thousands of servers that depended on a single pair of MySQL servers
> > using simple DRBD and pacemaker with a floating IP for failover. This
> > is the main reason MySQL is a thing... it can handle 100,000 concurrent
> > connections just fine, and the ecosystem around detecting and handling
> > failure/maintenance is mature.
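> >
> > From the application's side that setup is almost boring, which is the
> > point: every service connects to the one floating IP, and when
> > pacemaker moves it (and the DRBD-backed data) to the standby node the
> > connections are simply re-established. A minimal sketch, with an
> > obviously made-up address and credentials:
> >
> >     from sqlalchemy import create_engine
> >
> >     # 10.0.0.10 is the floating IP managed by pacemaker; whichever
> >     # MySQL node currently holds it serves all of the traffic.
> >     engine = create_engine(
> >         'mysql+pymysql://nova:secret@10.0.0.10/nova',
> >         pool_recycle=3600)  # periodically refresh pooled connections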
> >
> > The whole cells conversation, IMO, stems from the way we use RabbitMQ.
> > We should just stop doing that. I know as I move forward with our scaling
> > efforts, I'll be trying several RPC drivers and none of them will go
> > through RabbitMQ.
> >


