Open Stack

Wed May 4 18:23:37 UTC 2016

Excerpts from Mark Doffman's message of 2016-05-03 17:05:54 -0700:
> This thread has been a depressing read.
> 

First, I apologize if any of my actions have caused you any undue stress.

> I understand that the content is supposed to be distributed databases 
> but for me it has become an inquisition of cellsV2.
> 

That word, inquisition, is a bit loaded with cultural significance,
though I think the sterile definition applies accurately. It's not my
intent to bring any of the unfortunate aspects of it into this process
though. My main concern is that the actual details haven't even been
thought through at a high level, and we maybe shouldn't be pinning all
our scaling hopes on something that may well end up changing radically
in practice.

> Our question has clearly become "Should we continue efforts on 
> cellsV2?", which I will address head-on.
> 
> We shouldn't be afraid to abandon CellsV2. If there are designs that are 
> proven to be a better solution then our current momentum shouldn't keep 
> us from an abrupt change. As someone who is working on this I have an 
> attachment to the current design, but it's important for me to keep an 
> open mind.
> 
> Here are my *main* reasons for continuing work on CellsV2.
> 
> 1. It provides a proven solution to an immediate message queue problem.
> 
> Yes CellsV2 is different to CellsV1, but the previous solution showed 
> that application-level sharding of the message queue can work. CellsV2 
> provides this solution with a (moderately) easy upgrade path for 
> existing deployments. These deployments may not be comfortable with 
> changing MQ technologies or may already be using CellsV1. Application 
> level sharding of the message queue is not pretty, but will work.
> 

Indeed, one advantage of using a broker for RPC is that you only have
to ensure connectivity from nodes -> brokers. I can totally understand
a hesitance to ask people to ensure connectivity from (class of
nodes)<->(class of nodes), for each class of nodes that need it. That is
what 0mq asks one to do.
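
To make the connectivity difference concrete, here is a minimal sketch
that enumerates the network paths an operator has to guarantee in each
model. The function names, node names, and the notion of a "talks to"
relation between node classes are assumptions made up for illustration;
none of this comes from oslo.messaging or 0mq.

```python
from itertools import product


def brokered_links(nodes, brokers):
    """Brokered RPC: every node only needs a path to every broker."""
    return {(node, broker) for node in nodes for broker in brokers}


def brokerless_links(nodes_by_class, talks_to):
    """Brokerless RPC: every node in a class needs a path to every node
    in each class it talks to.

    nodes_by_class: dict mapping a class name to a list of node names
    talks_to: iterable of (class_a, class_b) pairs that exchange messages
    Links are bidirectional, so each one is stored as an unordered pair.
    """
    links = set()
    for class_a, class_b in talks_to:
        for a, b in product(nodes_by_class[class_a], nodes_by_class[class_b]):
            if a != b:  # a node does not need a link to itself
                links.add(frozenset((a, b)))
    return links
```

In the brokered case the set grows with (nodes x brokers); in the
brokerless case it grows with the product of the sizes of every pair of
classes that communicate, which is the part I can see people being
hesitant about.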

I was witness to a brief presentation from one of the QPID community
members about how they've addressed brokerless comms with a very simple,
non-broker "router daemon", and it was impressive how it straddled this
line nicely, allowing one to basically replace a broker with a set of
relatively stupid daemons that simply pass messages along in realtime,
using some clever techniques borrowed from OSPF and the like.

Both of these, 0mq and brokerless AMQP 1.0, can be taken advantage of
_today_ with oslo.messaging drivers that exist already. However, they
require some battle hardening, so I respect that there are some who'd
rather we change OpenStack around its own battle-tested choices than
start experimenting with new solutions that are outside of OpenStack.

The point of my persistence here is to make it clear that I don't think
Cells V2 is settled, and I don't think it will be a generally consumable
solution any time soon. I think for those of us with immediate concerns,
who are not interested in taking on cells v1 at this time, we should
look to experiment with these other options.

> 2. The 'complexity' of CellsV2 is vastly overstated.
> 
> Sure there is a lot of *work* to do for cellsv2, but this doesn't imply 
> increased complexity: any refactoring requires work. CellsV1 added 
> complexity to our codebase, Cellsv2 does not. In fact, by clearly 
> separating data that is 'owned' by the different services we have I 
> believe that we are improving the modularity and encapsulation present 
> in Nova.
> 

I think the complexity is entirely unknown, and that the design should
fill its gaps, even at high levels, so that we can actually reason about
the complexity. Right now, there's hand waving in places that concern
me.

> 3. CellsV2 does not prohibit *ANY* of the alternative scaling methods
>     mentioned in this thread.
> 
> Really, it doesn't. Both message queue and database switching are 
> completely optional. Both in the sense of running a single cell, and 
> even when running multiple cells. If anything, the ability to run 
> separate message queues and database connections could give us the 
> ability to trial these alternative technologies within a real, running, 
> cloud.
> 
> Just imagine the ability to set up a cell in your existing cloud that 
> runs 0mq rather than rabbit. How about a NewSQL database integrated 
> into an existing cloud? Both of these things may (with some work) be possible.
> 

Prohibit is definitely not the word I would use either. But I'm not sure
I'd get too excited about enabling multiple drivers across cells either.
What I'd really like is a simple solution, and I truly do hope that
cells v2 becomes that some day.
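
For anyone trying to picture the per-cell message queue and database
switching discussed above, here is a rough sketch of the idea: a lookup
from an instance to the cell that owns it, where each cell carries its
own message queue URL and database connection. The class names, method
names, and URLs are hypothetical placeholders for illustration, not
Nova's actual code or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMapping:
    """Connection details for one cell (all values are placeholders)."""
    name: str
    transport_url: str        # message queue used inside this cell
    database_connection: str  # database used inside this cell


class CellRouter:
    """Hypothetical application-level router from instances to cells."""

    def __init__(self):
        self._cells = {}
        self._instance_to_cell = {}

    def add_cell(self, cell):
        self._cells[cell.name] = cell

    def map_instance(self, instance_uuid, cell_name):
        if cell_name not in self._cells:
            raise KeyError("unknown cell: %s" % cell_name)
        self._instance_to_cell[instance_uuid] = cell_name

    def target_for(self, instance_uuid):
        """Return the cell whose MQ and DB should serve this instance."""
        return self._cells[self._instance_to_cell[instance_uuid]]
```

Nothing in a lookup like this cares whether two cells use the same
driver, which is why mixing them is possible in principle. Whether it
is something operators would actually want to run is a separate
question.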

> 
> 
> I could go on, but I won't. These are my main reasons and I'll stick to 
> them.
> 
> It's difficult to be proven wrong, but sometimes necessary to get the 
> best product that we can. I don't think that the existence of 
> alternative message queue and database options is enough to stop cellsV2 
> work now. A proven solution, that meets the upgrade constraints that we 
> have in Nova, would be a good reason to do so. We should of course 
> explore other options, nothing we are doing prevents that. When they 
> work out, I'll be super excited.
> 

To be clear, I'm quite convinced we do not need any alternative database
solution once RPC is done differently. I hope we can see actual data
that proves this, though I know from experience, it won't be as soon as
any of us would like.

[openstack-dev] [nova] Distributed Database
