[openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

Soren Hansen soren at linux2go.dk
Fri Oct 3 07:52:49 UTC 2014


2014-10-03 9:00 GMT+02:00 Michael Chapman <woppin at gmail.com>:
> On Fri, Oct 3, 2014 at 4:05 AM, Soren Hansen <soren at linux2go.dk>
> wrote:
>> That said, there will certainly be situations where there'll be a
>> need for some sort of anti-entropy mechanism. It just so happens that
>> those situations already exist. We're dealing with a complex
>> distributed system.  We're kidding ourselves if we think that any
>> kind of consistency is guaranteed, just because our data store
>> favours consistency over availability.
> I apologize if I'm missing something, but doesn't denormalization to
> add join support put the same value in many places, such that an
> update to that value is no longer a single atomic transaction?

Yes.
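
To make that concrete, here's roughly what a denormalised update could
look like with the Python Riak client (the buckets, fields and keys
below are made up purely for the sake of the example):

    import riak

    client = riak.RiakClient()
    instances = client.bucket('instances')
    by_host = client.bucket('instances_by_host')

    instance_id = 'some-instance-uuid'
    new_name = 'my-renamed-vm'

    # Write 1: update the canonical instance record.
    inst = instances.get(instance_id)
    inst.data['display_name'] = new_name
    inst.store()

    # Write 2: update the denormalised copy used for per-host listings.
    listing = by_host.get(inst.data['host'])
    listing.data[instance_id]['display_name'] = new_name
    listing.store()   # if this write fails, write 1 is not rolled back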


> This would appear to counteract the requirement for strong
> consistency.

What requirement for strong consistency?


> If updating a single value is atomic (as in Riak's consistent mode)

Admittedly, I'm not 100% up-to-date on Riak, but last I looked, there
wasn't any "consistent mode". When writing a value, you can, however,
specify that you want all (or a quorum of) the replicas to be written to
disk before you get a successful response. That does not imply
transactional support, though. In other words, if one of the writes
fails, it doesn't get rolled back on the other nodes. You just don't get
a successful response.
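
Roughly like this, with the Python Riak client (the key and data are
made up, and the exact store() parameters may vary between client
versions):

    import riak

    client = riak.RiakClient()
    bucket = client.bucket('instances')

    obj = bucket.new('some-instance-uuid',
                     data={'state': 'ACTIVE', 'host': 'compute-3'})
    try:
        obj.store(w='quorum')   # require a quorum of replicas to ack
    except riak.RiakError:
        # Some replicas may already have the new value. Nothing gets
        # rolled back; we simply didn't get a successful response.
        pass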


> I also don't really see how a NoSQL system in strong consistency mode
> is any different from running MySQL with galera in its failure modes.

I agree. I never meant to imply that we should run anything in "strong
consistency mode". There might be a few operations that require strong
consistency, but they should be exceptional. Quotas sound like a good
example.


> The requirement for quorum makes the addition of nodes increase the
> potential latency of writes (and reads in some cases) so having large
> scale doesn't grant much benefit, if any.

I agree that the requirement for quorum has those effects (the same goes
for e.g. Galera). I think you are missing my point, though. My concern
is not whether MySQL can handle the data volume of a large-scale
OpenStack deployment. I'm sure it can, without even breaking a sweat.
MySQL has
been used in countless deployments to handle data sets vastly bigger
than what we're dealing with.

My concern is reliability.

> Quorum will also prevent nodes on the wrong side of a partition from
> being able to access system state (or it will give them stale state,
> which is probably just as bad in our case).

This problem exists today.  Suppose you have a 5-node Galera cluster.
Would you refuse reads on the wrong side of the partition to avoid
providing stale data?

With e.g. Riak it's perfectly possible to accept both reads and writes
on both sides of the partition.

No matter what we do, we need to accept the fact that whenever we handle
the data, it is by definition out of date. It may have changed the
millisecond after we read it from the datastore and started using it.


> I think your goal of having state management that's able to handle
> network partitions is a good one, but I don't think the solution is as
> simple as swapping out where the state is stored.

It kinda is, and it kinda isn't. I never meant to suggest that just
replacing the datastore would solve everything. We need to carefully
look at our use of the data from the datastore and consider the impact
of eventual consistency on this use. On the other hand, as I just
mentioned above, this is a problem that exists right now, today. We're
just ignoring it, because we happen to have a consistent datastore.


> Maybe in some cases like split-racks the system needs to react to a
> network partition by forming its own independent cell with its own
> state storage, and when the network heals it then merges back into the
> other cluster cleanly?  That would be very difficult to implement, but
> fun (for some definition of fun).

Fun, but possible. Riak was designed for this. With an RDBMS I don't
even know how to begin solving something like that.
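
For example, with allow_mult enabled on a bucket, writes accepted on
both sides of the partition simply show up as siblings once the
partition heals, and the application gets to merge them. A rough sketch
(from memory; picking a winner by timestamp is just one rather naive
strategy):

    import riak

    client = riak.RiakClient()
    bucket = client.bucket('instances')

    obj = bucket.get('some-instance-uuid')
    if len(obj.siblings) > 1:
        # Concurrent writes from both sides of the partition. Pick the
        # most recently updated version (or merge them properly) and
        # store it back to resolve the conflict.
        winner = max(obj.siblings,
                     key=lambda s: s.data.get('updated_at', ''))
        obj.siblings = [winner]
        obj.store()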


> As a thought experiment, a while ago I considered what would happen if
> instead of using a central store, I put a sqlite database behind every
> daemon and allowed them to query each other for the data they needed,
> and cluster if needed (using raft).

> Services like nova-scheduler need strong consistency

No, they don't. :)

> and would have to cluster to perform their role, but services like
> nova-compute would simply need to store the data concerning the
> resources they are responsible for. This follows the 'place state at
> the edge' kind of design principles that have been discussed in
> various circles.  It falls down in a number of pretty obvious ways,
> and ultimately it would require more work than I am able to put in,
> but I mention it because perhaps it provides you with food for
> thought.

Yeah, a million distributed consistent databases do not a single
distributed, eventually consistent database make :)

-- 
Soren Hansen             | http://linux2go.dk/
Ubuntu Developer         | http://www.ubuntu.com/
OpenStack Developer      | http://www.openstack.org/
