Open Stack

Tue Nov 29 20:43:41 UTC 2011

2011/11/29 Jay Pipes <jaypipes at gmail.com>:
> On Tue, Nov 29, 2011 at 2:58 PM, Soren Hansen <soren at linux2go.dk> wrote:
>> 2011/11/29 Jay Pipes <jaypipes at gmail.com>:
>>> There's a very good reason this hasn't happened so far: handling
>>> highly relational datasets with a non-relational data store is a bad
>>> idea. In fact, I seem to remember that is exactly how Nova's data
>>> store started out life (*cough* Redis *cough*)
>> To be fair, we're only barely making use of this in our DB
>> implementation. I don't think we do any foreign key checking at all,
>> and deletes (because we don't actually delete anything, we just mark
>> it as deleted) don't cascade, so there are all sorts of ways in which
>> our data store could be inconsistent.
> Because the database schema isn't properly protecting against
> referential integrity failures does not mean the relational database
> store is a failure itself.

I'm not suggesting it's a failure at all.
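The consistency gap described in the quoted text can be made concrete with a minimal sketch using Python's standard sqlite3 module. The `instances`/`volumes` schema and the `deleted` flag column are hypothetical. They are loosely modeled on the soft-delete pattern mentioned above, not on Nova's actual schema. The sketch shows that marking a row as deleted never fires `ON DELETE CASCADE`, and that unenforced foreign keys let dangling references in.

```python
import sqlite3

# Hypothetical schema: rows are "soft-deleted" by setting a flag, not removed.
SCHEMA = """
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE volumes (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER REFERENCES instances(id) ON DELETE CASCADE,
    deleted INTEGER NOT NULL DEFAULT 0
);
"""

def make_db(enforce_fk: bool = True) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only checks REFERENCES clauses when this pragma is on.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO instances (id) VALUES (1)")
    conn.execute("INSERT INTO volumes (id, instance_id) VALUES (10, 1)")
    conn.commit()
    return conn

def live_volumes_of_deleted_instances(conn: sqlite3.Connection) -> list:
    """Volumes still marked live whose parent instance was soft-deleted."""
    return conn.execute(
        "SELECT v.id FROM volumes v JOIN instances i ON v.instance_id = i.id "
        "WHERE i.deleted = 1 AND v.deleted = 0"
    ).fetchall()

# Soft delete: an UPDATE never triggers ON DELETE CASCADE, so the volume
# stays "live" while pointing at a deleted instance.
soft = make_db()
soft.execute("UPDATE instances SET deleted = 1 WHERE id = 1")
print(live_volumes_of_deleted_instances(soft))  # [(10,)]

# Hard delete with foreign keys enforced: the cascade removes the volume.
hard = make_db()
hard.execute("DELETE FROM instances WHERE id = 1")
print(hard.execute("SELECT COUNT(*) FROM volumes").fetchone()[0])  # 0

# With foreign keys off, a reference to a nonexistent instance is accepted.
lax = make_db(enforce_fk=False)
lax.execute("INSERT INTO volumes (id, instance_id) VALUES (20, 999)")  # no error
```

The relational store can only cascade or reject bad references when the schema asks it to and rows are actually deleted. That fits the observation that the current usage gets little of this benefit.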

>> Besides, we don't really use transactions. I could easily read the
>> same data from two separate nodes, make different (irreconcilable)
>> changes on both nodes, and write them back, and the last one to write
>> simply wins.
> Sure, but using a KV store doesn't solve this problem...

I'm not suggesting it will. My point is simply that using a KV store
wouldn't lose us anything in that respect.
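The "last one to write simply wins" scenario is the classic lost-update problem. Here is a minimal sketch, again with Python's sqlite3. The `quotas` table and its `version` column are hypothetical, and the two "nodes" are simulated sequentially on one connection for simplicity. The guarded variant uses a version-column compare-and-set, a common optimistic-concurrency technique rather than anything Nova did. Many KV stores offer a comparable conditional write, so the sketch is neutral on the SQL-vs-KV question.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotas (project TEXT PRIMARY KEY, "
        "cores INTEGER NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO quotas VALUES ('demo', 10, 1)")
    conn.commit()
    return conn

def read(conn: sqlite3.Connection, project: str) -> tuple:
    return conn.execute(
        "SELECT cores, version FROM quotas WHERE project = ?", (project,)
    ).fetchone()

def blind_write(conn: sqlite3.Connection, project: str, cores: int) -> None:
    """Last-writer-wins: overwrite whatever is stored."""
    conn.execute("UPDATE quotas SET cores = ? WHERE project = ?", (cores, project))
    conn.commit()

def checked_write(conn, project: str, cores: int, expected_version: int) -> bool:
    """Compare-and-set: write only if the row is unchanged since it was read."""
    cur = conn.execute(
        "UPDATE quotas SET cores = ?, version = version + 1 "
        "WHERE project = ? AND version = ?",
        (cores, project, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1

# Lost update: both nodes read (10, 1), then write blindly.
db = setup()
a_cores, _ = read(db, "demo")
b_cores, _ = read(db, "demo")
blind_write(db, "demo", a_cores + 2)  # node A writes 12
blind_write(db, "demo", b_cores - 3)  # node B writes 7; A's +2 is lost
print(read(db, "demo"))  # (7, 1)

# Guarded: B's stale write is rejected, so B re-reads and retries.
db = setup()
a_cores, a_ver = read(db, "demo")
b_cores, b_ver = read(db, "demo")
print(checked_write(db, "demo", a_cores + 2, a_ver))  # True
print(checked_write(db, "demo", b_cores - 3, b_ver))  # False (stale version)
b_cores, b_ver = read(db, "demo")                     # now (12, 2)
print(checked_write(db, "demo", b_cores - 3, b_ver))  # True
print(read(db, "demo"))  # (9, 3)
```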

>> In short, it seems to me we're not really getting much out of having a
>> relational data store?
> We're getting out of it what we ask of it. We aren't using scoped
> sessions properly, aren't using transactions properly, and we aren't
> enforcing referential integrity. But those are choices we've made, not
> some native deficiency in relational data stores.

I didn't mean to suggest that that was the case at all. The point I'm
trying (but failing, clearly) to make is that with the way we're using
it, we're not reaping the usual benefits from it, and that we'd in
fact not lose anything by using a KV store.

> As soon as someone can demonstrate the performance, scalability, and
> robustness advantages of rewriting the data layer to use a
> non-relational data store, I'm all ears. Until that point, I remain
> unconvinced that the relational database is the source of major
> bottlenecks.

I understand that MySQL (and the other backends supported by
SQLAlchemy, too) scales very well. Vertically. I doubt they'll be
bottlenecks. Heck, they're even well-understood enough that people
have built very decent HA setups using them. I just don't think
they're a particularly good fit for a distributed system. You can have
a highly available datastore all you want, but I'd sleep better
knowing that our data is stored in a distributed system that is
designed to handle network partitions well.

-- 
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/

[Openstack] Database stuff