[openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
Dmitry Tantsur
dtantsur at redhat.com
Wed Oct 1 07:05:24 UTC 2014
On 09/30/2014 02:03 PM, Soren Hansen wrote:
> 2014-09-12 1:05 GMT+02:00 Jay Pipes <jaypipes at gmail.com>:
>> If Nova was to take Soren's advice and implement its data-access layer
>> on top of Cassandra or Riak, we would just end up re-inventing SQL
>> Joins in Python-land.
>
> I may very well be wrong(!), but this statement makes it sound like you've
> never used e.g. Riak. Or, if you have, not done so in the way it's
> supposed to be used.
>
> If you embrace an alternative way of storing your data, you wouldn't just
> blindly create a container for each table in your RDBMS.
>
> For example: In Nova's SQL-based datastore we have a table for security
> groups and another for security group rules. Rows in the security group
> rules table have a foreign key referencing the security group to which
> they belong. In a datastore like Riak, you could have a security group
> container where each value contains not just the security group
> information, but also all the security group rules. No joins in
> Python-land necessary.
>
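Here is a minimal Python sketch of the security group example above that contrasts the two layouts. The first is relational: rules reference their group by foreign key, so reading a group with its rules needs an application-side join. The second is denormalized: one value per group with its rules embedded, fetched with a single key lookup. The data, the function names (`rules_via_join`, `put_group`, `get_group`) and the plain dict standing in for a Riak bucket are hypothetical illustrations. They are not Nova's schema or the Riak client API.

```python
import json

# Relational layout (hypothetical rows): each rule points at its group via group_id.
security_groups = [
    {"id": "sg-1", "name": "web"},
]
security_group_rules = [
    {"id": "r-1", "group_id": "sg-1", "protocol": "tcp", "port": 80},
    {"id": "r-2", "group_id": "sg-1", "protocol": "tcp", "port": 443},
]


def rules_via_join(group_id):
    """Emulate a join done in application code: find the group, then filter rules by FK."""
    group = next(g for g in security_groups if g["id"] == group_id)
    rules = [r for r in security_group_rules if r["group_id"] == group_id]
    return {**group, "rules": rules}


# Denormalized layout: one value per security group with its rules embedded.
# A plain dict stands in for a Riak bucket (hypothetical; not the Riak client API).
kv_store = {}


def put_group(store, group):
    """Store the whole group, rules included, as a single serialized value."""
    store[group["id"]] = json.dumps(group)


def get_group(store, group_id):
    """One key lookup returns the group and all of its rules; no join needed."""
    return json.loads(store[group_id])


put_group(kv_store, {
    "id": "sg-1",
    "name": "web",
    "rules": [
        {"id": "r-1", "protocol": "tcp", "port": 80},
        {"id": "r-2", "protocol": "tcp", "port": 443},
    ],
})

group = get_group(kv_store, "sg-1")
print([r["port"] for r in group["rules"]])  # [80, 443]
```

The trade-off is on the write side. Changing one rule means rewriting the whole group value. In Riak, concurrent writes to the same key can also produce siblings that the application has to resolve.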
>> I've said it before, and I'll say it again. In Nova at least, the SQL
>> schema is complex because the problem domain is complex. That means
>> lots of relations, lots of JOINs, and that means the best way to query
>> for that data is via an RDBMS.
>
> I was really hoping you could be more specific than "best"/"most
> appropriate" so that we could have a focused discussion.
>
> I don't think relying on a central data store is in any conceivable way
> appropriate for a project like OpenStack. Least of all Nova.
>
> I don't see how we can build a highly available, distributed service on
> top of a centralized data store like MySQL.
Coming from a Skype background, I can assure you that you definitely can,
depending on your needs (and our experiments with e.g. MongoDB ended
very badly: it simply died under I/O loads that our PostgreSQL handled
as normal). That said, it's a complex topic, and I see a lot of people
switching to NoSQL and a lot of people switching away from it. NoSQL is
not a silver bullet for scalability. Just my two cents.
/me disappears again
>
> Tens or hundreds of thousands of nodes, spread across many, many racks
> and datacentre halls are going to experience connectivity problems[1].
>
> This means that some percentage of your infrastructure (possibly many
> thousands of nodes, affecting many, many thousands of customers) will
> find certain functionality not working on account of your datastore not
> being reachable from the part of the control plane they're attempting to
> use (or possibly only being able to read from it).
>
> I say over and over again that people should own their own uptime.
> Expect things to fail all the time. Do whatever you need to do to ensure
> your service keeps working even when something goes wrong. Of course
> this applies to our customers too. Even if we take the greatest care to
> avoid downtime, customers should spread their workloads across multiple
> availability zones and/or regions and probably even multiple cloud
> providers. Their service towards their users is their responsibility.
>
> However, our service towards our users is our responsibility. We should
> take the greatest care to avoid having internal problems affect our
> users. Building a massively distributed system like Nova on top of a
> centralized data store is practically a guarantee of the opposite.
>
>> For complex control plane software like Nova, though, an RDBMS is the
>> best tool for the job given the current lay of the land in open source
>> data storage solutions matched with Nova's complex query and
>> transactional requirements.
>
> What transactional requirements?
>
>> Folks in these other programs have actually, you know, thought about
>> these kinds of things and had serious discussions about alternatives.
>> It would be nice to have someone acknowledge that instead of snarky
>> comments implying everyone else "has it wrong".
>
> I'm terribly sorry, but repeating over and over that an RDBMS is "the
> best tool" without further qualification than "Nova's data model is
> really complex" reads *exactly* like a snarky comment implying everyone
> else "has it wrong".
>
> [1]: http://aphyr.com/posts/288-the-network-is-reliable
>