[openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

Jay Pipes jaypipes at gmail.com
Tue Sep 30 16:41:29 UTC 2014


On 09/30/2014 08:03 AM, Soren Hansen wrote:
> 2014-09-12 1:05 GMT+02:00 Jay Pipes <jaypipes at gmail.com>:
>> If Nova was to take Soren's advice and implement its data-access layer
>> on top of Cassandra or Riak, we would just end up re-inventing SQL
>> Joins in Python-land.
>
> I may very well be wrong(!), but this statement makes it sound like you've
> never used e.g. Riak. Or, if you have, not done so in the way it's
> supposed to be used.
>
> If you embrace an alternative way of storing your data, you wouldn't just
> blindly create a container for each table in your RDBMS.
>
> For example: In Nova's SQL-based datastore we have a table for security
> groups and another for security group rules. Rows in the security group
> rules table have a foreign key referencing the security group to which
> they belong. In a datastore like Riak, you could have a security group
> container where each value contains not just the security group
> information, but also all the security group rules. No joins in
> Python-land necessary.

OK, that's all fine for a simple one-to-many relation.
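For comparison, here is a minimal sketch of the embedded layout Soren describes. A Python dict stands in for a Riak-style container; this is not Riak's client API, and the keys and field names are hypothetical:

```python
"""Sketch of the denormalized security-group layout described above.

A plain dict stands in for a Riak-style key-value container; this is not a
Riak client API. Keys and field names are hypothetical.
"""

# Key: security group id. Value: the group *and* its rules, embedded,
# instead of rule rows pointing back to the group via a foreign key.
security_groups = {
    "sg-web": {
        "name": "web",
        "description": "HTTP and HTTPS from anywhere",
        "rules": [
            {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
            {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
        ],
    },
}


def get_group(container, group_id):
    """A single key lookup returns the group together with all its rules."""
    return container[group_id]


def add_rule(container, group_id, rule):
    """Adding a rule rewrites the whole value (read-modify-write)."""
    group = dict(container[group_id])         # shallow copy of the stored value
    group["rules"] = group["rules"] + [rule]  # new list; the old value is untouched
    container[group_id] = group               # write the whole document back
    return group
```

Reads are a single key lookup. The cost shows up on writes: add_rule is a read-modify-write of the whole value, so in a real distributed store, concurrent updates to the same group need some conflict-resolution strategy.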

How would I go about getting the associated fixed IPs for a network? The 
query Nova uses for that [1] looks like this:

SELECT
  fip.address,
  fip.instance_uuid,
  fip.network_id,
  fip.virtual_interface_id,
  vif.address,
  i.hostname,
  i.updated_at,
  i.created_at,
  fip.allocated,
  fip.leased,
  vif2.id
FROM fixed_ips fip
LEFT JOIN virtual_interfaces vif
  ON vif.id = fip.virtual_interface_id
  AND vif.deleted = 0
LEFT JOIN instances i
  ON fip.instance_uuid = i.uuid
  AND i.deleted = 0
LEFT JOIN (
  SELECT MIN(vi.id) AS id, vi.instance_uuid
  FROM virtual_interfaces vi
  GROUP BY vi.instance_uuid
) AS vif2
  ON vif2.instance_uuid = fip.instance_uuid
WHERE fip.deleted = 0
AND fip.network_id = :network_id
AND fip.virtual_interface_id IS NOT NULL
AND fip.instance_uuid IS NOT NULL
AND i.host = :host

Would I have a Riak container for virtual_interfaces that also holds 
instance information, network information, and fixed_ip information? 
How would I reproduce the query against a derived table that gets the 
minimum virtual interface ID for each instance UUID?

More than likely, I would end up having to put a bunch of indexes and 
relations into my Riak containers and structures just so I could do 
queries like the above. Failing that, I'd need to do multiple queries to 
multiple Riak containers and then join the resulting projection in 
memory, in Python. And that is why I say you will just end up 
implementing joins in Python.
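Here is a rough sketch of the Python-land version of the query above, assuming each table lives in its own key-value container. Plain dicts and lists stand in for those containers; no real Riak client API is used. The field names mirror the query's columns. The sketch also assumes that vif2 joins on instance_uuid, and it scans fixed_ips in full on the assumption that there is no secondary index on network_id to lean on:

```python
"""Rough sketch: the fixed-IP query above, done as joins in Python-land.

Plain dicts/lists stand in for key-value containers (hypothetical; no real
Riak client API is used). Field names follow the SQL query's columns.
"""


def min_vif_id_per_instance(virtual_interfaces):
    """The vif2 derived table: SELECT MIN(id) ... GROUP BY instance_uuid."""
    result = {}
    for vif in virtual_interfaces.values():
        uuid = vif["instance_uuid"]
        if uuid not in result or vif["id"] < result[uuid]:
            result[uuid] = vif["id"]
    return result


def associated_fixed_ips(fixed_ips, virtual_interfaces, instances, network_id, host):
    """Filter fixed IPs, then join them to VIFs, instances and vif2 by key."""
    vif2 = min_vif_id_per_instance(virtual_interfaces)
    rows = []
    # Full scan: assumes no secondary index on network_id is available.
    for fip in fixed_ips:
        if fip["deleted"] != 0 or fip["network_id"] != network_id:
            continue
        if fip["virtual_interface_id"] is None or fip["instance_uuid"] is None:
            continue
        # LEFT JOIN virtual_interfaces ... AND vif.deleted = 0
        vif = virtual_interfaces.get(fip["virtual_interface_id"])
        if vif is not None and vif["deleted"] != 0:
            vif = None
        # LEFT JOIN instances ... AND i.deleted = 0
        inst = instances.get(fip["instance_uuid"])
        if inst is not None and inst["deleted"] != 0:
            inst = None
        # WHERE i.host = :host rejects rows where the instance join found nothing
        if inst is None or inst["host"] != host:
            continue
        rows.append({
            "fip_address": fip["address"],
            "instance_uuid": fip["instance_uuid"],
            "network_id": fip["network_id"],
            "virtual_interface_id": fip["virtual_interface_id"],
            "vif_address": vif["address"] if vif is not None else None,
            "hostname": inst["hostname"],
            "updated_at": inst["updated_at"],
            "created_at": inst["created_at"],
            "allocated": fip["allocated"],
            "leased": fip["leased"],
            # LEFT JOIN vif2 ON vif2.instance_uuid = fip.instance_uuid
            "vif2_id": vif2.get(fip["instance_uuid"]),
        })
    return rows
```

Each extra relation adds another lookup plus its own NULL-handling rule. In the RDBMS, the query planner does that work for you. The MIN-per-instance pass also reads every VIF on every call, unless you maintain it yourself as yet another denormalized container.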

A relational database was built for the above types of queries, and 
that's why I said it's the best tool for the job *in this specific case*.

Now... that said...

Is it possible to go through the Nova schema and identify mini-schemas 
that could be pulled out of the RDBMS and placed into Riak or Cassandra? 
Absolutely yes! The service group and compute node usage records are 
good candidates for that, in my opinion. With the nova.objects work that 
was completed over the last few cycles, we might actually now have the 
foundation in place to make doing this a reality. I welcome your 
contributions in this area.

[1] https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L2608

>> I've said it before, and I'll say it again. In Nova at least, the SQL
>> schema is complex because the problem domain is complex. That means
>> lots of relations, lots of JOINs, and that means the best way to query
>> for that data is via an RDBMS.
>
> I was really hoping you could be more specific than "best"/"most
> appropriate" so that we could have a focused discussion.
>
> I don't think relying on a central data store is in any conceivable way
> appropriate for a project like OpenStack. Least of all Nova.
>
> I don't see how we can build a highly available, distributed service on
> top of a centralized data store like MySQL.
>
> Tens or hundreds of thousands of nodes, spread across many, many racks
> and datacentre halls are going to experience connectivity problems[1].
>
> This means that some percentage of your infrastructure (possibly many
> thousands of nodes, affecting many, many thousands of customers) will
> find certain functionality not working on account of your datastore not
> being reachable from the part of the control plane they're attempting to
> use (or possibly only being able to read from it).
>
> I say over and over again that people should own their own uptime.
> Expect things to fail all the time. Do whatever you need to do to ensure
> your service keeps working even when something goes wrong. Of course
> this applies to our customers too. Even if we take the greatest care to
> avoid downtime, customers should spread their workloads across multiple
> availability zones and/or regions and probably even multiple cloud
> providers. Their service towards their users is their responsibility.
>
> However, our service towards our users is our responsibility. We should
> take the greatest care to avoid having internal problems affect our
> users.  Building a massively distributed system like Nova on top of a
> centralized data store is practically a guarantee of the opposite.

I don't disagree with anything you say above. At all. I welcome the 
coming cycles where we will get to split pieces out of Nova (which will 
afford us the opportunity to decouple certain mini-schemas from the 
RDBMS and use more appropriate distributed data stores like Cassandra or 
Riak for those smaller schemas).

>> For complex control plane software like Nova, though, an RDBMS is the
>> best tool for the job given the current lay of the land in open source
>> data storage solutions matched with Nova's complex query and
>> transactional requirements.
>
> What transactional requirements?

https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L1654

When you delete an instance, you don't want the delete to just stop 
half-way through the transaction and leave around a bunch of orphaned 
children. Similarly, when you reserve something, it helps to not have a 
half-finished state change that you need to go clean up if something 
goes boom.
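Here is a minimal sketch of that property using Python's built-in sqlite3 module. The schema is a toy one, loosely named after Nova tables, and is not the instance_destroy code linked above. fail_midway is a hypothetical flag used only to simulate something going boom partway through:

```python
import sqlite3


def make_db():
    """Toy schema loosely named after Nova tables; not Nova's real schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instances (
            uuid TEXT PRIMARY KEY,
            deleted INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE instance_metadata (
            id INTEGER PRIMARY KEY,
            instance_uuid TEXT NOT NULL,
            deleted INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE instance_info_caches (
            id INTEGER PRIMARY KEY,
            instance_uuid TEXT NOT NULL,
            deleted INTEGER NOT NULL DEFAULT 0);
    """)
    return conn


def destroy_instance(conn, uuid, fail_midway=False):
    """Soft-delete an instance and its children as one unit of work.

    `with conn:` commits if the block finishes and rolls back if an
    exception escapes, so a failure never leaves orphaned children.
    """
    with conn:
        conn.execute("UPDATE instance_metadata SET deleted = 1 "
                     "WHERE instance_uuid = ?", (uuid,))
        if fail_midway:
            raise RuntimeError("simulated crash halfway through the delete")
        conn.execute("UPDATE instance_info_caches SET deleted = 1 "
                     "WHERE instance_uuid = ?", (uuid,))
        conn.execute("UPDATE instances SET deleted = 1 WHERE uuid = ?", (uuid,))


def live_rows(conn, uuid):
    """Count rows for `uuid` that are not soft-deleted, one count per table."""
    queries = (
        "SELECT COUNT(*) FROM instances WHERE uuid = ? AND deleted = 0",
        "SELECT COUNT(*) FROM instance_metadata "
        "WHERE instance_uuid = ? AND deleted = 0",
        "SELECT COUNT(*) FROM instance_info_caches "
        "WHERE instance_uuid = ? AND deleted = 0",
    )
    return tuple(conn.execute(q, (uuid,)).fetchone()[0] for q in queries)
```

After the simulated failure, every row is still live, so there is nothing to clean up. Without multi-row transactions, the delete would need its own compensating cleanup logic for the partial case.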

https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L3054
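For the reservation case, two things matter. The usage bump and the reservation record should land together, and the quota check should not be separable from the increment. Here is a minimal SQLite sketch, assuming a toy schema loosely named after Nova's quota_usages and reservations tables. The column names are hypothetical, and this is not the code at the link above:

```python
import sqlite3


def make_quota_db(project_id, limit):
    """Toy schema loosely named after Nova's quota tables (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE quota_usages (
            project_id TEXT PRIMARY KEY,
            in_use INTEGER NOT NULL DEFAULT 0,
            reserved INTEGER NOT NULL DEFAULT 0,
            hard_limit INTEGER NOT NULL);
        CREATE TABLE reservations (
            id INTEGER PRIMARY KEY,
            project_id TEXT NOT NULL,
            delta INTEGER NOT NULL);
    """)
    with conn:
        conn.execute("INSERT INTO quota_usages (project_id, hard_limit) "
                     "VALUES (?, ?)", (project_id, limit))
    return conn


def reserve(conn, project_id, delta):
    """Bump the reserved counter and record the reservation atomically.

    The quota check lives in the UPDATE's WHERE clause, so check and
    increment are one statement. Returns the reservation id, or None if
    the request would exceed the limit (in which case nothing changes).
    """
    with conn:  # both statements commit together or not at all
        cur = conn.execute(
            "UPDATE quota_usages SET reserved = reserved + ? "
            "WHERE project_id = ? AND in_use + reserved + ? <= hard_limit",
            (delta, project_id, delta),
        )
        if cur.rowcount == 0:
            return None  # over quota
        cur = conn.execute(
            "INSERT INTO reservations (project_id, delta) VALUES (?, ?)",
            (project_id, delta),
        )
        return cur.lastrowid
```

If the INSERT failed for any reason, the counter bump would roll back with it. Partial state would never need a cleanup pass.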

>> Folks in these other programs have actually, you know, thought about
>> these kinds of things and had serious discussions about alternatives.
>> It would be nice to have someone acknowledge that instead of snarky
>> comments implying everyone else "has it wrong".
>
> I'm terribly sorry, but repeating over and over that an RDBMS is "the
> best tool" without further qualification than "Nova's data model is
> really complex" reads *exactly* like a snarky comment implying everyone
> else "has it wrong".

Sorry if I sound snarky. I thought your blog post was the definition of 
snark.

Best,
-jay

> [1]: http://aphyr.com/posts/288-the-network-is-reliable
>


