[openstack-dev] [nova] Distributed Database

Mike Bayer mbayer at redhat.com
Mon May 2 15:51:58 UTC 2016



On 05/02/2016 07:38 AM, Matthieu Simonin wrote:
>
>
> As far as we understand the idea of an ORM is to hide the relational database with an Object oriented API.

I actually disagree with that completely.  The reason ORMs are so 
maligned is because of this misconception; developers attempt to use an 
ORM so that they need not have any awareness of their 
database, how queries are constructed, or even its schema's design; 
witness tools such as Django ORM and Rails ActiveRecord which promise 
this.   You then end up with an inefficient and unextensible mess 
because the developers never considered anything about how the database 
works or how it is queried, nor do they even have easy ways to monitor 
or control it while still making use of the tool.   There are many blog 
posts and articles that discuss this and it is in general known as the 
"object relational impedance mismatch".

SQLAlchemy's success comes from its rejection of this entire philosophy. 
  The purpose of SQLAlchemy's ORM is not to "hide" anything but rather 
to apply automation to the many aspects of relational database 
communication as well as row->object mapping that otherwise express 
themselves in an application as either a large amount of repetitive 
boilerplate throughout an application or as an awkward series of ad-hoc 
abstractions that don't really do the job very well.   SQLAlchemy is 
designed to expose both the schema design and the structure of 
queries completely.   My talk at [1] goes into this topic in detail 
including specific API architectures that facilitate this concept.

It's for that reason that I've always rejected notions of attempting to 
apply SQLAlchemy directly on top of a datastore that is explicitly 
non-relational.   By doing so, you remove a vast portion of the 
functionality that relational databases provide and there's really no 
point in using a tool like SQLAlchemy that is very explicit about DDL 
and SQL on top of that kind of database.

To effectively put SQLAlchemy on top of a non-relational datastore, what 
you really want to do is build an entire SQL engine on top of it.  This 
is actually feasible; I was doing work for the now-defunct FoundationDB 
(since bought by Apple), which had a very good implementation of SQL on 
top of a distributed key store going, and the Cockroach and TiDB 
projects you mention are definitely the most appropriate choice 
if a certain variety of distribution underneath SQL is desired.

> Concerning SQLAlchemy,
> relational aspect of the underlying database may also be used by the user but we observed that in Nova, most
> of the db interactions are written in an Object-oriented style (few queries are using SQL),
> thus we don't think that Nova requires a relational database, it just requires an object oriented abstraction to manipulate a database.

Well IMO that's actually often a problem.  My goal across OpenStack 
projects in general is to allow them to make use of SQL more effectively 
than they do right now; for example, in Neutron I am helping them to 
move away from a block of code that inefficiently loads a block of data 
into memory, scans it for CIDR overlaps, and then pushes data back out. 
This approach prevents it from performing a single UPDATE statement and 
ushers in the need for pessimistic locking against concurrent 
transactions.  Instead, I've written for them a simple stored function 
proof-of-concept [2] that will allow the entire operation to be 
performed on the database side alone in a single statement.  Wins like 
these are much less feasible, if not impossible, when a project decides it 
wants to split its backend store between dramatically different 
databases which don't offer such features.
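
To make the contrast concrete, here is a minimal sketch of the "do it in 
one statement" idea. It is not the stored function from [2]; it assumes a 
hypothetical `subnets` table that stores each IPv4 CIDR block together with 
its first and last address as integers, and it uses SQLite through Python's 
standard `sqlite3` module purely as a stand-in for the real backend. The 
names `make_db`, `cidr_bounds` and `add_subnet` are invented for this 
illustration.

```python
import ipaddress
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical subnets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subnets ("
        " cidr TEXT PRIMARY KEY,"
        " first_ip INTEGER NOT NULL,"
        " last_ip INTEGER NOT NULL)"
    )
    return conn


def cidr_bounds(cidr):
    """Return the first and last address of an IPv4 CIDR block as integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def add_subnet(conn, cidr):
    """Insert cidr unless it overlaps an existing row, in a single statement.

    The overlap check runs inside the database, so no rows are loaded into
    application memory. Returns True if the row was inserted.
    """
    first, last = cidr_bounds(cidr)
    cur = conn.execute(
        "INSERT INTO subnets (cidr, first_ip, last_ip) "
        "SELECT ?, ?, ? "
        "WHERE NOT EXISTS ("
        "  SELECT 1 FROM subnets"
        "  WHERE first_ip <= ? AND last_ip >= ?"  # two ranges overlap
        ")",
        (cidr, first, last, last, first),
    )
    conn.commit()
    return cur.rowcount == 1
```

Calling `add_subnet(conn, "10.0.0.0/24")` on a fresh database inserts the 
row, while a later `add_subnet(conn, "10.0.0.128/25")` is rejected by the 
same statement because the ranges overlap. A load-scan-write version of 
this check would need the application to lock or otherwise guard the rows 
between the read and the write; how much protection the single statement 
gives depends on the database and its isolation level.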

>
> Concretely, we think that there are three possible approaches:
>      1) We can use the SQLAlchemy API as the common denominator between a relational and non-relational implementation of the db.api component. These two implementations could continue to converge by sharing a large amount of code.
>      2) We create a new non-relational implementation (from scratch) of the db.api component. It would probably require more work.
>      3) We are also studying a last alternative: writing a SQLAlchemy engine that targets NewSQL databases (scalability + ACID):
>       - https://github.com/cockroachdb/cockroach
>       - https://github.com/pingcap/tidb

Going with a NewSQL backend is by far the best approach here.   That 
way, very little needs to be reinvented and the application's approach 
to data doesn't need to dramatically change.

But also, w.r.t. Cells there seems to be some remaining debate over why 
exactly a distributed approach is even needed.  As others have posted, a 
single MySQL database, replicated across Galera or not, scales just fine 
for far more data than Nova ever needs to store.  So it's not clear why 
a dramatic rewrite of its datastore is called for.


[1] 
http://www.sqlalchemy.org/library.html#handcodedapplicationswithsqlalchemy

[2] https://gist.github.com/zzzeek/a3bccad40610b9b69803531cc71a79b1


> Matthieu Simonin
> for the discovery project
> https://beyondtheclouds.github.io/
>
>>
>> [1] https://github.com/BeyondTheClouds/rome
>>
>> [2]
>> https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L172
>>
>> [3]
>> https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L102
>>
>>>
>>>
>>> -- Ed Leafe
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>