[openstack-dev] [nova] Distributed Database
mbayer at redhat.com
Mon May 2 15:51:58 UTC 2016
On 05/02/2016 07:38 AM, Matthieu Simonin wrote:
> As far as we understand the idea of an ORM is to hide the relational database with an Object oriented API.
I actually disagree with that completely. The reason ORMs are so
maligned is this very misconception: developers attempt to use an
ORM so that they need not have any awareness of their
database, how queries are constructed, or even its schema's design;
witness tools such as Django ORM and Rails ActiveRecord, which promise
this. You then end up with an inefficient and unextensible mess
because the developers never considered anything about how the database
works or how it is queried, nor do they even have easy ways to monitor
or control it while still making use of the tool. There are many blog
posts and articles that discuss this, and it is in general known as the
"object relational impedance mismatch".
SQLAlchemy's success comes from its rejection of this entire philosophy.
The purpose of SQLAlchemy's ORM is not to "hide" anything but rather
to apply automation to the many aspects of relational database
communication as well as row->object mapping that otherwise express
themselves in an application as either a large amount of repetitive
boilerplate throughout an application or as an awkward series of ad-hoc
abstractions that don't really do the job very well. SQLAlchemy is
designed to expose both the schema design as well as the structure of
queries completely. My talk at  goes into this topic in detail
including specific API architectures that facilitate this concept.
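To make that concrete, here is a minimal sketch, assuming SQLAlchemy 1.4 or
later. The Instance and Volume models are hypothetical names made up for
illustration and are not Nova's actual schema. The point is only that the
table definitions and the JOIN are spelled out by the developer, and that
the DDL and SQL the library will emit can be printed and inspected:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, select
from sqlalchemy.orm import declarative_base, relationship
from sqlalchemy.schema import CreateTable

Base = declarative_base()


# Hypothetical models: the schema is declared explicitly, column by column.
class Instance(Base):
    __tablename__ = "instances"
    id = Column(Integer, primary_key=True)
    hostname = Column(String(255), nullable=False)
    volumes = relationship("Volume", back_populates="instance")


class Volume(Base):
    __tablename__ = "volumes"
    id = Column(Integer, primary_key=True)
    instance_id = Column(Integer, ForeignKey("instances.id"), nullable=False)
    size_gb = Column(Integer, nullable=False)
    instance = relationship("Instance", back_populates="volumes")


# The DDL that would be sent to the database is available as a string.
ddl = str(CreateTable(Volume.__table__))

# The query is built from relational constructs (JOIN, WHERE), and the
# resulting SQL can be rendered without connecting to any database.
stmt = (
    select(Instance.hostname)
    .join(Instance.volumes)
    .where(Volume.size_gb > 100)
)
sql = str(stmt)

print(ddl)
print(sql)
```

Nothing here is hidden: the foreign key, the join condition and the WHERE
clause are all visible both in the Python code and in the rendered SQL.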
It's for that reason that I've always rejected notions of attempting to
apply SQLAlchemy directly on top of a datastore that is explicitly
non-relational. By doing so, you remove a vast portion of the
functionality that relational databases provide and there's really no
point in using a tool like SQLAlchemy that is very explicit about DDL
and SQL on top of that kind of database.
To effectively put SQLAlchemy on top of a non-relational datastore, what
you really want to do is build an entire SQL engine on top of it. This
is actually feasible; I was doing work for the now-defunct FoundationDB
(since bought by Apple), which had a very good implementation of
SQL on top of a distributed key store going, and the Cockroach and TiDB
projects you mention are definitely the most appropriate route to take
if a certain variety of distribution underneath SQL is desired.
> relational aspect of the underlying database may also be used by the user but we observed that in Nova, most
> of the db interactions are written in an Object-oriented style (few queries are using SQL),
> thus we don't think that Nova requires a relational database, it just requires an object oriented abstraction to manipulate a database.
Well IMO that's actually often a problem. My goal across OpenStack
projects in general is to allow them to make use of SQL more effectively
than they do right now; for example, in Neutron I am helping them to
move away from a block of code that inefficiently needs to load a block
of data into memory, scan it for CIDR overlaps, and then push data back out.
This approach prevents it from performing a single UPDATE statement and
ushers in the need for pessimistic locking against concurrent
transactions. Instead, I've written for them a simple stored function
proof-of-concept  that will allow the entire operation to be
performed on the database side alone in a single statement. Wins like
these are much less feasible if not impossible when a project decides it
wants to split its backend store between dramatically different
databases which don't offer such features.
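As a rough illustration of the difference (this is not the stored function
referred to above, only a sketch of the same idea), the following uses
Python's stdlib sqlite3 module as a stand-in database. The subnets table,
its columns and the add_subnet function are all hypothetical, and the
integer-range encoding handles IPv4 only. The overlap check and the write
happen in one statement, so there is no load-scan-write cycle in
application memory:

```python
import ipaddress
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical subnets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subnets ("
        " cidr TEXT PRIMARY KEY,"
        " first_ip INTEGER NOT NULL,"
        " last_ip INTEGER NOT NULL)"
    )
    return conn


def add_subnet(conn, cidr):
    """Insert cidr only if it overlaps no existing row.

    The check and the insert are a single SQL statement, so the database
    decides atomically; no rows are pulled into Python to be scanned.
    Returns True if the row was inserted, False if an overlap was found.
    """
    net = ipaddress.ip_network(cidr)
    first = int(net.network_address)
    last = int(net.broadcast_address)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO subnets (cidr, first_ip, last_ip) "
            "SELECT ?, ?, ? "
            "WHERE NOT EXISTS ("
            "  SELECT 1 FROM subnets"
            "  WHERE first_ip <= ? AND last_ip >= ?)",
            # Two ranges overlap when each one starts before the other ends.
            (str(net), first, last, last, first),
        )
    return cur.rowcount == 1


conn = make_db()
print(add_subnet(conn, "10.0.0.0/24"))
print(add_subnet(conn, "10.0.0.128/25"))
print(add_subnet(conn, "10.0.1.0/24"))
```

A real deployment would more likely use the database's native network
types or a stored function, as described above; the sketch only shows why
keeping the whole operation on the database side removes the need to lock
rows while the application scans them.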
> Concretely, we think that there are three possible approaches:
> 1) We can use the SQLAlchemy API as the common denominator between a relational and non-relational implementation of the db.api component. These two implementations could continue to converge by sharing a large amount of code.
> 2) We create a new non-relational implementation (from scratch) of the db.api component. It would require probably more work.
> 3) We are also studying a last alternative: writing a SQLAlchemy engine that targets NewSQL databases (scalability + ACID):
> - https://github.com/cockroachdb/cockroach
> - https://github.com/pingcap/tidb
Going with a NewSQL backend is by far the best approach here. That
way, very little needs to be reinvented and the application's approach
to data doesn't need to dramatically change.
But also, w.r.t. Cells there seems to be some remaining debate over why
exactly a distributed approach is even needed. As others have posted, a
single MySQL database, replicated across Galera or not, scales just fine
for far more data than Nova ever needs to store. So it's not clear why
a dramatic rewrite of its datastore is called for.
> Matthieu Simonin
> for the discovery project
>>  https://github.com/BeyondTheClouds/rome
>>> -- Ed Leafe
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe