[openstack-dev] [nova] Distributed Database

Mike Bayer mbayer at redhat.com
Thu Apr 28 16:57:59 UTC 2016



On 04/28/2016 08:44 AM, Edward Leafe wrote:
> On Apr 24, 2016, at 3:28 PM, Robert Collins <robertc at robertcollins.net> wrote:
>
>> For instance, the things I think are essential for a distributed
>> database based datastore:
>> - good single-machine developer story. Must not need a physical
>> cluster to hack on OpenStack
>> - deal gracefully with single node/rack/site failures (when deployed
>> appropriately) - allow limiting failure domain impact
>> - straightforward programming model: wrong uses should be obvious to reviewers
>> - low latency performance with big datasets: e.g. nova list as an
>> admin should be able to get the Nth page as rapidly as the 2nd or 3rd.
>> - code to deliver that should be (approximately) no worse than the current code
>
> Agree on all of these points, as well as the rest of your post.
>
> After several hallway track discussions, as well as yesterday’s Cells V2 discussion, I’ve written a follow-up post:
>
> http://blog.leafe.com/index.php/2016/04/28/fragmented-data/
>
> Feedback, of course, is welcomed!


Regarding ROME [1], I've taken a look at its source code and while it is 
certainly interesting, I wouldn't recommend lifting and moving all of 
Nova's database infrastructure onto it as a dependency within the near 
term, as the state of this code is very immature.  SQLAlchemy itself was 
once immature as well, so there is no sin here, but that was eleven 
years ago.

The internals here are not only highly dependent on SQLAlchemy internals 
(pinned at the 0.9 series, which is obsolete), they also use these APIs 
in a very brittle and non-performant way [2].  In this code example, the 
internal elements of SQLAlchemy expression objects are repeatedly run 
through str(), which on each call runs a full string compilation step, 
in order to test what their actual type is.  It can't be overstated how 
inappropriate this approach is, and the author of the library would have 
benefited from reaching out to me for some guidance on the correct way 
to introspect SQLAlchemy expression objects.  Basic Python idioms like 
type checking also seem to be misunderstood [3].
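
To illustrate the difference between the two styles of introspection, 
here is a minimal sketch.  It deliberately does not use SQLAlchemy or 
ROME code: Expr, Column, Literal, BinaryExpr, columns_by_str and 
columns_by_type are all hypothetical stand-ins invented for this 
example, with __str__ playing the role of a full string compilation.  
It only shows why inspecting the rendered string costs far more, and 
is more fragile, than checking the node's type directly.

```python
class Expr:
    """Hypothetical expression node; NOT a SQLAlchemy class."""
    compile_calls = 0  # counts how often a "string compilation" runs

    def __str__(self):
        Expr.compile_calls += 1
        return self._compile()


class Column(Expr):
    def __init__(self, name):
        self.name = name

    def _compile(self):
        return self.name


class Literal(Expr):
    def __init__(self, value):
        self.value = value

    def _compile(self):
        return repr(self.value)


class BinaryExpr(Expr):
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def _compile(self):
        # Rendering a parent re-renders every child beneath it.
        return "%s %s %s" % (self.left, self.op, self.right)


def columns_by_str(expr):
    """Brittle style: guess the node type from its rendered text."""
    text = str(expr)
    if " = " in text or " AND " in text:
        return columns_by_str(expr.left) + columns_by_str(expr.right)
    if text.startswith("'") or text.isdigit():
        return []
    return [text]


def columns_by_type(expr):
    """Direct style: ask the object what it is; nothing is rendered."""
    if isinstance(expr, BinaryExpr):
        return columns_by_type(expr.left) + columns_by_type(expr.right)
    if isinstance(expr, Column):
        return [expr.name]
    return []


expr = BinaryExpr(
    BinaryExpr(Column("id"), "=", Literal(5)),
    "AND",
    BinaryExpr(Column("name"), "=", Literal("x")),
)

Expr.compile_calls = 0
by_str = columns_by_str(expr)
str_cost = Expr.compile_calls  # grows faster than the number of nodes

Expr.compile_calls = 0
by_type = columns_by_type(expr)
type_cost = Expr.compile_calls  # stays at zero
```

Both functions find the same column names for this tree, but the 
string-based one re-renders each subtree at every level it visits, and 
it would misclassify any literal whose text happens to contain " = ".  
The type-based walk never renders anything.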

I don't think anyone denies that Nova can use any kind of database 
backend, but the point was raised that starting from scratch with an 
entirely new database approach is an enormous job.  If the first step 
of that job is in fact "port SQLAlchemy and the relational model to 
Redis", that makes the job far more involved, and I'd disagree with 
your post's assertion that "It's not too late" if this is the case. 
If what ROME admits on Nova's behalf is that the relational model is in 
fact necessary for Nova, then that disqualifies NoSQL databases out of 
the gate - it's one thing to lament that MySQL is not as "distributed" 
out of the box as a NoSQL database, but it's another to lament that 
non-relational databases are not in fact relational.

[1] https://github.com/BeyondTheClouds/rome

[2] https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L172

[3] https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L102

>
>
> -- Ed Leafe


