[openstack-dev] [Glance] Replacing Glance DB code to Oslo DB code.
mark.washenberger at markwash.net
Tue Aug 20 07:15:39 UTC 2013
So much great stuff to respond to in this snip and response!
On Mon, Aug 19, 2013 at 2:17 AM, Flavio Percoco <flavio at redhat.com> wrote:
> On 19/08/13 02:51 -0400, Jay Pipes wrote:
>> On 08/19/2013 12:56 AM, Joshua Harlow wrote:
>>> Another good article from an ex-coworker that keeps on making more and
>>> more sense the more projects I get into...
>>> Your mileage/opinion though may vary :)
This article looks great--but I think it depends on taking the incredibly
limited / incorrect view of an ORM that has been popularized and that we
currently employ in many OpenStack projects.
In particular, the critical issue is, are you actually using a mapper? Do
you *know* what the mapper pattern is? The key part of the mapper is that
you've got two components, say A and B, that you desperately want to keep
decoupled. Now, A and B need to interact, but they both are very likely to
need to change. And maybe what's worse, they really suck to change together
at the same time. (A great example of A and B is "db schema" and "business
logic".) Since they need to interact, which one is going to depend on the
other? A mapper M solves this problem by depending on both A and B,
allowing the two key modules to continue to evolve independently.
This approach would be amazing for our CD efforts, because it lets you move
one step at a time. In one deploy, you update the schema and mapper, but
keep the application code the same. In the next change, you just change the
application code. And so forth. This solves (part of, anyway) the problem of
temporary schema/code/functionality incompatibilities that arise during a
large-scale deployment.
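Continuing the same hypothetical example, suppose one deploy ships a migration that renames `images.size` to `images.size_bytes`. This change is invented purely for illustration. Only the mapper changes in that deploy. `Image` and the application code that uses it stay exactly as they were:

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Image:
    # Application-facing object: identical before and after the schema change.
    id: str
    name: str
    size: int


class ImageMapperV2:
    """Mapper shipped in the same deploy as the (hypothetical) migration that
    renamed images.size to images.size_bytes. Callers still see Image.size."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def save(self, image: Image) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO images (id, name, size_bytes) VALUES (?, ?, ?)",
            (image.id, image.name, image.size),
        )

    def get(self, image_id: str) -> Image:
        row = self.conn.execute(
            "SELECT id, name, size_bytes FROM images WHERE id = ?", (image_id,)
        ).fetchone()
        if row is None:
            raise KeyError(image_id)
        return Image(id=row[0], name=row[1], size=row[2])


def application_code(mapper) -> int:
    # Unchanged business logic: it only talks to Image objects, never to columns.
    mapper.save(Image("abc", "cirros", 1024))
    return mapper.get("abc").size
```

The application-code change, if any, can then go out in a later, separate deploy.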
But of course sqlalchemy's declarative models hamstring any such effort
while simultaneously teaching developers entirely the wrong lesson about
mappers! What I mean is that if I'm using sqlalchemy model objects and want
to change a table definition, I have to change a model object and thus much
of my application code. Decoupling went flying out the window. . . sad
>> 2) I highly caution folks who think a No-SQL store is a good storage
>> solution for any of the data currently used by Nova, Glance (registry),
>> Cinder (registry), Ceilometer, and Quantum. All of the data stored and
>> manipulated in those projects is HIGHLY relational data, and not
>> objects/documents. Switching to use a KVS for highly relational data is a
>> terrible decision. You will just end up implementing joins in your code...
> FWIW, I'm a huge fan of NoSQL technologies but I couldn't agree more
I have to say I'm kind of baffled by this sentiment (expressed here and
elsewhere in the thread). I'm not a NoSQL expert, but I hang out with a few
and I'm pretty confident Glance at least is not that relational. We do two
types of joins in glance. The first, like image properties, is basically
just an implementation detail of the sql driver. It's not core to the
application. Any NoSQL implementation will simply completely denormalize
those properties into the image record. (And honestly, so might an
optimized SQL implementation. . .)
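Roughly, that denormalization looks like the sketch below. The row shapes are simplified assumptions, not the actual Glance registry schema: each image is a dict with an `id`, and each `image_properties` row is an `(image_id, name, value)` tuple.

```python
from collections import defaultdict


def denormalize(images, properties):
    """Fold image_properties rows into each image record.

    images: iterable of dicts, one per image row, each with an "id" key.
    properties: iterable of (image_id, name, value) tuples.
    Returns {image_id: document}, with properties stored inline the way a
    document store would hold them (no join needed at read time).
    """
    props_by_image = defaultdict(dict)
    for image_id, name, value in properties:
        props_by_image[image_id][name] = value
    return {
        img["id"]: {**img, "properties": props_by_image.get(img["id"], {})}
        for img in images
    }


if __name__ == "__main__":
    images = [{"id": "abc", "name": "cirros"}, {"id": "def", "name": "fedora"}]
    props = [("abc", "kernel_id", "k1"), ("abc", "os_distro", "cirros")]
    print(denormalize(images, props))
```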
The second type of join, image_members, is basically just a hack to solve
the problem created because the glance api offers several simultaneous
implicit "views" of images. Specifically, when you list images in glance,
you are seeing a union of three views: public images, images you own, and
images shared with you. IMO it's actually a more scalable and sensible
solution to make these views more explicit and independent in the API and
code, taking a lesson from filesystems which have to scale to a lot of
metadata (notice how visibility is generally an attribute of a directory,
not of regular files in your typical Unix FS?). And to solve this problem
in SQL now we still have to do a server-side union, which is a bit sad. But
even before we can refactor the API (v3 anyone?) I don't see it as
unworkably slow for a NoSQL driver to track these kinds of views.
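The sketch below runs against a hypothetical, heavily simplified schema loosely modeled on the registry tables (`images.owner`, `images.is_public`, `image_members.member`). The real tables have many more columns. Each view is its own query, and today's listing is the server-side UNION of all three:

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE images (id TEXT PRIMARY KEY, owner TEXT, is_public INTEGER);
CREATE TABLE image_members (image_id TEXT, member TEXT);
"""

# Each view is an independent, narrow query.
PUBLIC_VIEW = "SELECT id FROM images WHERE is_public = 1"
OWNED_VIEW = "SELECT id FROM images WHERE owner = :tenant"
SHARED_VIEW = (
    "SELECT i.id FROM images i "
    "JOIN image_members m ON m.image_id = i.id WHERE m.member = :tenant"
)


def list_view(conn, view_sql, tenant):
    """Return the sorted image ids in one explicit view."""
    return sorted(row[0] for row in conn.execute(view_sql, {"tenant": tenant}))


def list_images(conn, tenant):
    """The implicit listing: a server-side UNION of all three views."""
    sql = f"{PUBLIC_VIEW} UNION {OWNED_VIEW} UNION {SHARED_VIEW} ORDER BY id"
    return [row[0] for row in conn.execute(sql, {"tenant": tenant})]
```

Offering `list_view` per view, rather than only the combined `list_images`, is the "more explicit" direction described above. A NoSQL driver would only need to answer each of the three narrow queries.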
The bigger concern to me is that Glance seems a bit trigger-happy with
indexes. But I'm confident we're in a similar boat there: performance in
NoSQL won't be that terrible for the most important use cases, and a later
refactoring can put us on a more sustainable track in the long run.
And then, so I'm not just picking on flaper87. . .
>> All I'm saying is that we should be careful not to swap one set of
>> problems for another.
> My 2 cents: I am in agreement with Jay. I am leery of NoSQL being a
> direct sub in and I fear that this effort can be adding a large workload
> for little benefit.
The goal isn't really to replace sqlalchemy completely. I'm hoping I can
create a space where multiple drivers can operate efficiently without
introducing bugs (i.e. pull all that business logic out of the driver!)
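Here is one sketch of what "pull the business logic out of the driver" could mean. Every name in it is hypothetical. Drivers implement a small persistence contract, and rules such as status transitions live once in a service layer above them. The transition table is an invented simplification, not Glance's actual image state machine.

```python
from abc import ABC, abstractmethod


class ImageDriver(ABC):
    """Hypothetical storage-driver contract: dumb persistence only."""

    @abstractmethod
    def get(self, image_id): ...

    @abstractmethod
    def save(self, image_id, record): ...


class InMemoryDriver(ImageDriver):
    """Trivial backend; a SQL or NoSQL driver would implement the same methods."""

    def __init__(self):
        self._records = {}

    def get(self, image_id):
        return dict(self._records[image_id])  # return a copy, not the stored dict

    def save(self, image_id, record):
        self._records[image_id] = dict(record)


class ImageService:
    """Business rules live here once, so every driver enforces them identically."""

    # Invented, simplified transition rules for illustration.
    ALLOWED_TRANSITIONS = {"queued": {"saving"}, "saving": {"active", "killed"}}

    def __init__(self, driver: ImageDriver):
        self.driver = driver

    def set_status(self, image_id, new_status):
        record = self.driver.get(image_id)
        allowed = self.ALLOWED_TRANSITIONS.get(record["status"], set())
        if new_status not in allowed:
            raise ValueError(f"cannot go from {record['status']} to {new_status}")
        record["status"] = new_status
        self.driver.save(image_id, record)
```

Because the driver never sees the transition rules, a new backend cannot reintroduce a bug in them.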
I'll be very interested to see if people can, after such a refactoring, try
out some more storage approaches, such as dropping the sqlalchemy orm in
favor of its generic engine support or direct sql execution, as well as
NoSQL what-have-you. We don't have to make all of the drivers live in the
project, so it really can be a good place for interested parties to
> A somewhat related post:
I really think we're more in the "slow careful refactoring a few modules at
a time" regime than the "complete rewrite" regime. But I will also give a
shout out for "Quality" here, too. Folks, it's a thing. You can figure a lot
of it out before you write code. Read up on SOLID, GoF Patterns, and for
giant systems, some Udi Dahan. If Joel's history is accurate, then I think
as an industry, we should be trying to avoid Netscape's first mistake as
much or more than their second mistake.
But, I think Joel should write one about Keystone and Keystone Lite as well
:-) And, if we do a Glance Lite, dibs on calling it "Glimpse".