[openstack-dev] In memory joins in Nova

Clint Byrum clint at fewbar.com
Thu Aug 13 03:28:56 UTC 2015


Excerpts from Mike Bayer's message of 2015-08-13 11:03:32 +0800:
> 
> On 8/12/15 10:29 PM, Clint Byrum wrote:
> > Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
> >>> If OTOH we are referring to the width of the columns and the join is
> >>> such that you're going to get the same A identity over and over again,
> >>> if you join A and B you get a "wide" row with all of A and B with a very
> >>> large amount of redundant data sent over the wire again and again (note
> >>> that the database drivers available to us in Python always send all rows
> >>> and columns over the wire unconditionally, whether or not we fetch them
> >>> in application code).
> >> Yep, it was this. N instances times M rows of metadata each. If you pull
> >> 100 instances and they each have 30 rows of system metadata, that's a
> >> lot of data, and most of it is the instance being repeated 30 times for
> >> each metadata row. When we first released code doing this, a prominent
> >> host immediately raised the red flag because their DB traffic shot
> >> through the roof.
> >>
> > In the past I've taken a different approach to problematic one to
> > many relationships and have made the metadata a binary JSON blob.
> > Is there some reason that won't work? Of course, this type of thing
> > can run into concurrency issues on update, but these can be handled by
> > SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata
> > is nearly always queried as a whole, this seems like a valid approach
> > that would keep DB traffic low but also ease the burden of reassembling
> > the collection in nova-api.
> 
> JSON blobs have the disadvantages that you are piggybacking an entirely 
> different storage model on top of the relational one, losing all the 
> features you might like about the relational model like rich datatypes 
> (I understand our JSON decoders trip up on plain datetimes?), insert 
> defaults, nullability constraints, a fixed, predefined schema that can 
> be altered in a controlled, all-or-nothing way, efficient storage 
> characteristics, and of course reasonable querying capabilities.   They 
> are useful IMO only for small sections of data that are amenable to 
> ad-hoc changes in schema like simple bags of key-value pairs containing 
> miscellaneous features.
> 

Agreed on all points! And metadata for instances is exactly that:
a simple bag of key/value strings that is almost always queried and
delivered as a whole.
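
To put numbers on the row-count argument, here is a minimal sketch using
an in-memory SQLite database. The schema, table names, and column names
(instances, system_metadata, metadata_blob, meta_key, meta_value) are
assumptions made up for illustration and are not Nova's actual models.
It compares how many rows come back for the join versus one JSON blob
per instance, using Dan's figures of 100 instances with 30 metadata
rows each.

```python
import json
import sqlite3


def build_db(n_instances=100, m_keys=30):
    """Create a toy schema holding the same metadata in two forms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instances (
            id INTEGER PRIMARY KEY,
            hostname TEXT NOT NULL,
            metadata_blob TEXT NOT NULL      -- hypothetical JSON column
        );
        CREATE TABLE system_metadata (
            instance_id INTEGER NOT NULL REFERENCES instances(id),
            meta_key TEXT NOT NULL,
            meta_value TEXT NOT NULL
        );
    """)
    for i in range(n_instances):
        meta = {"key%d" % k: "value%d" % k for k in range(m_keys)}
        conn.execute(
            "INSERT INTO instances VALUES (?, ?, ?)",
            (i, "host%d" % i, json.dumps(meta)),
        )
        conn.executemany(
            "INSERT INTO system_metadata VALUES (?, ?, ?)",
            [(i, k, v) for k, v in meta.items()],
        )
    return conn


def fetch_joined(conn):
    """One-to-many join: every metadata row repeats the instance columns."""
    rows = conn.execute(
        "SELECT i.id, i.hostname, m.meta_key, m.meta_value "
        "FROM instances i JOIN system_metadata m ON m.instance_id = i.id"
    ).fetchall()
    by_instance = {}
    for inst_id, _hostname, key, value in rows:
        by_instance.setdefault(inst_id, {})[key] = value
    return len(rows), by_instance


def fetch_blob(conn):
    """Blob approach: one row per instance, metadata decoded client-side."""
    rows = conn.execute("SELECT id, metadata_blob FROM instances").fetchall()
    return len(rows), {inst_id: json.loads(blob) for inst_id, blob in rows}


if __name__ == "__main__":
    conn = build_db(100, 30)
    print(fetch_joined(conn)[0])  # 3000 rows over the wire
    print(fetch_blob(conn)[0])    # 100 rows over the wire
```

Both functions rebuild the same per-instance dictionaries, so the only
difference is how many rows travel from the database. The sketch leaves
out the update path: SQLite has no SELECT..FOR UPDATE, so the locking
and retry-on-deadlock handling mentioned above would need a server such
as MySQL or PostgreSQL.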


