[openstack-dev] In memory joins in Nova

Clint Byrum clint at fewbar.com
Thu Aug 13 02:29:57 UTC 2015


Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
> > If OTOH we are referring to the width of the columns, and the join is
> > such that you're going to get the same A identity over and over again,
> > then joining A and B yields a "wide" row containing all of A and B,
> > with a very large amount of redundant data sent over the wire again
> > and again (note that the database drivers available to us in Python
> > always send all rows and columns over the wire unconditionally,
> > whether or not we fetch them in application code).
> 
> Yep, it was this. N instances times M rows of metadata each. If you pull
> 100 instances and they each have 30 rows of system metadata, that's a
> lot of data, and most of it is the instance's columns repeated once
> for each of the 30 metadata rows. When we first released code doing
> this, a prominent
> host immediately raised the red flag because their DB traffic shot
> through the roof.
> 
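
For concreteness, the joined load being described looks roughly like
this; the table definitions are a trimmed-down, hypothetical sketch in
SQLAlchemy 1.x style, not the actual Nova schema:

import sqlalchemy as sa

metadata = sa.MetaData()

# Stand-ins for the real tables; the actual instances table is much
# wider, which is exactly the problem.
instances = sa.Table(
    'instances', metadata,
    sa.Column('uuid', sa.String(36), primary_key=True),
    sa.Column('hostname', sa.String(255)),
)

instance_system_metadata = sa.Table(
    'instance_system_metadata', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('instance_uuid', sa.String(36),
              sa.ForeignKey('instances.uuid')),
    sa.Column('key', sa.String(255)),
    sa.Column('value', sa.Text),
)

# The join returns N * M rows: 100 instances with 30 metadata rows
# each is 3000 rows, and every one of them carries a full copy of the
# instance's columns over the wire.
wide_query = sa.select(
    [instances, instance_system_metadata]
).select_from(
    instances.join(
        instance_system_metadata,
        instances.c.uuid == instance_system_metadata.c.instance_uuid,
    )
)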

In the past I've taken a different approach to problematic one-to-many
relationships and made the metadata a binary JSON blob. Is there some
reason that won't work? This kind of thing can of course run into
concurrency issues on update, but those can be handled with
SELECT ... FOR UPDATE plus an intelligent retry on deadlock. Since the
metadata is nearly always queried as a whole, this seems like a valid
approach that would keep DB traffic low while also easing the burden of
reassembling the collection in nova-api.
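
A minimal sketch of what I mean, assuming a single JSON text column
(call it system_metadata) on the instances table; the names and retry
policy here are illustrative, not actual Nova code:

import json

import sqlalchemy as sa
from sqlalchemy import exc

MAX_RETRIES = 5

def update_system_metadata(engine, instances, instance_uuid, updates):
    # Merge `updates` into the blob under SELECT ... FOR UPDATE so
    # concurrent writers serialize; on a deadlock error, roll back
    # and retry the whole transaction.
    for attempt in range(MAX_RETRIES):
        conn = engine.connect()
        trans = conn.begin()
        try:
            row = conn.execute(
                sa.select([instances.c.system_metadata])
                .where(instances.c.uuid == instance_uuid)
                .with_for_update()
            ).fetchone()
            blob = row[0] if row else None
            meta = json.loads(blob) if blob else {}
            meta.update(updates)
            conn.execute(
                instances.update()
                .where(instances.c.uuid == instance_uuid)
                .values(system_metadata=json.dumps(meta))
            )
            trans.commit()
            return meta
        except exc.DBAPIError:
            # Deadlock or similar transient failure; retry.
            trans.rollback()
        finally:
            conn.close()
    raise RuntimeError('metadata update failed after %d retries'
                       % MAX_RETRIES)

Reads then become a single-row fetch, and nova-api just json.loads()
the blob instead of reassembling N key/value rows.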


