[openstack-dev] In memory joins in Nova
Mike Bayer
mbayer at redhat.com
Thu Aug 13 03:03:32 UTC 2015
On 8/12/15 10:29 PM, Clint Byrum wrote:
> Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
>>> If OTOH we are referring to the width of the columns and the join is
>>> such that you're going to get the same A identity over and over again,
>>> if you join A and B you get a "wide" row with all of A and B with a very
>>> large amount of redundant data sent over the wire again and again (note
>>> that the database drivers available to us in Python always send all rows
>>> and columns over the wire unconditionally, whether or not we fetch them
>>> in application code).
>> Yep, it was this. N instances times M rows of metadata each. If you pull
>> 100 instances and they each have 30 rows of system metadata, that's a
>> lot of data, and most of it is the instance being repeated 30 times for
>> each metadata row. When we first released code doing this, a prominent
>> host immediately raised the red flag because their DB traffic shot
>> through the roof.
>>
> In the past I've taken a different approach to problematic one-to-many
> relationships and have made the metadata a binary JSON blob.
> Is there some reason that won't work? Of course, this type of thing
> can run into concurrency issues on update, but these can be handled by
> SELECT ... FOR UPDATE + intelligent retry on deadlock. Since the metadata
> is nearly always queried as a whole, this seems like a valid approach
> that would keep DB traffic low but also ease the burden of reassembling
> the collection in nova-api.
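The row multiplication described in the quoted messages can be reproduced with a small sketch. This is a hypothetical illustration, not Nova code: it uses Python's standard-library `sqlite3` in place of a MySQL driver, and the `instances` / `instance_system_metadata` tables and their columns are simplified stand-ins assumed for the example. A single JOIN returns one wide row per metadata entry, so every instance's columns come back once per metadata row. Issuing two narrower queries and stitching the results together in Python (the "in memory join" of the subject line) sends each instance row only once.

```python
import sqlite3

# Hypothetical, simplified schema; not Nova's real tables.
SCHEMA = """
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    uuid TEXT,
    hostname TEXT,
    display_name TEXT
);
CREATE TABLE instance_system_metadata (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER REFERENCES instances(id),
    key TEXT,
    value TEXT
);
"""


def build_db(n_instances=100, m_meta=30):
    """Create an in-memory DB with n_instances, each having m_meta metadata rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for i in range(n_instances):
        conn.execute(
            "INSERT INTO instances (id, uuid, hostname, display_name) VALUES (?, ?, ?, ?)",
            (i, f"uuid-{i}", f"host-{i}", f"vm-{i}"),
        )
        conn.executemany(
            "INSERT INTO instance_system_metadata (instance_id, key, value) VALUES (?, ?, ?)",
            [(i, f"k{j}", f"v{j}") for j in range(m_meta)],
        )
    return conn


def joined_fetch(conn):
    """One query: every result row repeats the four instance columns."""
    return conn.execute(
        "SELECT i.id, i.uuid, i.hostname, i.display_name, m.key, m.value "
        "FROM instances i JOIN instance_system_metadata m ON m.instance_id = i.id "
        "ORDER BY i.id, m.id"
    ).fetchall()


def in_memory_join(conn):
    """Two narrow queries, then group the metadata under each instance in Python."""
    instances = conn.execute(
        "SELECT id, uuid, hostname, display_name FROM instances ORDER BY id"
    ).fetchall()
    ids = [row[0] for row in instances]
    placeholders = ",".join("?" * len(ids))  # "?,?,?,..." one per instance id
    meta_rows = conn.execute(
        "SELECT instance_id, key, value FROM instance_system_metadata "
        f"WHERE instance_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    by_id = {
        inst_id: {"uuid": uuid, "hostname": host, "display_name": name, "system_metadata": {}}
        for inst_id, uuid, host, name in instances
    }
    # Attach each metadata row to its already-fetched instance.
    for instance_id, key, value in meta_rows:
        by_id[instance_id]["system_metadata"][key] = value
    return by_id, len(instances), len(meta_rows)


if __name__ == "__main__":
    conn = build_db()
    print(len(joined_fetch(conn)))  # 3000 wide rows
    _, n_inst, n_meta = in_memory_join(conn)
    print(n_inst, n_meta)  # 100 3000
```

With 100 instances and 30 metadata rows each, the JOIN returns 3,000 rows that each repeat the four instance columns (12,000 instance cells). The two-query version returns those columns only 400 times, plus a single `instance_id` on each metadata row. The cost is a second round trip and an `IN (...)` list whose size a given driver or database may cap, so very large batches would likely need chunking.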
JSON blobs have the disadvantage that you are piggybacking an entirely
different storage model on top of the relational one, losing all the
features you might like about the relational model: rich datatypes
(I understand our JSON decoders trip up on plain datetimes?), insert
defaults, nullability constraints, a fixed, predefined schema that can
be altered in a controlled, all-or-nothing way, efficient storage
characteristics, and of course reasonable querying capabilities. IMO they
are useful only for small sections of data that are amenable to ad-hoc
schema changes, such as simple bags of key-value pairs containing
miscellaneous features.
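To make the trade-off concrete, here is a minimal sketch of the JSON-blob layout, again using `sqlite3` with a hypothetical `instances.metadata_json` column rather than any real Nova schema. It shows two of the costs mentioned above. First, Python's standard `json` module refuses to encode a `datetime` unless given a fallback such as `default=str`, after which the value comes back as a plain string. Second, changing one key means reading, modifying, and rewriting the whole blob, which is where the `SELECT ... FOR UPDATE` and retry logic from the quoted message would be needed on a real server. SQLite has no `FOR UPDATE`, so that part is only noted in a comment.

```python
import json
import sqlite3
from datetime import datetime, timezone


def make_conn():
    # Hypothetical layout: an instance's whole metadata dict packed into one TEXT column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, metadata_json TEXT)")
    return conn


def save_metadata(conn, instance_id, metadata):
    # default=str is needed because json.dumps raises TypeError on datetime values.
    blob = json.dumps(metadata, default=str)
    conn.execute(
        "INSERT OR REPLACE INTO instances (id, metadata_json) VALUES (?, ?)",
        (instance_id, blob),
    )


def load_metadata(conn, instance_id):
    row = conn.execute(
        "SELECT metadata_json FROM instances WHERE id = ?", (instance_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


def set_key(conn, instance_id, key, value):
    # Changing one key rewrites the whole blob (read-modify-write). On MySQL or
    # PostgreSQL this is where SELECT ... FOR UPDATE plus deadlock retry would go;
    # SQLite has no FOR UPDATE, so this sketch does not guard against races.
    metadata = load_metadata(conn, instance_id)
    metadata[key] = value
    save_metadata(conn, instance_id, metadata)


if __name__ == "__main__":
    conn = make_conn()
    launched = datetime(2015, 8, 13, 3, 3, 32, tzinfo=timezone.utc)
    try:
        json.dumps({"launched_at": launched})
    except TypeError as exc:
        print("plain json.dumps fails:", exc)
    save_metadata(conn, 1, {"launched_at": launched, "image_os": "linux"})
    set_key(conn, 1, "image_os", "windows")
    loaded = load_metadata(conn, 1)
    # The timestamp survives only as a string, not as a datetime.
    print(type(loaded["launched_at"]).__name__, loaded["launched_at"])
```

Querying across instances, for example finding every instance with a given `image_os`, would also need either the database's own JSON functions or a full scan in Python, which is the loss of querying capability described above.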
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev