[openstack-dev] In memory joins in Nova

Dan Smith dms at danplanet.com
Wed Aug 12 15:12:23 UTC 2015


> If OTOH we are referring to the width of the columns, and the join is
> such that you're going to get the same A identity over and over again,
> then joining A and B gives you a "wide" row with all of A and B, with a
> very large amount of redundant data sent over the wire again and again
> (note that the database drivers available to us in Python always send
> all rows and columns over the wire unconditionally, whether or not we
> fetch them in application code).

Yep, it was this: N instances times M rows of metadata each. If you pull
100 instances and each has 30 rows of system metadata, that's 3,000
joined rows, and most of that data is the instance's columns repeated
once for every one of its metadata rows. When we first released code
doing this, a prominent host immediately raised the red flag because
their DB traffic shot through the roof.
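
Roughly, the shape of the problem looks like this. This is only a toy
sketch: the table and column names (instances, instance_system_metadata,
big_blob, meta_key, meta_value) are illustrative stand-ins for Nova's
real models, and sqlite3 stands in for the real MySQL driver; the
numbers are picked to match the example above.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instances (
            id INTEGER PRIMARY KEY, uuid TEXT, big_blob TEXT);
        CREATE TABLE instance_system_metadata (
            id INTEGER PRIMARY KEY, instance_id INTEGER,
            meta_key TEXT, meta_value TEXT);
    """)

    # 100 instances, 30 system_metadata rows each.
    for i in range(100):
        conn.execute("INSERT INTO instances VALUES (?, ?, ?)",
                     (i, "uuid-%d" % i, "x" * 1024))
        for m in range(30):
            conn.execute(
                "INSERT INTO instance_system_metadata "
                "(instance_id, meta_key, meta_value) VALUES (?, ?, ?)",
                (i, "key-%d" % m, "val"))

    rows = conn.execute("""
        SELECT instances.*,
               instance_system_metadata.meta_key,
               instance_system_metadata.meta_value
        FROM instances
        JOIN instance_system_metadata
          ON instance_system_metadata.instance_id = instances.id
    """).fetchall()

    # 3,000 rows come back, and each instance's 1KB big_blob column is
    # shipped 30 times even though we only need it once.
    print(len(rows))  # 3000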

> In this case you *do* want to do the join in
> Python to some extent, though you use the database to deliver the
> simplest information possible to work with first; you get the full row
> for all of the A entries, then a second query for all of B plus A's
> primary key that can be quickly matched to that of A.

This is what we're doing. Fetch the list of instances that match the
filters, then for the ones that were returned, get their metadata.
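
In sketch form, against the same toy schema as above (the function name
and hand-written SQL here are illustrative, not Nova's actual DB API):

    import collections

    def get_instances_with_metadata(conn, filters_sql, filter_args):
        # Query 1: just the instance rows that match the filters.
        instances = conn.execute(
            "SELECT id, uuid, big_blob FROM instances WHERE " + filters_sql,
            filter_args).fetchall()
        ids = [row[0] for row in instances]
        if not ids:
            return []

        # Query 2: only the metadata rows, keyed by instance id, so each
        # instance's columns cross the wire exactly once.
        placeholders = ",".join("?" * len(ids))
        md_rows = conn.execute(
            "SELECT instance_id, meta_key, meta_value "
            "FROM instance_system_metadata "
            "WHERE instance_id IN (%s)" % placeholders, ids).fetchall()

        # "Join" in Python: bucket metadata by instance id, then attach
        # each bucket to its instance.
        md_by_instance = collections.defaultdict(dict)
        for instance_id, key, value in md_rows:
            md_by_instance[instance_id][key] = value
        return [(row, md_by_instance[row[0]]) for row in instances]

So the total data transferred is N instance rows plus N*M narrow
metadata rows, instead of N*M wide rows.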

--Dan


