[openstack-dev] In memory joins in Nova
Mike Bayer
mbayer at redhat.com
Wed Aug 12 18:35:25 UTC 2015
On 8/12/15 1:49 PM, Sachin Manpathak wrote:
> Thanks, this feedback was helpful.
> Perhaps my paraphrasing was misleading. I am not running openstack at
> scale in order to see how much the DB can sustain. My observation was
> that the host running nova services saturates on CPU much earlier than
> the DB does.
You absolutely *want* a single host to be saturated *way* before the
database is; the database here is a single vertical service intended to
serve hundreds or thousands of horizontally scaled clients
simultaneously. A single request at a time should not even be a blip
in the database's view of things.
> Joins could be one of the reasons. I also observed that background
> tasks like instance creation and resource/stats updates contend with
> get queries. In addition to caching optimizations, prioritizing tasks
> in nova could help.
>
> Is there a nova API to fetch a list of instances without metadata?
> Until I find a good way to profile openstack code, changing the
> queries can be a good experiment IMO.
>
>
> On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith <dms at danplanet.com> wrote:
>
> > If OTOH we are referring to the width of the columns and the join is
> > such that you're going to get the same A identity over and over again,
> > if you join A and B you get a "wide" row with all of A and B with a
> > very large amount of redundant data sent over the wire again and again
> > (note that the database drivers available to us in Python always send
> > all rows and columns over the wire unconditionally, whether or not we
> > fetch them in application code).
>
> Yep, it was this. N instances times M rows of metadata each. If you pull
> 100 instances and they each have 30 rows of system metadata, that's a
> lot of data, and most of it is the instance being repeated 30 times for
> each metadata row. When we first released code doing this, a prominent
> host immediately raised the red flag because their DB traffic shot
> through the roof.
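
To put rough numbers on the N times M effect described above, here is a
small back-of-the-envelope sketch. The column counts are made up purely
for illustration (they are not Nova's real schema widths), and "cells"
is only a crude stand-in for bytes on the wire:

```python
# Hypothetical row widths, chosen only for illustration.
INSTANCE_COLS = 50   # assumed number of columns in an instance row
METADATA_COLS = 4    # assumed number of columns in a metadata row,
                     # including the key that points back at the instance


def joined_cells(n_instances, m_metadata):
    """Cells sent when instances are JOINed to their metadata.

    Every (instance, metadata) pair yields one wide row, so the instance
    columns are repeated once per metadata row.
    """
    return n_instances * m_metadata * (INSTANCE_COLS + METADATA_COLS)


def split_cells(n_instances, m_metadata):
    """Cells sent when instances and metadata are fetched separately.

    Each instance row is sent once, followed by the narrow metadata rows.
    """
    return (n_instances * INSTANCE_COLS
            + n_instances * m_metadata * METADATA_COLS)


n, m = 100, 30
print(joined_cells(n, m))  # 162000
print(split_cells(n, m))   # 17000
```

With these assumed widths the joined form moves roughly nine to ten
times as many cells for the 100 instances / 30 metadata rows case; the
exact ratio depends entirely on the real column widths.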
>
> > In this case you *do* want to do the join in Python to some extent,
> > though you use the database to deliver the simplest information
> > possible to work with first; you get the full row for all of the A
> > entries, then a second query for all of B plus A's primary key that
> > can be quickly matched to that of A.
>
> This is what we're doing. Fetch the list of instances that match the
> filters, then for the ones that were returned, get their metadata.
>
> --Dan
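
For anyone following along, the two-query pattern discussed above can be
sketched roughly like this. This is not Nova's actual code; the tables
`a` and `b`, their columns, and the function name are hypothetical, and
the stdlib sqlite3 module stands in for the real database so the example
is self-contained:

```python
import sqlite3
from collections import defaultdict


def fetch_with_children(conn, limit):
    """Load rows of A, then their B rows, and match them up in Python."""
    # Query 1: the full A rows, each sent over the wire exactly once.
    parents = conn.execute(
        "SELECT id, name FROM a ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    ids = [row[0] for row in parents]
    if not ids:
        return []

    # Query 2: only the B columns plus A's primary key, restricted to
    # the A rows that query 1 actually returned.
    placeholders = ",".join("?" * len(ids))
    children = conn.execute(
        "SELECT a_id, meta_key, meta_value FROM b "
        f"WHERE a_id IN ({placeholders})",
        ids,
    ).fetchall()

    # The "join" in Python: group B rows by A's primary key.
    by_parent = defaultdict(dict)
    for a_id, meta_key, meta_value in children:
        by_parent[a_id][meta_key] = meta_value

    return [
        {"id": a_id, "name": name, "metadata": dict(by_parent.get(a_id, {}))}
        for a_id, name in parents
    ]


def make_demo_db():
    """Build a tiny in-memory database with 3 A rows and 2 B rows each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE b (a_id INTEGER, meta_key TEXT, meta_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO a VALUES (?, ?)",
        [(i, f"instance-{i}") for i in range(1, 4)],
    )
    conn.executemany(
        "INSERT INTO b VALUES (?, ?, ?)",
        [(i, f"k{j}", f"v{i}-{j}") for i in range(1, 4) for j in range(2)],
    )
    return conn


for item in fetch_with_children(make_demo_db(), limit=2):
    print(item)
```

The matching step is a single pass over the B rows, so the Python side
stays cheap while the wide A columns are never repeated. A very long
list of ids would have to be split into batches, since databases cap the
number of bound parameters in one statement.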
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev