[openstack-dev] In memory joins in Nova

Mike Bayer mbayer at redhat.com
Wed Aug 12 18:35:25 UTC 2015



On 8/12/15 1:49 PM, Sachin Manpathak wrote:
> Thanks, this feedback was helpful.
> Perhaps my paraphrasing was misleading. I am not running openstack at 
> scale in order to see how much the DB can sustain. My observation was 
> that the host running nova services saturates on CPU much earlier than 
> the DB does.
You absolutely *want* a single host to be saturated *way* before the 
database is; the database here is a single vertical service intended to 
serve hundreds or thousands of horizontally scaled clients 
simultaneously. A single request at a time should not even be a blip 
in the database's view of things.



> Joins could be one of the reasons. I also observed that background 
> tasks like instance creation and resource/stats updates contend with 
> GET queries. In addition to caching optimizations, prioritizing tasks 
> in nova could help.
>
> Is there a nova API to fetch a list of instances without metadata? 
> Until I find a good way to profile openstack code, changing the 
> queries can be a good experiment IMO.
>
>
> On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith <dms at danplanet.com> wrote:
>
>     > If OTOH we are referring to the width of the columns and the join is
>     > such that you're going to get the same A identity over and over again,
>     > if you join A and B you get a "wide" row with all of A and B with a very
>     > large amount of redundant data sent over the wire again and again (note
>     > that the database drivers available to us in Python always send all rows
>     > and columns over the wire unconditionally, whether or not we fetch them
>     > in application code).
>
>     Yep, it was this. N instances times M rows of metadata each. If you pull
>     100 instances and they each have 30 rows of system metadata, that's a
>     lot of data, and most of it is the instance being repeated 30 times for
>     each metadata row. When we first released code doing this, a prominent
>     host immediately raised the red flag because their DB traffic shot
>     through the roof.
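To put rough numbers on the N-times-M effect described above, here is a
back-of-the-envelope sketch. The 100 instances and 30 metadata rows come
from the example in the thread; the column counts (50 instance columns,
2 metadata columns) and both function names are made up purely for
illustration and do not reflect the actual Nova schema.

```python
def joined_cells(n_instances, m_meta_rows, instance_cols, meta_cols):
    """Cells sent by one JOIN: every metadata row repeats all instance columns."""
    return n_instances * m_meta_rows * (instance_cols + meta_cols)


def split_cells(n_instances, m_meta_rows, instance_cols, meta_cols):
    """Cells sent by two queries: each instance once, then the metadata
    rows plus one extra column carrying the instance key."""
    instance_part = n_instances * instance_cols
    metadata_part = n_instances * m_meta_rows * (meta_cols + 1)
    return instance_part + metadata_part


# 100 instances x 30 metadata rows are the thread's figures;
# 50 and 2 are assumed column counts, chosen only for illustration.
joined = joined_cells(100, 30, 50, 2)
split = split_cells(100, 30, 50, 2)
print(joined, split)
```

Under those assumed column counts the single join ships roughly eleven
times as many cells as the two separate queries, nearly all of it
repeated instance data.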
>
>     > In this case you *do* want to do the join in
>     > Python to some extent, though you use the database to deliver the
>     > simplest information possible to work with first; you get the full row
>     > for all of the A entries, then a second query for all of B plus A's
>     > primary key that can be quickly matched to that of A.
>
>     This is what we're doing. Fetch the list of instances that match the
>     filters, then for the ones that were returned, get their metadata.
>
>     --Dan
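The two-query pattern described above can be sketched with nothing but
the standard-library sqlite3 module. This is not Nova's code: the table
layout, the column names, the sample data and the
instances_with_metadata() helper are all assumptions invented for the
example. It only illustrates the shape of the technique: full rows for A
once, then B plus A's key, matched up in Python.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not the real Nova tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (uuid TEXT PRIMARY KEY, hostname TEXT);
    CREATE TABLE system_metadata (instance_uuid TEXT, key TEXT, value TEXT);
""")
conn.executemany("INSERT INTO instances VALUES (?, ?)",
                 [("a", "web1"), ("b", "web2"), ("c", "db1")])
conn.executemany("INSERT INTO system_metadata VALUES (?, ?, ?)",
                 [("a", "image_id", "img-1"), ("a", "flavor", "small"),
                  ("b", "image_id", "img-2"), ("c", "image_id", "img-3")])


def instances_with_metadata(conn, hostname_prefix):
    # Query 1: the full row for each instance matching the filter, once.
    rows = conn.execute(
        "SELECT uuid, hostname FROM instances "
        "WHERE hostname LIKE ? ORDER BY uuid",
        (hostname_prefix + "%",)).fetchall()
    instances = {uuid: {"uuid": uuid, "hostname": host, "metadata": {}}
                 for uuid, host in rows}
    if not instances:
        return []

    # Query 2: only the metadata columns plus the instance key, limited
    # to the instances that query 1 actually returned.
    placeholders = ",".join("?" * len(instances))
    meta_rows = conn.execute(
        "SELECT instance_uuid, key, value FROM system_metadata "
        "WHERE instance_uuid IN (%s)" % placeholders,
        list(instances)).fetchall()

    # The "join" in Python: a dictionary lookup on the instance key.
    for instance_uuid, key, value in meta_rows:
        instances[instance_uuid]["metadata"][key] = value
    return list(instances.values())


result = instances_with_metadata(conn, "web")
for inst in result:
    print(inst["uuid"], inst["hostname"], inst["metadata"])
```

With a large result set the IN list would probably need to be sent in
batches, since databases limit the number of bound parameters per
statement.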
>
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
