<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 8/11/15 7:14 PM, Sachin Manpathak
wrote:<br>
</div>
<blockquote
cite="mid:CADH8PtPGGGhdWDuE++PN7SbzrLTbpMntEfRUskNpGwiU1DaGCw@mail.gmail.com"
type="cite">
<div dir="ltr">I am struggling with python code profiling in
general. It has its own caveats like 100% plus overhead.
<div>However, on a host with only nova services (DB on a
different host), I see cpu utilization spike up quickly with
scale. The DB server is relatively calm and never goes over
20%. On a system which relies on DB to fetch all the data,
this should not happen.</div>
</div>
</blockquote>
The DB's resources are intended to scale up in response to a wide
degree of concurrency, that is, lots and lots of API services all
hitting it from many concurrent API calls. "With scale" here is a
slippery term. What kind of concurrency are you testing with?
How many CPUs serving API calls are utilized simultaneously? To
saturate the database you need many dozens, and even then you don't
want your database CPU going very high. 20% does not seem that low
to me, actually. I disagree with the idea that high database
CPU indicates a performant application, or that DB saturation is a
requirement for a database-backed application to be
performant; I think the opposite is true. In web application
development, when I worked with production sites at high volume, the
goal was to use enough caching so that major site pages being viewed
constantly could be delivered with *no* database access whatsoever.
We wanted to see the majority of the site being sent to customers
with the database at essentially zero; this is how you get page
response times down from 200-300 ms to 20 or 30 ms. If you
want to measure performance, looking at API response time is
probably better than looking at CPU utilization first.<br>
<br>
That said, Python is a very CPU-intensive language, because it is an
interpreted scripting language. Operations that in a compiled
language like C would be hardly a whisper of CPU end up being major
operations in Python. OpenStack suffers from a large amount of
function call overhead even for simple API operations, as it is an
extremely layered system with very little use of caching. Until it
moves to a JIT-based interpreter like PyPy that can flatten out
call chains, the amount of overhead just for an API call to come in
and go back out with a response will remain significant. As for
caching, a technique such as memcached caching of data
structures can also greatly improve performance because we can cache
pre-assembled data, removing the need to repeatedly extract it from
multiple tables and piece it together in Python, which is also a
very CPU-intensive activity. This is something that will be
happening more in the future, but as it improves the performance of
OpenStack, it will be removing even more load from the database.
Again, I'd look at API response times as the first thing to measure.<br>
<br>
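As a rough sketch of the "cache pre-assembled data" idea above: the
DictCache class is only an in-process stand-in for a memcached
client, and assemble_instance() and get_instance() are hypothetical
names for illustration, not Nova or memcached APIs.<br>
<br>
<pre>
```python
import json


class DictCache:
    """In-process stand-in for a memcached client (assumption: get/set only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def assemble_instance(instance_id, db_calls):
    """Hypothetical expensive step: several queries pieced together in Python."""
    db_calls.append(instance_id)  # record that we "hit the database"
    return {"id": instance_id, "metadata": {"role": "web"}, "flavor": "m1.small"}


def get_instance(cache, instance_id, db_calls):
    """Return the pre-assembled structure, building it only on a cache miss."""
    key = "instance:%d" % instance_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    data = assemble_instance(instance_id, db_calls)
    cache.set(key, json.dumps(data))
    return data


cache = DictCache()
db_calls = []
first = get_instance(cache, 1, db_calls)   # miss: assembles and stores
second = get_instance(cache, 1, db_calls)  # hit: no database work at all
print(len(db_calls))
```
</pre>
The second call does no assembly work, which is the point: both the
database load and the Python-side CPU for that request go away. A
real deployment would also need an invalidation or expiry policy,
which this sketch leaves out.<br>
<br>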
That said, the joining of data in Python may certainly be
unnecessary, and I'm not sure that we can't revisit the history Dan
refers to when he says there were "very large result sets". If we
are referring to the number of rows, joining in SQL or in Python will
still involve the same number of "rows", and SQLAlchemy also offers
many techniques for reducing the overhead of fetching lots of rows
which Nova currently doesn't make use of (see
<a class="moz-txt-link-freetext" href="https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning">https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning</a>
for a primer on this). <br>
<br>
If OTOH we are referring to the width of the columns, and the join is
such that you're going to get the same A identity over and over
again, then joining A and B gives you a "wide" row with all of A and B,
with a very large amount of redundant data sent over the wire again
and again (note that the database drivers available to us in Python
always send all rows and columns over the wire unconditionally,
whether or not we fetch them in application code). In this case you
*do* want to do the join in Python to some extent, though you use
the database to deliver the simplest information possible to work
with first; you get the full row for all of the A entries, then run a
second query for all of B plus A's primary key, which can be quickly
matched to that of A. SQLAlchemy offers this as "subquery eager
loading" and it is definitely much more performant than a single
full join when you have wide rows for individual entities. The
database is doing the join to the extent that it can deliver the
primary key information for A and B, which can be operated upon very
quickly in memory, as we already have all the A identities in a hash
lookup in any case.<br>
<br>
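To make the two-query pattern concrete, here is a minimal sketch
using only the standard library's sqlite3 module rather than
SQLAlchemy (where the equivalent is requested with the
subqueryload() option). The tables a and b and the wide_payload
column are made-up names for illustration.<br>
<br>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, wide_payload TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(id),
                    value TEXT);
    INSERT INTO a VALUES (1, 'xxxx'), (2, 'yyyy');
    INSERT INTO b VALUES (10, 1, 'k1'), (11, 1, 'k2'), (12, 2, 'k3');
""")

# Single full join, for comparison: A's wide columns are repeated
# once per matching B row.
joined_rows = conn.execute(
    "SELECT a.id, a.wide_payload, b.value FROM a JOIN b ON a.id = b.a_id"
).fetchall()

# Query 1: every wide A row crosses the wire exactly once and goes
# into a hash lookup keyed on A's primary key.
a_by_id = {}
for a_id, payload in conn.execute("SELECT id, wide_payload FROM a"):
    a_by_id[a_id] = {"id": a_id, "payload": payload, "bs": []}

# Query 2: B rows plus only A's primary key; the database still does
# the join, but no wide A columns are sent again.
for b_id, a_id, value in conn.execute(
    "SELECT b.id, a.id, b.value FROM b JOIN a ON a.id = b.a_id ORDER BY b.id"
):
    a_by_id[a_id]["bs"].append(value)

print(len(joined_rows), sorted(a_by_id))
```
</pre>
With two A rows and three B rows the difference is trivial, but when
each A row is wide and has many B rows, the single join repeats the
A payload for every one of them.<br>
<br>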
Overall, if you're looking to make OpenStack faster, the two
questions to start with are: 1. what is the response time of an API
call, and 2. what do
the Python profiles look like for those API calls? For a primer on
Python profiling see for example my own FAQ entry here:
<a class="moz-txt-link-freetext" href="http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling">http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling</a>.
This kind of profiling is a lot of work and is very tedious
compared to just running a big Rally job and looking at the CPU
overhead. Unfortunately this is the only way one can get actual
meaningful information as to why a Python application is slow. All
other techniques offer us basically nothing in explaining *why*
something is slow.<br>
<br>
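For reference, a minimal sketch of that kind of profiling with the
standard library's cProfile and pstats modules. The
handle_api_call() function is a hypothetical stand-in for whatever
code path serves one API request; it is not a Nova function.<br>
<br>
<pre>
```python
import cProfile
import io
import pstats


def handle_api_call():
    """Hypothetical stand-in for the code that serves one API request."""
    rows = [{"id": i, "meta": {"k": str(i)}} for i in range(2000)]
    by_id = dict((row["id"], row) for row in rows)
    return len(by_id)


profiler = cProfile.Profile()
profiler.enable()
result = handle_api_call()
profiler.disable()

# Sort by cumulative time so the call chains that dominate the
# request show up at the top of the report.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```
</pre>
The report lists call counts and time per function, which is what
tells you where the CPU is actually going for a given API call.<br>
<br>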
<br>
<br>
<blockquote
cite="mid:CADH8PtPGGGhdWDuE++PN7SbzrLTbpMntEfRUskNpGwiU1DaGCw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I could not find any analysis of nova performance either.
Appreciate if someone can point me to one.</div>
<div><br>
</div>
<div>Thanks,</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Aug 11, 2015 at 3:57 PM, Chris
Friesen <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:chris.friesen@windriver.com" target="_blank">chris.friesen@windriver.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Just
curious...have you measured this consuming a significant
amount of CPU time? Or is it more a gut feel of "this looks
like it might be expensive"?<br>
<br>
Chris<span class=""><br>
<br>
<br>
On 08/11/2015 04:51 PM, Sachin Manpathak wrote:<br>
</span>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">
Here are a few --<br>
instance_get_all_by_filters joins manually with<br>
instances_fill_metadata --<br>
<a moz-do-not-send="true"
href="https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890"
rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890</a><br>
<a moz-do-not-send="true"
href="https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782"
rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782</a><br>
<br>
Almost all instance query functions manually join with
instance_metadata.<br>
<br>
Another example was compute_node_get_all function which
joined compute_node,<br>
services and ip tables. But it is simplified in current
codebase (I am working<br>
on Juno)<br>
<br>
<br>
<br>
<br>
On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum &lt;<a
moz-do-not-send="true" href="mailto:clint@fewbar.com"
target="_blank">clint@fewbar.com</a>&gt;
</span><span class="">
wrote:<br>
<br>
Excerpts from Sachin Manpathak's message of
2015-08-12 05:40:36 +0800:<br>
> Hi folks,<br>
> Nova codebase seems to follow manual joins
model where all data required by<br>
> an API is fetched from multiple tables and then
joined manually by using<br>
> (in most cases) python dictionary lookups.<br>
><br>
> I was wondering about the basis reasoning for
doing so. I usually find<br>
> openstack services to be CPU bound in a medium
sized environment and<br>
> non-trivial utilization seems to be from parts
of code which do manual<br>
> joins.<br>
<br>
Could you please cite specific examples so we can
follow along with your<br>
thinking without having to repeat your analysis?<br>
<br>
Thanks!<br>
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage
questions)<br>
Unsubscribe: <a moz-do-not-send="true"
href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"
rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
</span>
<<a moz-do-not-send="true"
href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"
rel="noreferrer" target="_blank">http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a>><br>
<a moz-do-not-send="true"
href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"
rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><span
class=""><br>
<br>
<br>
<br>
<br>
<br>
</span></blockquote>
<div class="HOEnZb">
<div class="h5">
<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>