<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <br>

    <div class="moz-cite-prefix">On 8/11/15 7:14 PM, Sachin Manpathak

      wrote:<br>

    </div>

    <blockquote

cite="mid:CADH8PtPGGGhdWDuE++PN7SbzrLTbpMntEfRUskNpGwiU1DaGCw@mail.gmail.com"

      type="cite">

      <div dir="ltr">I am struggling with python code profiling in

        general. It has its own caveats like 100% plus overhead.

        <div>However, on a host with only nova services (DB on a

          different host), I see cpu utilization spike up quickly with

          scale. The DB server is relatively calm and never goes over

          20%. On a system which relies on DB to fetch all the data,

          this should not happen.</div>

      </div>

    </blockquote>

    The DB's resources are intended to scale up in response to a wide
    degree of concurrency, that is, lots and lots of API services all
    hitting it from many concurrent API calls. "With scale" here is a
    slippery term. What kind of concurrency are you testing with?
    How many CPUs serving API calls are utilized simultaneously? To
    saturate the database you need many dozens, and even then you don't
    want your database CPU going very high. 20% does not seem that low
    to me, actually. I disagree with the idea that high database
    CPU indicates a performant application, or that DB saturation is a
    requirement for a database-driven application to be
    performant; I think the opposite is true. In web application
    development, when I worked with production sites at high volume, the
    goal was to use enough caching so that major site pages being viewed
    constantly could be delivered with *no* database access whatsoever.
    We wanted to see the majority of the site being sent to customers
    with database load at essentially zero; this is how you get page
    response times from 200-300 ms down to 20 or 30 ms. If you
    want to measure performance, looking at API response time is
    probably a better first step than looking at CPU utilization.<br>

    <br>
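    As a rough sketch of what "measure response time first" can look
    like, the snippet below times repeated calls to a stand-in function
    and reports the median and the worst case. The names fake_api_call
    and measure_latency are hypothetical and are not part of Nova or
    Rally; in practice the callable would wrap a real HTTP request.<br>
<pre>
```python
import statistics
import time


def fake_api_call():
    # Hypothetical stand-in for a real API request; replace with an HTTP call.
    return sum(range(1000))


def measure_latency(func, repeat=20):
    """Call func repeatedly and return per-call wall-clock times in ms."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


samples = measure_latency(fake_api_call)
print("median ms: %.3f" % statistics.median(samples))
print("max ms:    %.3f" % max(samples))
```
</pre>
    Looking at the median next to the maximum gives a first hint of
    whether slowness is uniform or concentrated in a few outlier calls.<br>
    <br>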

    That said, Python is a very CPU-intensive language, because it is an
    interpreted scripting language. Operations that would be hardly a
    whisper of CPU in a compiled language like C end up being major
    operations in Python. OpenStack suffers from a large amount of
    function call overhead even for simple API operations, as it is an
    extremely layered system with very little use of caching. Until it
    moves to a JIT-based interpreter like PyPy that can flatten out
    call chains, the amount of overhead just for an API call to come in
    and go back out with a response will remain significant. As for
    caching, a technique such as caching data structures in memcached
    can also greatly improve performance, because we can cache
    pre-assembled data, removing the need to repeatedly extract it from
    multiple tables and piece it together in Python, which is also a
    very CPU-intensive activity. This is something that will be
    happening more in the future, but as it improves the performance of
    OpenStack, it will remove even more load from the database.
    Again, I'd look at API response times as the first thing to measure.<br>

    <br>
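    To illustrate the idea of caching pre-assembled data, here is a
    minimal cache-aside sketch. It assumes a cache with get/set on
    string keys; DictCache is an in-process stand-in for a memcached
    client, and load_instance_from_db and get_instance are hypothetical
    names, not Nova functions.<br>
<pre>
```python
import json


class DictCache:
    """In-process stand-in for a memcached client (get/set on string keys)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


db_calls = []


def load_instance_from_db(instance_id):
    # Hypothetical loader; a real one would query several tables
    # and piece the result together in Python.
    db_calls.append(instance_id)
    return {"id": instance_id, "metadata": {"role": "web"}}


def get_instance(cache, instance_id):
    """Return the pre-assembled structure, hitting the DB only on a miss."""
    key = "instance:%s" % instance_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load_instance_from_db(instance_id)
    cache.set(key, json.dumps(value))
    return value


cache = DictCache()
first = get_instance(cache, 42)
second = get_instance(cache, 42)
print(first == second, len(db_calls))
```
</pre>
    The second call is served entirely from the cache, so neither the
    database nor the Python assembly code runs again. A real deployment
    would also need expiry and invalidation, which this sketch omits.<br>
    <br>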

    That said, the joining of data in Python may well be unnecessary,
    and I wonder whether we can revisit the history Dan refers to when
    he says there were "very large result sets". If we are referring to
    the number of rows, joining in SQL or in Python will still involve
    the same number of "rows", and SQLAlchemy also offers many
    techniques for reducing the overhead of fetching lots of rows
    which Nova currently doesn't make use of (see
    <a class="moz-txt-link-freetext" href="https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning">https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning</a>
    for a primer on this).<br>

    <br>

    If OTOH we are referring to the width of the columns, and the join
    is such that you're going to get the same A identity over and over
    again, then joining A and B gives you a "wide" row with all of A
    and B, with a very large amount of redundant data sent over the
    wire again and again (note that the database drivers available to
    us in Python always send all rows and columns over the wire
    unconditionally, whether or not we fetch them in application code).
    In this case you *do* want to do the join in Python to some extent,
    though you use the database to deliver the simplest information
    possible to work with first: you get the full row for all of the A
    entries, then run a second query for all of B plus A's primary key,
    which can be quickly matched to that of A. SQLAlchemy offers this
    as "subquery eager loading", and it is definitely much more
    performant than a single full join when you have wide rows for
    individual entities. The database is still doing the join to the
    extent that it can deliver the primary key information for A and B,
    which can be operated upon very quickly in memory, as we already
    have all the A identities in a hash lookup in any case.<br>

    <br>
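    The two-query approach described above can be sketched with the
    standard library's sqlite3 module. This only illustrates the
    general technique; it is not the SQL that SQLAlchemy's subquery
    eager loading emits, and the tables a and b with their columns are
    made up for the example.<br>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY, wide_payload TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, value TEXT);
    INSERT INTO a VALUES (1, 'many columns for A1'), (2, 'many columns for A2');
    INSERT INTO b VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
    """
)

# Query 1: each wide A row crosses the wire exactly once.
a_by_id = {}
for a_id, payload in conn.execute("SELECT id, wide_payload FROM a"):
    a_by_id[a_id] = {"id": a_id, "payload": payload, "bs": []}

# Query 2: B rows plus only A's primary key, joined against A in the database.
rows = conn.execute(
    "SELECT b.a_id, b.id, b.value FROM b JOIN a ON a.id = b.a_id ORDER BY b.id"
)

# The "join" in Python is a single hash lookup per B row.
for a_id, b_id, value in rows:
    a_by_id[a_id]["bs"].append({"id": b_id, "value": value})

for a in a_by_id.values():
    print(a["id"], [b["value"] for b in a["bs"]])
```
</pre>
    With a single full join, the wide_payload column of A1 would be
    transmitted once per matching B row; here it is transmitted once in
    total, and the second query carries only narrow rows.<br>
    <br>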

    Overall, if you're looking to make OpenStack faster, the questions
    to start with are: 1. what is the response time of an API call, and
    2. what do the Python profiles look like for those API calls? For a
    primer on Python profiling see for example my own FAQ entry here:
    <a class="moz-txt-link-freetext" href="http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling">http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling</a>.
    This kind of profiling is a lot of work and is very tedious
    compared to just running a big Rally job and looking at the CPU
    overhead. Unfortunately it is the only way to get actual
    meaningful information as to why a Python application is slow. All
    other techniques offer us basically nothing towards explaining
    *why* something is slow.<br>

    <br>
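    For reference, a minimal profiling helper built on the standard
    library's cProfile and pstats modules might look like the sketch
    below. The context manager name profiled and the function
    handle_request are illustrative; handle_request is a hypothetical
    stand-in for the code path behind a single API call.<br>
<pre>
```python
import cProfile
import contextlib
import io
import pstats


@contextlib.contextmanager
def profiled(limit=10):
    """Profile the enclosed block and print the top cumulative entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield profiler
    finally:
        profiler.disable()
        out = io.StringIO()
        stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
        stats.print_stats(limit)
        print(out.getvalue())


def handle_request():
    # Hypothetical stand-in for the code path behind one API call.
    return sorted(str(i) for i in range(5000))


with profiled():
    result = handle_request()
```
</pre>
    Sorting by cumulative time puts the outermost expensive calls at the
    top of the report, which is usually the quickest way to see which
    layer of a deep call chain is responsible for the time spent.<br>
    <br>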

    <br>

    <br>

    <blockquote

cite="mid:CADH8PtPGGGhdWDuE++PN7SbzrLTbpMntEfRUskNpGwiU1DaGCw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>I could not find any analysis of nova performance either.

          Appreciate if someone can point me to one.</div>

        <div><br>

        </div>

        <div>Thanks,</div>


      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, Aug 11, 2015 at 3:57 PM, Chris

          Friesen <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:chris.friesen@windriver.com" target="_blank">chris.friesen@windriver.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Just

            curious...have you measured this consuming a significant

            amount of CPU time?  Or is it more a gut feel of "this looks

            like it might be expensive"?<br>

            <br>

            Chris<span class=""><br>

              <br>

              <br>

              On 08/11/2015 04:51 PM, Sachin Manpathak wrote:<br>

            </span>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                class="">

                Here are a few --<br>

                instance_get_all_by_filters joins manually with<br>

                instances_fill_metadata --<br>

                <a moz-do-not-send="true"

href="https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890"

                  rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890</a><br>

                <a moz-do-not-send="true"

href="https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782"

                  rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782</a><br>

                <br>

                Almost all instance query functions manually join with

                instance_metadata.<br>

                <br>

                Another example was compute_node_get_all function which

                joined compute_node,<br>

                services and ip tables. But it is simplified  in current

                codebase (I am working<br>

                on Juno)<br>

                <br>

                <br>

                <br>

                <br>

                On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum <<a

                  moz-do-not-send="true" href="mailto:clint@fewbar.com"

                  target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:clint@fewbar.com">clint@fewbar.com</a></a><br>

              </span><span class="">

                <mailto:<a moz-do-not-send="true"

                  href="mailto:clint@fewbar.com" target="_blank">clint@fewbar.com</a>>>

                wrote:<br>

                <br>

                    Excerpts from Sachin Manpathak's message of

                2015-08-12 05:40:36 +0800:<br>

                    > Hi folks,<br>

                    > Nova codebase seems to follow manual joins

                model where all data required by<br>

                    > an API is fetched from multiple tables and then

                joined manually by using<br>

                    > (in most cases) python dictionary lookups.<br>

                    ><br>

                    > I was wondering about the basis reasoning for

                doing so. I usually find<br>

                    > openstack services to be CPU bound in a medium

                sized environment and<br>

                    > non-trivial utilization seems to be from parts

                of code which do manual<br>

                    > joins.<br>

                <br>

                    Could you please cite specific examples so we can

                follow along with your<br>

                    thinking without having to repeat your analysis?<br>

                <br>

                    Thanks!<br>

                <br>

__________________________________________________________________________<br>

                    OpenStack Development Mailing List (not for usage

                questions)<br>

                    Unsubscribe: <a moz-do-not-send="true"

href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"

                  rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

              </span>

                  <<a moz-do-not-send="true"

href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"

                rel="noreferrer" target="_blank">http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a>><br>

                  <a moz-do-not-send="true"

                href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"

                rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><span

                class=""><br>

                <br>

                <br>

                <br>

                <br>


                <br>

              </span></blockquote>


          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>