[openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

Mike Bayer mbayer at redhat.com
Tue Sep 9 00:12:55 UTC 2014


Hi All - 

Joe had me do some quick memory profiling on nova. Just an FYI, if anyone wants to play with this technique: I placed a little bit of memory profiling code using Guppy into nova/api/__init__.py (or anywhere in your favorite app that will definitely get imported when the thing first runs):

from guppy import hpy
import signal
import datetime

def handler(signum, frame):
    # On SIGUSR2, write the current heap profile to a timestamped file in /tmp.
    print("guppy memory dump")

    fname = "/tmp/memory_%s.txt" % datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    prof = hpy().heap()
    with open(fname, 'w') as handle:
        prof.dump(handle)
    del prof

signal.signal(signal.SIGUSR2, handler)



Then run nova-api, make some API calls, and hit the nova-api process with a SIGUSR2 signal; it will dump a profile into /tmp/ like this:

http://paste.openstack.org/show/108536/
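
If you'd rather trigger the dump from Python than from a shell, here is a minimal sketch of the signalling half only. The handler is a stand-in that just records that it ran (it does not call guppy), and the names dump_requests, request_dump and demo are made up for illustration. It assumes a Unix-like system and that it runs in the main thread:

```python
import os
import signal
import time

dump_requests = []  # hypothetical log of handler invocations


def handler(signum, frame):
    # Stand-in for the guppy dump handler; only records that it was called.
    dump_requests.append(signum)


def request_dump(pid):
    """Send SIGUSR2 to a process, like `kill -USR2 <pid>` from a shell."""
    os.kill(pid, signal.SIGUSR2)


def demo():
    del dump_requests[:]
    signal.signal(signal.SIGUSR2, handler)
    # Signal ourselves here; in practice pid would be the nova-api process.
    request_dump(os.getpid())
    # Python-level handlers run between bytecodes, so wait briefly for it.
    deadline = time.time() + 2.0
    while not dump_requests and time.time() < deadline:
        time.sleep(0.01)
    return list(dump_requests)
```

Against a real service you would pass that service's pid to request_dump() instead of os.getpid().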

Now obviously everyone is like, oh boy, memory, let’s go beat up SQLAlchemy again... which is fine, I can take it. In that particular profile, there’s a bunch of SQLAlchemy stuff, but that is all structural to the classes that are mapped in Nova API, e.g. 52 classes with a total of 656 attributes mapped. That stuff sets up once and doesn’t change. If Nova used less ORM, e.g. didn’t map everything, that would be less. But in that profile there’s no “data” lying around.

But even if you don’t have that many objects resident, your Python process might still be using up a ton of memory. The reason for this is that the CPython interpreter has a model where it will grab all the memory it needs to do something (a time-consuming process, by the way), but then it really doesn’t ever release it (see http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm for the “classic” answer on this; things may have improved/modernized in 2.7 but I think this is still the general idea).

So in terms of SQLAlchemy, a good way to suck up a ton of memory all at once that probably won’t get released is to:

1. fetch a full ORM object with all of its data

2. fetch lots of them all at once


Avoiding that isn’t necessarily simple. The quick win for loading full objects is to… not load the whole thing! E.g. assuming we can get OpenStack onto SQLAlchemy 0.9 in requirements.txt, we can start using load_only():

session.query(MyObject).options(load_only("id", "name", "ip"))

or with any version, just load those columns - we should be using this as much as possible for any query that is row/time intensive and doesn’t need full ORM behaviors (like relationships, persistence):

session.query(MyObject.id, MyObject.name, MyObject.ip)
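
Here’s a rough, self-contained sketch of both forms against an in-memory SQLite database. The MyObject mapping, its "blob" column and the demo() helper are made up for illustration. It also assumes a newer SQLAlchemy (1.4 or later) than the 0.9 discussed here, which is why load_only() is passed mapped attributes rather than string names:

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()


class MyObject(Base):
    # Hypothetical mapping, for illustration only.
    __tablename__ = "my_object"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    ip = Column(String(40))
    blob = Column(Text)  # stand-in for a large column we don't want to load


def demo():
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all(
            [MyObject(name="host%d" % i, ip="10.0.0.%d" % i, blob="x" * 1000)
             for i in range(3)]
        )
        session.commit()

    with Session(engine) as session:
        # ORM objects, but only id/name/ip are loaded; blob stays deferred.
        objs = (
            session.query(MyObject)
            .options(load_only(MyObject.id, MyObject.name, MyObject.ip))
            .order_by(MyObject.id)
            .all()
        )
        blob_loaded = ["blob" in obj.__dict__ for obj in objs]

        # Plain rows: no ORM identity, relationships or persistence.
        rows = (
            session.query(MyObject.id, MyObject.name, MyObject.ip)
            .order_by(MyObject.id)
            .all()
        )
        return blob_loaded, [tuple(row) for row in rows]
```

Accessing obj.blob on one of the load_only() objects would still work, but it emits an extra SELECT for that object, so this only pays off if the deferred columns really aren’t needed.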

Another quick win, if we *really* need an ORM object, not a row, and we have to fetch a ton of them in one big result, is to fetch them using yield_per():

for obj in session.query(MyObject).yield_per(100):
    # work with obj and then make sure to lose all references to it
    pass

yield_per() will dish out objects drawing from batches of the number you give it. But it has two huge caveats: one is that it isn’t compatible with most forms of eager loading, except for many-to-one joined loads. The other is that the DBAPI, e.g. the MySQL driver, does *not* stream the rows; virtually all DBAPIs by default load a result set fully before you ever see the first row. psycopg2 is one of the only DBAPIs that even offers a special mode to work around this (server side cursors).

Which means it’s even *better* to paginate result sets, so that you only ask the database for a chunk at a time and only hold a subset of objects in memory at once. Pagination itself is tricky: with a naive LIMIT/OFFSET approach, queries take a while once you are working with a large OFFSET. It’s better to SELECT into windows of data, where you can specify start and end criteria (against an indexed column, like a timestamp) for each window.
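
To make the windowing idea concrete, here’s a minimal sketch of one simple variant, where each window is bounded by a start value on an indexed column plus a LIMIT, rather than an explicit start/end pair. It uses the standard library’s sqlite3 module instead of SQLAlchemy so it runs with no dependencies; the table, the columns and the iter_windows() helper are made up for illustration:

```python
import sqlite3


def iter_windows(conn, window_size):
    """Yield lists of (id, name) rows, one window at a time.

    Each query starts after the last id seen, so the database can seek
    on the primary key index instead of scanning past a large OFFSET.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, name FROM my_object "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, window_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_object (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO my_object (name) VALUES (?)",
        [("host%d" % i,) for i in range(10)],
    )
    # Only one window of rows is held in memory at a time.
    return [[row[0] for row in window] for window in iter_windows(conn, 4)]
```

The same WHERE / ORDER BY / LIMIT shape carries over to an ORM or Core query; the important part is that the window boundary is an indexed column.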

Then of course, using Core only is another level of fastness/low memory.  Though querying for individual columns with ORM is not far off, and I’ve also made some major improvements to that in 1.0 so that query(*cols) is pretty competitive with straight Core (and Core is…well I’d say becoming visible in raw DBAPI’s rear view mirror, at least….).

What I’d suggest here is that we start to be mindful of memory/performance patterns and start to work out naive ORM use into more savvy patterns; being aware of what columns are needed, what rows, how many SQL queries we really need to emit, what the “worst case” number of rows will be for sections that really need to scale.  By far the hardest part is recognizing and reimplementing when something might have to deal with an arbitrarily large number of rows, which means organizing that code to deal with a “streaming” pattern where you never have all the rows in memory at once - on other projects I’ve had tasks that would normally take about a day, but in order to organize it to “scale”, took weeks - such as being able to write out a 1G XML file from a database (yes, actual use case - not only do you have to stream your database data, but you also have to stream out your DOM nodes for which I had to write some fancy SAX extensions).   
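
As a rough illustration of the “streaming” shape (not the SAX extensions I actually wrote for that project), here’s a sketch that writes rows out as XML one element at a time using the standard library’s XMLGenerator, so neither the full row set nor a DOM tree is ever held in memory. The element names and the stream_rows_as_xml() helper are made up for illustration:

```python
import io
from xml.sax.saxutils import XMLGenerator


def stream_rows_as_xml(rows, out):
    """Write (id, name) pairs to `out` as XML, one element per row.

    `rows` can be any iterable, including a generator that pulls
    windows of rows from the database as it goes.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("objects", {})
    for row_id, name in rows:
        gen.startElement("object", {"id": str(row_id)})
        gen.characters(name)
        gen.endElement("object")
    gen.endElement("objects")
    gen.endDocument()


def demo():
    out = io.StringIO()
    # A generator stands in for rows streamed from the database.
    rows = ((i, "host%d" % i) for i in range(1, 3))
    stream_rows_as_xml(rows, out)
    return out.getvalue()
```

In real use `out` would be an open file rather than a StringIO, so the output never accumulates in memory either.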

I know that using the ORM makes SQL development “easy”, and so many anti-ORM articles insist that this lulls us all into not worrying about what is actually going on (as much as SQLAlchemy eschews that way of working)…but I remain optimistic that it *is* possible to use tools that save a vast amount of effort, code verbosity and inconsistency that results from doing everything “by hand”, while at the same time not losing our ability to understand how we’re talking to the database. It’s a have-your-cake-and-eat-it-too situation, I know.

This is already what I’m here to contribute on, I’ve been working out some new SQLAlchemy patterns that hopefully will help, but in the coming weeks I may try to find time to spot some more of these particular things within current Nova code without getting too much into a total rewrite as of yet.





On Sep 8, 2014, at 6:24 PM, Joe Gordon <joe.gordon0 at gmail.com> wrote:

> Hi All,
> 
> We have recently started seeing assorted memory issues in the gate including the oom-killer [0] and libvirt throwing memory errors [1]. Luckily we run ps and dstat on every devstack run so we have some insight into why we are running out of memory. Based on the output from a job taken at random [2][3] a typical run consists of:
> 
> * 68 openstack api processes alone
> * the following services are running 8 processes (number of CPUs on test nodes)
>   * nova-api (we actually run 24 of these, 8 compute, 8 EC2, 8 metadata)
>   * nova-conductor
>   * cinder-api
>   * glance-api
>   * trove-api
>   * glance-registry
>   * trove-conductor
> * together nova-api, nova-conductor, cinder-api alone take over 45 %MEM (note: some of that memory usage is counted multiple times as RSS includes shared libraries)
> * based on dstat numbers, it looks like we don't use that much memory before tempest runs, and after tempest runs we use a lot of memory.
> 
> Based on this information I have two categories of questions:
> 
> 1) Should we explicitly set the number of workers that services use in devstack? Why have so many workers in a small all-in-one environment? What is the right balance here?
> 
> 2) Should we be worried that some OpenStack services such as nova-api, nova-conductor and cinder-api take up so much memory? Does their memory usage keep growing over time? Does anyone have any numbers to answer this? Why do these processes take up so much memory?
> 
> best,
> Joe
> 
> 
> [0] http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwib29tLWtpbGxlclwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDEwMjExMjA5NzY3fQ==
> [1] https://bugs.launchpad.net/nova/+bug/1366931
> [2] http://paste.openstack.org/show/108458/
> [3] http://logs.openstack.org/83/119183/4/check/check-tempest-dsvm-full/ea576e7/logs/screen-dstat.txt.gz
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
