[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

Joshua Harlow harlowja at fastmail.com
Fri Feb 3 06:12:27 UTC 2017


Has anyone tried:

https://github.com/mgedmin/dozer/blob/master/dozer/leak.py#L72

This piece of middleware creates some nice graphs (using PIL) that may 
help identify which areas are using what memory (and/or leaking).
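
A minimal sketch of wiring it up, assuming a plain WSGI entry point
(the toy app and port are mine; the Dozer constructor and the
/_dozer/index path are per the dozer README):

    # pip install dozer pillow
    from wsgiref.simple_server import make_server

    from dozer import Dozer

    def app(environ, start_response):
        # Stand-in for the service's real WSGI application.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    if __name__ == '__main__':
        # Dozer periodically counts live objects via gc and serves
        # sparkline graphs at http://localhost:8080/_dozer/index
        make_server('', 8080, Dozer(app)).serve_forever()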

https://pypi.python.org/pypi/linesman might also be somewhat useful to 
have running.

How any of these processes takes more than 100MB blows my mind
(horizon is doing nicely, ha); what are people caching in-process to
end up with an RSS that large (1.95 GB, whoa)?
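
If anyone wants to reproduce the per-service numbers below, here's a
rough sketch that sums RSS across every process whose command line
matches a service name, using psutil (matching by substring is my
assumption; adjust it to however your workers are actually named):

    # pip install psutil
    import os
    import sys

    import psutil

    def total_rss(pattern):
        """Sum RSS over all processes whose cmdline contains pattern."""
        total = 0
        for proc in psutil.process_iter():
            try:
                # Skip ourselves: our own cmdline contains the pattern.
                if proc.pid == os.getpid():
                    continue
                if pattern in ' '.join(proc.cmdline()):
                    total += proc.memory_info().rss
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
        return total

    if __name__ == '__main__':
        name = sys.argv[1] if len(sys.argv) > 1 else 'neutron-server'
        print('%s: %.2f MB' % (name, total_rss(name) / (1024.0 * 1024)))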

Armando M. wrote:
> Hi,
>
> [TL;DR]: OpenStack services have steadily increased their memory
> footprints. We need a concerted effort to address the oom-kills
> experienced in the openstack gate, as we may have reached a ceiling.
>
> Now the longer version:
> --------------------------------
>
> We have been experiencing some instability in the gate lately for a
> number of reasons. When everything adds up, it becomes rather
> difficult to merge anything, and with feature freeze upon us, that
> adds to the stress. One culprit was identified to be [1].
>
> We initially tried to increase the swappiness, but that didn't seem to
> help. We then looked at the resident memory in use. Going back over
> the past three releases, we noticed that the aggregated memory
> footprint of some openstack projects has grown steadily. We have the
> following:
>
>   * Mitaka
>       o neutron: 1.40GB
>       o nova: 1.70GB
>       o swift: 640MB
>       o cinder: 730MB
>       o keystone: 760MB
>       o horizon: 17MB
>       o glance: 538MB
>   * Newton
>       o neutron: 1.59GB (+13%)
>       o nova: 1.67GB (-1%)
>       o swift: 779MB (+21%)
>       o cinder: 878MB (+20%)
>       o keystone: 919MB (+20%)
>       o horizon: 21MB (+23%)
>       o glance: 721MB (+34%)
>   * Ocata
>       o neutron: 1.75GB (+10%)
>       o nova: 1.95GB (+16%)
>       o swift: 703MB (-9%)
>       o cinder: 920MB (+4%)
>       o keystone: 903MB (-1%)
>       o horizon: 25MB (+20%)
>       o glance: 740MB (+2%)
>
> Numbers are approximate and I only took a couple of samples, but in a
> nutshell, the majority of the services have seen double-digit growth
> over the past two cycles in the amount of RSS memory they use.
>
> Since [1] has been observed only since ocata [2], I think it's
> reasonable to assume that the memory increase may well be a
> determining factor in the oom-kills we see in the gate.
>
> Profiling and surgically reducing the memory used by each component in
> each service is a lengthy process, but I'd rather see some gate relief
> right away. Reducing the number of API workers brings the RSS memory
> back down to mitaka levels:
>
>   * neutron: 1.54GB
>   * nova: 1.24GB
>   * swift: 694MB
>   * cinder: 778MB
>   * keystone: 891MB
>   * horizon: 24MB
>   * glance: 490MB
>
> However, it may have other side effects, like longer execution times
> or an increase in timeouts.
>
> Where do we go from here? I am not particularly fond of stop-gap [4],
> but it is the one fix that most broadly addresses the memory increase
> we have experienced across the board.
>
> Thanks,
> Armando
>
> [1] https://bugs.launchpad.net/neutron/+bug/1656386
> [2]
> http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22oom-killer%5C%22%20AND%20tags:syslog
> [3]
> http://logs.openstack.org/21/427921/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/82084c2/
> [4] https://review.openstack.org/#/c/427921
>


