[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

Dolph Mathews dolph.mathews at gmail.com
Thu Feb 2 04:27:51 UTC 2017


What made most services jump +20% between mitaka and newton? Maybe there is
a common cause that we can tackle.

I'd also be in favor of reducing the number of workers in the gate,
assuming that doesn't also substantially increase the runtime of gate jobs.
Does that environment variable (API_WORKERS) affect keystone and horizon?
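
If it helps to check this concretely on a gate node, here's a minimal sketch
(assuming psutil is available; the process-name patterns are my guesses, not
anything verified against the jobs) that lists how many workers each API
parent process actually forked:

    # Rough sketch: report which API-ish parent processes forked workers,
    # which gives a quick answer to "does this service honor API_WORKERS?"
    import psutil

    # Hypothetical name patterns; adjust to whatever runs on the node.
    PATTERNS = ('nova-api', 'neutron-server', 'cinder-api', 'glance-api',
                'swift-proxy-server', 'keystone', 'apache2')

    for proc in psutil.process_iter():
        try:
            cmd = ' '.join(proc.cmdline())
            if not any(p in cmd for p in PATTERNS):
                continue
            children = proc.children()
            if children:
                # only the parents that actually forked workers
                print('%s (pid %d): %d workers'
                      % (cmd[:60], proc.pid, len(children)))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue

Services fronted by Apache (keystone and horizon in a default devstack, if I
remember right) would show up under apache2 rather than their own binaries,
which is part of why I'm not sure API_WORKERS reaches them.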

On Wed, Feb 1, 2017 at 6:39 PM Kevin Benton <kevin at benton.pub> wrote:

> And who said openstack wasn't growing? ;)
>
> I think reducing API workers is a nice quick way to bring back some
> stability.
>
> I have spent a bunch of time digging into the OOM killer events and
> haven't yet figured out why they are being triggered. There is significant
> swap space remaining in all of the cases I have seen, so it's likely some
> memory locking issue or kernel allocations blocking swap. Until we can
> figure out the cause, we effectively have no usable swap space on the test
> instances so we are limited to 8GB.
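>
> In case it helps to capture data the next time this happens, here is a
> minimal sketch (Linux-only, purely illustrative; the field names come from
> /proc/meminfo) that logs the counters relevant to the "swap free but still
> oom-killed" question, including mlocked/unevictable pages:
>
>     # Rough sketch: periodically dump the /proc/meminfo counters that
>     # matter when the oom-killer fires despite free swap.
>     import time
>
>     FIELDS = ('MemFree', 'MemAvailable', 'SwapFree', 'Unevictable', 'Mlocked')
>
>     def snapshot():
>         values = {}
>         with open('/proc/meminfo') as f:
>             for line in f:
>                 key, rest = line.split(':', 1)
>                 if key in FIELDS:
>                     values[key] = rest.strip()
>         return values
>
>     while True:
>         print('%s %s' % (time.strftime('%H:%M:%S'), snapshot()))
>         time.sleep(10)
>
> Running something like that alongside the job and correlating it with the
> oom-killer timestamps in syslog might tell us whether pages are pinned or
> the kernel just couldn't reclaim fast enough.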
>
> On Feb 1, 2017 17:27, "Armando M." <armamig at gmail.com> wrote:
>
> Hi,
>
> [TL;DR]: OpenStack services have steadily increased their memory
> footprints. We need a concerted way to address the oom-kills experienced in
> the openstack gate, as we may have reached a ceiling.
>
> Now the longer version:
> --------------------------------
>
> We have been experiencing some instability in the gate lately for a
> number of reasons. When everything adds up, it becomes rather difficult to
> merge anything, and with feature freeze upon us, that adds to the stress.
> One culprit was identified to be [1].
>
> We initially tried to increase the swappiness, but that didn't seem to
> help. Then we looked at the resident memory in use. Going back over the
> past three releases, we noticed that the aggregated memory footprint of
> some openstack projects has grown steadily. We have the
> following:
>
>    - Mitaka
>       - neutron: 1.40GB
>       - nova: 1.70GB
>       - swift: 640MB
>       - cinder: 730MB
>       - keystone: 760MB
>       - horizon: 17MB
>       - glance: 538MB
>    - Newton
>       - neutron: 1.59GB (+13%)
>       - nova: 1.67GB (-1%)
>       - swift: 779MB (+21%)
>       - cinder: 878MB (+20%)
>       - keystone: 919MB (+20%)
>       - horizon: 21MB (+23%)
>       - glance: 721MB (+34%)
>    - Ocata
>       - neutron: 1.75GB (+10%)
>       - nova: 1.95GB (+16%)
>       - swift: 703MB (-9%)
>       - cinder: 920MB (+4%)
>       - keystone: 903MB (-1%)
>       - horizon: 25MB (+20%)
>       - glance: 740MB (+2%)
>
> Numbers are approximate and I only took a couple of samples, but in a
> nutshell, the majority of the services have seen double-digit growth over
> the past two cycles in terms of the amount of RSS memory they use.
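>
> For what it's worth, this is roughly the kind of thing one can do to
> collect such samples (a sketch, not necessarily how I collected mine;
> matching services by cmdline is approximate, and summing RSS double-counts
> pages shared between workers):
>
>     # Rough sketch: aggregate RSS per project by scanning /proc.
>     import os
>
>     PROJECTS = ('neutron', 'nova', 'swift', 'cinder', 'keystone',
>                 'horizon', 'glance')
>     totals = dict.fromkeys(PROJECTS, 0)
>
>     for pid in filter(str.isdigit, os.listdir('/proc')):
>         try:
>             with open('/proc/%s/cmdline' % pid) as f:
>                 cmd = f.read().replace('\0', ' ')
>             with open('/proc/%s/status' % pid) as f:
>                 status = f.read()
>         except IOError:
>             continue
>         rss_kb = 0
>         for line in status.splitlines():
>             if line.startswith('VmRSS:'):
>                 rss_kb = int(line.split()[1])
>                 break
>         for project in PROJECTS:
>             if project in cmd:
>                 totals[project] += rss_kb
>                 break
>
>     for project in PROJECTS:
>         print('%-10s %.2f GB' % (project, totals[project] / (1024.0 * 1024.0)))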
>
> Since [1] has only been observed since ocata [2], I think it's reasonable
> to assume that the memory increase may well be a determining factor in the
> oom-kills we see in the gate.
>
> Profiling and surgically reducing the memory used by each component in
> each service is a lengthy process, but I'd rather see some gate relief
> right away. Reducing the number of API workers helps bring the RSS memory
> back down to mitaka levels:
>
>    - neutron: 1.54GB
>    - nova: 1.24GB
>    - swift: 694MB
>    - cinder: 778MB
>    - keystone: 891MB
>    - horizon: 24MB
>    - glance: 490MB
>
> However, it may have other side effects, like longer execution times or an
> increase in timeouts.
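>
> For context on why the worker count makes such a difference: most of these
> services derive their default API worker count from the number of CPUs on
> the host, so an 8-vCPU gate node forks a lot of workers unless API_WORKERS
> caps it explicitly. Roughly, as an illustration of the pattern rather than
> any project's actual config code:
>
>     # Illustration only: the usual "configured value or CPU count" default.
>     import multiprocessing
>
>     def default_workers(configured=None):
>         return configured or multiprocessing.cpu_count()
>
>     print(default_workers())    # e.g. 8 workers per API service on a gate node
>     print(default_workers(2))   # what a lower API_WORKERS setting would give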
>
> Where do we go from here? I am not particularly fond of the stop-gap [4],
> but it is the one fix that most broadly addresses the memory increase we
> have experienced across the board.
>
> Thanks,
> Armando
>
> [1] https://bugs.launchpad.net/neutron/+bug/1656386
> [2]
> http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22oom-killer%5C%22%20AND%20tags:syslog
> [3]
> http://logs.openstack.org/21/427921/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/82084c2/
> [4] https://review.openstack.org/#/c/427921
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 
-Dolph