[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

IWAMOTO Toshihiro iwamoto at valinux.co.jp
Thu Feb 2 06:20:43 UTC 2017


At Wed, 1 Feb 2017 17:37:34 -0700,
Kevin Benton wrote:
> 
> And who said openstack wasn't growing? ;)
> 
> I think reducing API workers is a nice quick way to bring back some
> stability.
> 
> I have spent a bunch of time digging into the OOM killer events and haven't
> yet figured out why they are being triggered. There is significant swap
> space remaining in all of the cases I have seen so it's likely some memory

We can try increasing watermark_scale_factor instead.
I looked at two random oom-killer invocations, and in both cases free
memory was above the watermark. The oom-killer was triggered by a 16kB
contiguous page allocation from apparmor_file_alloc_security, so
disabling apparmor may also work.
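
Something along the following lines could be run on an affected node to
sanity-check that: it prints the current swappiness and
watermark_scale_factor and each zone's free pages against its watermarks.
A rough sketch only; watermark_scale_factor does not exist on older
kernels, hence the None fallback.

import re

def read_vm_sysctl(name):
    # Return an integer from /proc/sys/vm/<name>, or None if the tunable
    # does not exist (e.g. watermark_scale_factor on pre-4.6 kernels).
    try:
        with open('/proc/sys/vm/' + name) as f:
            return int(f.read())
    except IOError:
        return None

def zone_watermarks():
    # Parse /proc/zoneinfo into {(node, zone): {'free': .., 'min': .., ...}}.
    zones = {}
    current = None
    with open('/proc/zoneinfo') as f:
        for line in f:
            m = re.match(r'Node (\d+), zone\s+(\S+)', line)
            if m:
                current = zones.setdefault((int(m.group(1)), m.group(2)), {})
                continue
            m = re.match(r'\s+(?:pages )?(free|min|low|high)\s+(\d+)', line)
            if m and current is not None:
                current[m.group(1)] = int(m.group(2))
    return zones

if __name__ == '__main__':
    print('swappiness=%s watermark_scale_factor=%s' % (
        read_vm_sysctl('swappiness'),
        read_vm_sysctl('watermark_scale_factor')))
    for (node, zone), wm in sorted(zone_watermarks().items()):
        print('node %s zone %-8s free=%s min=%s low=%s high=%s' % (
            node, zone, wm.get('free'), wm.get('min'),
            wm.get('low'), wm.get('high')))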


> locking issue or kernel allocations blocking swap. Until we can figure out
> the cause, we effectively have no usable swap space on the test instances
> so we are limited to 8GB.
> 
> On Feb 1, 2017 17:27, "Armando M." <armamig at gmail.com> wrote:
> 
> > Hi,
> >
> > [TL;DR]: OpenStack services have steadily increased their memory
> > footprints. We need a concerted way to address the oom-kills experienced in
> > the openstack gate, as we may have reached a ceiling.
> >
> > Now the longer version:
> > --------------------------------
> >
> > We have been experiencing some instability in the gate lately due to a
> > number of reasons. When everything adds up, it becomes rather difficult to
> > merge anything, and with feature freeze upon us that only adds to the
> > stress. One culprit was identified to be [1].
> >
> > We initially tried to increase the swappiness, but that didn't seem to
> > help. We then looked at the resident memory in use. Going back over the
> > past three releases, we noticed that the aggregated memory footprint of
> > some openstack projects has grown steadily. We have the following:
> >
> >    - Mitaka
> >       - neutron: 1.40GB
> >       - nova: 1.70GB
> >       - swift: 640MB
> >       - cinder: 730MB
> >       - keystone: 760MB
> >       - horizon: 17MB
> >       - glance: 538MB
> >    - Newton
> >       - neutron: 1.59GB (+13%)
> >       - nova: 1.67GB (-1%)
> >       - swift: 779MB (+21%)
> >       - cinder: 878MB (+20%)
> >       - keystone: 919MB (+20%)
> >       - horizon: 21MB (+23%)
> >       - glance: 721MB (+34%)
> >    - Ocata
> >       - neutron: 1.75GB (+10%)
> >       - nova: 1.95GB (+16%)
> >       - swift: 703MB (-9%)
> >       - cinder: 920MB (+4%)
> >       - keystone: 903MB (-1%)
> >       - horizon: 25MB (+20%)
> >       - glance: 740MB (+2%)
> >
> > Numbers are approximate and I only took a couple of samples, but in a
> > nutshell the majority of the services have seen double-digit growth over
> > the past two cycles in terms of the amount of RSS memory they use.
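
For what it's worth, totals like these can be reproduced by summing RSS
over all processes whose command line mentions the service. A minimal
sketch with psutil; the service-name matching below is just an
illustration, not necessarily how the numbers above were collected:

import collections
import psutil

# Illustrative list of services to account for; adjust to the node.
SERVICES = ('neutron', 'nova', 'swift', 'cinder', 'keystone', 'glance')

def rss_by_service(services=SERVICES):
    # Sum RSS (bytes) per service. Note that RSS double-counts pages shared
    # between forked API workers, so these totals are upper bounds.
    totals = collections.defaultdict(int)
    for proc in psutil.process_iter():
        try:
            cmdline = ' '.join(proc.cmdline())
            rss = proc.memory_info().rss
        except (psutil.NoSuchProcess, psutil.AccessDenied,
                psutil.ZombieProcess):
            continue
        for name in services:
            if name in cmdline:
                totals[name] += rss
                break
    return totals

if __name__ == '__main__':
    for name, rss in sorted(rss_by_service().items()):
        print('%-10s %6.0f MB' % (name, rss / (1024.0 * 1024)))
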
> >
> > Since [1] has only been observed since ocata [2], I think it's reasonable
> > to assume that the memory increase may well be a determining factor in the
> > oom-kills we see in the gate.
> >
> > Profiling and surgically reducing the memory used by each component in
> > each service is a lengthy process, but I'd rather see some gate relief
> > right away. Reducing the number of API workers helps bring the RSS memory
> > back down to mitaka levels:
> >
> >    - neutron: 1.54GB
> >    - nova: 1.24GB
> >    - swift: 694MB
> >    - cinder: 778MB
> >    - keystone: 891MB
> >    - horizon: 24MB
> >    - glance: 490MB
> >
> > However, it may have other side effects, like longer execution times or
> > an increase in timeouts.
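
Since each API worker is a separate forked process, a per-worker
breakdown gives a feel for how much dropping a worker actually buys.
A rough sketch, again with psutil; 'neutron-server' is only an example
match, and reading PSS needs access to /proc/<pid>/smaps:

import psutil

def worker_memory(match='neutron-server'):
    # Print PSS per matching process. PSS splits pages shared via fork
    # proportionally between the workers, so it is a fairer estimate than
    # RSS of what one extra worker costs.
    for proc in psutil.process_iter():
        try:
            if match not in ' '.join(proc.cmdline()):
                continue
            pss = proc.memory_full_info().pss  # needs smaps access
            print('%s pid=%d ppid=%d pss=%.0f MB' % (
                match, proc.pid, proc.ppid(), pss / (1024.0 * 1024)))
        except (psutil.NoSuchProcess, psutil.AccessDenied,
                psutil.ZombieProcess):
            continue

if __name__ == '__main__':
    worker_memory()
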
> >
> > Where do we go from here? I am not particularly fond of the stop-gap [4],
> > but it is the one fix that most widely addresses the memory increase we
> > have experienced across the board.
> >
> > Thanks,
> > Armando
> >
> > [1] https://bugs.launchpad.net/neutron/+bug/1656386
> > [2] http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22oom-killer%5C%22%20AND%20tags:syslog
> > [3] http://logs.openstack.org/21/427921/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/82084c2/
> > [4] https://review.openstack.org/#/c/427921
> >


