[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

Armando M. armamig at gmail.com
Thu Feb 2 20:32:55 UTC 2017


On 2 February 2017 at 12:19, Sean Dague <sean at dague.net> wrote:

> On 02/02/2017 02:28 PM, Armando M. wrote:
> >
> >
> > On 2 February 2017 at 10:08, Sean Dague <sean at dague.net> wrote:
> >
> >     On 02/02/2017 12:49 PM, Armando M. wrote:
> >     >
> >     >
> >     > On 2 February 2017 at 08:40, Sean Dague <sean at dague.net> wrote:
> >     >
> >     >     On 02/02/2017 11:16 AM, Matthew Treinish wrote:
> >     >     <snip>
> >     >     > <oops, forgot to finish my thought>
> >     >     >
> >     >     > We definitely aren't saying that running a single worker is
> >     >     > how we recommend people run OpenStack. But doing this just
> >     >     > adds to the differences between the gate and what we expect
> >     >     > things to actually look like in real deployments.
> >     >
> >     >     I'm all for actually getting to the bottom of this, but
> >     >     honestly real memory profiling is needed here. The growth
> >     >     across projects probably means that some common libraries are
> >     >     part of this. The ever-growing requirements list is
> >     >     demonstrative of that. Code reuse is good, but if we are
> >     >     importing much of a library to get access to a couple of
> >     >     functions, we're going to take on a bunch of memory weight
> >     >     (especially if that library has friendly auto imports in its
> >     >     top-level __init__.py, so we can't pull in only the parts we
> >     >     want).
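
(To illustrate the import-weight point: even a narrow import pays for the
whole package when its __init__.py eagerly pulls in submodules. A quick way
to see it, as an untested sketch where "somelib" is a hypothetical package:

    import sys

    before = set(sys.modules)
    from somelib.util import small_helper  # we only want one function
    # if somelib/__init__.py does "from .heavy import *", the heavy
    # submodules land in sys.modules anyway, and their memory comes along
    print(sorted(set(sys.modules) - before))

Every module that shows up in that diff is resident in every worker process.)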
> >     >
> >     >     Changing the worker count is just shuffling around deck chairs.
> >     >
> >     >     I'm not familiar enough with memory profiling tools in Python
> >     >     to know the right approach to take to get this down to the
> >     >     individual libraries / objects that are holding all our
> >     >     memory. Anyone more skilled here able to help lead the way?
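
One possible starting point: on Python 3.4+ the stdlib tracemalloc module
can group live allocations by the file that made them, which gets close to a
per-library breakdown (on 2.7 there is the pytracemalloc backport, which
needs a patched interpreter). A minimal sketch, not tested against a real
service:

    import tracemalloc

    tracemalloc.start(25)  # keep up to 25 frames per allocation

    # ... import and exercise the service under test here ...

    snapshot = tracemalloc.take_snapshot()
    # stats grouped by filename; site-packages paths point straight at the
    # library holding the memory
    for stat in snapshot.statistics('filename')[:20]:
        print(stat)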
> >     >
> >     >
> >     > From what I hear, the overall consensus on this matter is to
> >     > determine what actually caused the memory consumption bump and how
> >     > to address it, but that's more of a medium- to long-term action.
> >     > In fact, to me this is one of the top-priority matters we should
> >     > talk about at the imminent PTG.
> >     >
> >     > For the time being, and to provide relief to the gate, should we
> >     > lock API_WORKERS to 1? I'll post something for review and see how
> >     > many people shoot it down :)
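
(The quick and dirty version would be a one-liner in the devstack local.conf
used by the jobs, along the lines of:

    [[local|localrc]]
    API_WORKERS=1

the actual changes are in the reviews linked below.)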
> >
> >     I don't think we want to do that. It's going to force the eventlet
> >     API workers down to a single process, and it's not super clear that
> >     eventlet handles a backed-up inbound socket well. I honestly would
> >     expect that to create different, hard-to-debug issues, especially
> >     with high chatter rates between services.
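
(For context on that fear: with a single worker, each API service becomes
one eventlet process, roughly like this untested sketch; every request waits
on one listen socket's backlog and is served by green threads in a single
interpreter, so anything CPU-bound stalls everything queued behind it.)

    import eventlet
    import eventlet.wsgi

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'ok\n']

    # one process, one listen socket; eventlet.listen defaults to backlog=50
    sock = eventlet.listen(('0.0.0.0', 8080))
    eventlet.wsgi.server(sock, app)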
> >
> >
> > I must admit I share your fear, but in the tests I have run so far
> > [1,2,3], the house didn't burn down. I am looking for other ways to get
> > substantial memory savings out of a relatively quick and dirty fix, but
> > I have come up empty-handed thus far.
> >
> > [1] https://review.openstack.org/#/c/428303/
> > [2] https://review.openstack.org/#/c/427919/
> > [3] https://review.openstack.org/#/c/427921/
>
> This failure in the first patch -
> http://logs.openstack.org/03/428303/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/71f42ea/logs/screen-n-api.txt.gz?level=TRACE#_2017-02-02_19_14_11_751
> - looks exactly like what I would expect from API worker starvation.
>

I am not sure I agree on this one: this failure has been observed multiple
times in the gate already [1] (though I am not sure there is a bug filed for
it), and I don't believe it has anything to do with the number of API
workers, unless even two workers are not enough.

[1]
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22('Connection%20aborted.'%2C%20BadStatusLine(%5C%22''%5C%22%2C)%5C%22
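
For what it's worth, that exact error is what requests on Python 2 raises
when the server closes the TCP connection before sending any status line,
e.g. when a worker dies or gets restarted mid-request; it is reproducible
without OpenStack at all. An untested sketch:

    import socket
    import threading

    import requests

    def close_without_reply(listener):
        conn, _ = listener.accept()
        conn.close()  # hang up before writing any HTTP status line

    listener = socket.socket()
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    threading.Thread(target=close_without_reply, args=(listener,)).start()

    try:
        requests.get('http://127.0.0.1:%d/' % listener.getsockname()[1])
    except requests.exceptions.ConnectionError as exc:
        # on Python 2: ('Connection aborted.', BadStatusLine("''",))
        # (Python 3 reports RemoteDisconnected instead)
        print(exc)

By itself the error only tells us the server side hung up without answering.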



>         -Sean
>
> --
> Sean Dague
> http://dague.net
>