[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate
sean at dague.net
Thu Feb 2 18:08:06 UTC 2017
On 02/02/2017 12:49 PM, Armando M. wrote:
> On 2 February 2017 at 08:40, Sean Dague <sean at dague.net> wrote:
> On 02/02/2017 11:16 AM, Matthew Treinish wrote:
> > <oops, forgot to finish my thought>
> > We definitely aren't saying running a single worker is how we recommend people
> > run OpenStack by doing this. But it just adds to the differences between the
> > gate and what we expect things to actually look like.
> I'm all for actually getting to the bottom of this, but honestly real
> memory profiling is needed here. The growth across projects probably
> means that some common libraries are some part of this. The ever growing
> requirements list is demonstrative of that. Code reuse is good, but if
> we are importing much of a library to get access to a couple of
> functions, we're going to take a bunch of memory weight on that
> (especially if that library has friendly auto imports in top level
> __init__.py so we can't get only the parts we want).
> Changing the worker count is just shuffling around deck chairs.
> I'm not familiar enough with memory profiling tools in python to know
> the right approach we should take there to get this down to individual
> libraries / objects that are containing all our memory. Anyone more
> skilled here able to help lead the way?
> From what I hear, the overall consensus on this matter is to determine
> what actually caused the memory consumption bump and how to address it,
> but that's more of a medium to long term action. In fact, to me this is
> one of the top priority matters we should talk about at the imminent PTG.
> For the time being, and to provide relief to the gate, should we want to
> lock the API_WORKERS to 1? I'll post something for review and see how
> many people shoot it down :)
I don't think we want to do that. It's going to force the eventlet
API workers down to a single process, and it's not super clear that
eventlet handles backups on the inbound socket well. I honestly would
expect that to create different hard-to-debug issues, especially with high
chatter rates between services.
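
On the per-library memory question quoted above, one possible starting
point is the standard library's tracemalloc module. The sketch below is
only an illustration, not an agreed approach: the helper name import_cost
and the module names passed to it are made up for the example. It measures
the Python-level memory still allocated after importing a module, and how
many new entries the import added to sys.modules. tracemalloc only sees
allocations made through Python's allocators, so memory taken by C
extensions is not counted, and a module that is already imported will
show close to zero.

```python
import importlib
import sys
import tracemalloc


def import_cost(module_name):
    """Return (bytes_allocated, new_module_count) for importing module_name.

    Hypothetical helper for illustration. bytes_allocated is the change in
    traced Python memory across the import; new_module_count is how many
    modules the import pulled into sys.modules.
    """
    before_modules = set(sys.modules)
    tracemalloc.start()
    try:
        start, _peak = tracemalloc.get_traced_memory()
        importlib.import_module(module_name)
        end, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    new_modules = set(sys.modules) - before_modules
    return end - start, len(new_modules)


if __name__ == "__main__":
    # Example module names only; substitute the libraries under suspicion.
    for name in ("json", "decimal"):
        size, count = import_cost(name)
        print("{:<20} {:>10.1f} KiB  {} new modules".format(
            name, size / 1024.0, count))
```

Running this in a fresh interpreter for each library gives a rough ranking
of import weight. It does not replace profiling a live service, since
objects created at runtime are not covered by an import-time measurement.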