[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

Kevin Benton kevin at benton.pub
Thu Feb 2 21:07:18 UTC 2017


This error seems to be new in the ocata cycle. It's either related to a
dependency change or the fact that we put Apache in between the services
now. Handling more concurrent requests than workers wasn't an issue before.


It seems that you are suggesting that eventlet can't handle concurrent
connections, which is the entire purpose of the library, no?
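
To put it concretely, a single eventlet wsgi process can juggle many
in-flight requests because each connection gets its own green thread that
yields whenever it blocks on I/O. The sketch below is a toy, not what
devstack/oslo.service actually wires up, but it is the model I have in mind:

    # Toy example: one process, no forked workers, yet many concurrent
    # requests are in flight because each connection runs in a green
    # thread and yields whenever it blocks on I/O.
    import eventlet
    eventlet.monkey_patch()      # make socket/sleep calls cooperative

    from eventlet import wsgi

    def app(environ, start_response):
        eventlet.sleep(1)        # pretend to wait on the DB / another service
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    # backlog is the kernel listen() queue; connections past it are the
    # "backups on the inbound socket" case discussed further down.
    wsgi.server(eventlet.listen(('127.0.0.1', 8080), backlog=128), app)

Where that model gets into trouble is when a handler stops yielding
(CPU-bound work) or the listen backlog fills up, which is a different
failure mode from simply having too few workers.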

On Feb 2, 2017 13:53, "Sean Dague" <sean at dague.net> wrote:

> On 02/02/2017 03:32 PM, Armando M. wrote:
> >
> >
> > On 2 February 2017 at 12:19, Sean Dague <sean at dague.net> wrote:
> >
> >     On 02/02/2017 02:28 PM, Armando M. wrote:
> >     >
> >     >
> >     > On 2 February 2017 at 10:08, Sean Dague <sean at dague.net> wrote:
> >     >
> >     >     On 02/02/2017 12:49 PM, Armando M. wrote:
> >     >     >
> >     >     >
> >     >     > On 2 February 2017 at 08:40, Sean Dague <sean at dague.net> wrote:
> >     >     >
> >     >     >     On 02/02/2017 11:16 AM, Matthew Treinish wrote:
> >     >     >     <snip>
> >     >     >     > <oops, forgot to finish my thought>
> >     >     >     >
> >     >     >     > We definitely aren't saying running a single worker is how
> >     >     >     > we recommend people run OpenStack by doing this. But it
> >     >     >     > just adds on to the differences between the gate and what
> >     >     >     > we expect things actually look like.
> >     >     >
> >     >     >     I'm all for actually getting to the bottom of this, but
> >     >     >     honestly real memory profiling is needed here. The growth
> >     >     >     across projects probably means that some common libraries
> >     >     >     are some part of this. The ever growing requirements list
> >     >     >     is demonstrative of that. Code reuse is good, but if we are
> >     >     >     importing much of a library to get access to a couple of
> >     >     >     functions, we're going to take a bunch of memory weight on
> >     >     >     that (especially if that library has friendly auto imports
> >     >     >     in top level __init__.py so we can't get only the parts we
> >     >     >     want).
> >     >     >
> >     >     >     Changing the worker count is just shuffling around deck
> >     >     >     chairs.
> >     >     >
> >     >     >     I'm not familiar enough with memory profiling tools in
> >     >     >     python to know the right approach we should take there to
> >     >     >     get this down to individual libraries / objects that are
> >     >     >     containing all our memory. Anyone more skilled here able to
> >     >     >     help lead the way?
> >     >     >
> >     >     >
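
If it helps get that profiling started, one low-effort option (just a
sketch, and it assumes the service is running under python3) is the stdlib
tracemalloc module; the signal choice and frame count below are
illustrative, not anything the services wire up today:

    import signal
    import tracemalloc

    # Keep enough frames that an allocation can be traced back to the
    # library that triggered it, not just the allocator call site.
    tracemalloc.start(25)

    def dump_top_allocations(signum, frame):
        snapshot = tracemalloc.take_snapshot()
        # Grouping by 'filename' buckets allocations per module file,
        # which roughly maps to "which package is holding the memory".
        for stat in snapshot.statistics('filename')[:25]:
            print(stat)

    # Illustrative trigger: kill -USR2 <pid> to get a report on demand.
    signal.signal(signal.SIGUSR2, dump_top_allocations)

Grouping by filename is crude, but it should be enough to see whether the
weight is in our own trees, in a common library, or in something pulled in
via a friendly __init__.py.
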
> >     >     > From what I hear, the overall consensus on this matter is to
> >     >     > determine what actually caused the memory consumption bump and
> >     >     > how to address it, but that's more of a medium to long term
> >     >     > action. In fact, to me this is one of the top priority matters
> >     >     > we should talk about at the imminent PTG.
> >     >     >
> >     >     > For the time being, and to provide relief to the gate, should
> >     >     > we want to lock the API_WORKERS to 1? I'll post something for
> >     >     > review and see how many people shoot it down :)
> >     >
> >     >     I don't think we want to do that. It's going to force down the
> >     >     eventlet API workers to being a single process, and it's not
> >     >     super clear that eventlet handles backups on the inbound socket
> >     >     well. I honestly would expect that creates different hard to
> >     >     debug issues, especially with high chatter rates between
> >     >     services.
> >     >
> >     >
> >     > I must admit I share your fear, but out of the tests that I have
> >     > executed so far in [1,2,3], the house didn't burn in a fire. I am
> >     > looking for other ways to have a substantial memory saving with a
> >     > relatively quick and dirty fix, but coming up empty handed thus far.
> >     >
> >     > [1] https://review.openstack.org/#/c/428303/
> >     > [2] https://review.openstack.org/#/c/427919/
> >     > [3] https://review.openstack.org/#/c/427921/
> >
> >     This failure in the first patch -
> >     http://logs.openstack.org/03/428303/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/71f42ea/logs/screen-n-api.txt.gz?level=TRACE#_2017-02-02_19_14_11_751
> >     looks exactly like what I would expect from API worker starvation.
> >
> >
> > Not sure I agree on this one, this has been observed multiple times in
> > the gate already [1] (though I am not sure there's a bug for it), and I
> > don't believe it has anything to do with the number of API workers,
> > unless not even two workers are enough.
>
> There is no guarantee that 2 workers are enough. I wouldn't be surprised
> if we see some of those failures today. This was all guesswork on trimming
> worker counts to deal with the memory issue in the past. But we're running
> tests in parallel, and the services are making calls back to other
> services all the time.
>
> This is one of the reasons to get the wsgi stack off of eventlet and
> into a real webserver, as they handle HTTP request backups much much
> better.
>
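
For anyone who hasn't looked at what that migration means in practice, the
service side of it is pretty small: expose a module-level WSGI callable and
let Apache/uwsgi own the socket, the worker pool and the request queue.
Something along these lines (the module and helper names are made up for
illustration; each project ships its own real wsgi entry point, and
keystone already runs this way in devstack):

    # myservice/wsgi.py -- illustrative only; the import and helper below
    # are hypothetical stand-ins for a project's real app factory.
    from myservice.api import app as api_app

    def init_application():
        # Load config/logging the way the service normally would, then
        # build the paste/router pipeline and return it as a WSGI app.
        return api_app.build_wsgi_app()

    # mod_wsgi and uwsgi both look for a module-level "application".
    application = init_application()

The win here isn't in this file at all; it's that the web server in front
of it does the connection queueing and worker management instead of
eventlet.
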
> I do understand that people want a quick fix here, but I'm not convinced
> that it exists.
>
>         -Sean
>
> --
> Sean Dague
> http://dague.net
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>