[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

Armando M. armamig at gmail.com
Thu Feb 2 21:36:12 UTC 2017


On 2 February 2017 at 13:34, Ihar Hrachyshka <ihrachys at redhat.com> wrote:

> The BadStatusLine error is well known:
> https://bugs.launchpad.net/nova/+bug/1630664


That's the one! I knew I had seen it in the past!


>
>
> Now, it doesn't mean that the root cause of the error message is the
> same, and it may as well be that lowering the number of workers
> triggered it. All I am saying is we saw that error in the past.
>
> Ihar
>
> On Thu, Feb 2, 2017 at 1:07 PM, Kevin Benton <kevin at benton.pub> wrote:
> > This error seems to be new in the ocata cycle. It's either related to a
> > dependency change or the fact that we put Apache in between the services
> > now. Handling more concurrent requests than workers wasn't an issue
> > before.
> >
> > It seems that you are suggesting that eventlet can't handle concurrent
> > connections, which is the entire purpose of the library, no?
> >
> > On Feb 2, 2017 13:53, "Sean Dague" <sean at dague.net> wrote:
> >>
> >> On 02/02/2017 03:32 PM, Armando M. wrote:
> >> >
> >> >
> >> > On 2 February 2017 at 12:19, Sean Dague <sean at dague.net
> >> > <mailto:sean at dague.net>> wrote:
> >> >
> >> >     On 02/02/2017 02:28 PM, Armando M. wrote:
> >> >     >
> >> >     >
> >> >     > On 2 February 2017 at 10:08, Sean Dague <sean at dague.net> wrote:
> >> >     >
> >> >     >     On 02/02/2017 12:49 PM, Armando M. wrote:
> >> >     >     >
> >> >     >     >
> >> >     >     > On 2 February 2017 at 08:40, Sean Dague <sean at dague.net> wrote:
> >> >     >     >
> >> >     >     >     On 02/02/2017 11:16 AM, Matthew Treinish wrote:
> >> >     >     >     <snip>
> >> >     >     >     > <oops, forgot to finish my thought>
> >> >     >     >     >
> >> >     >     >     > We definitely aren't saying running a single worker is how
> >> >     >     >     > we recommend people run OpenStack by doing this. But it
> >> >     >     >     > just adds on to the differences between the gate and what
> >> >     >     >     > we expect things actually look like.
> >> >     >     >
> >> >     >     >     I'm all for actually getting to the bottom of this, but
> >> >     >     >     honestly real memory profiling is needed here. The growth
> >> >     >     >     across projects probably means that some common libraries
> >> >     >     >     are some part of this. The ever growing requirements list
> >> >     >     >     is demonstrative of that. Code reuse is good, but if we are
> >> >     >     >     importing much of a library to get access to a couple of
> >> >     >     >     functions, we're going to take a bunch of memory weight on
> >> >     >     >     that (especially if that library has friendly auto imports
> >> >     >     >     in top level __init__.py so we can't get only the parts we
> >> >     >     >     want).
> >> >     >     >
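
A quick way to see that import weight in isolation. This is a minimal
sketch, not something from the thread, and oslo_messaging below is only
an example of a sizeable dependency:

    # Count how many modules one top-level import drags into the process.
    import sys

    before = set(sys.modules)
    import oslo_messaging  # substitute any library under suspicion
    after = set(sys.modules)

    print("modules pulled in:", len(after - before))
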
> >> >     >     >     Changing the worker count is just shuffling around deck
> >> >     >     >     chairs.
> >> >     >     >
> >> >     >     >     I'm not familiar enough with memory profiling tools in
> >> >     >     >     python to know the right approach we should take there to
> >> >     >     >     get this down to individual libraries / objects that are
> >> >     >     >     containing all our memory. Anyone more skilled here able to
> >> >     >     >     help lead the way?
> >> >     >     >
> >> >     >     >
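
For what it's worth, the stdlib tracemalloc module (Python 3.4+, with a
pytracemalloc backport for Python 2) can do that kind of per-library
attribution. A minimal sketch, with the workload left as a placeholder:

    import tracemalloc

    tracemalloc.start()

    # ... import the service and exercise it here (placeholder) ...

    snapshot = tracemalloc.take_snapshot()
    # Grouping by source file roughly maps allocations back to libraries.
    for stat in snapshot.statistics('filename')[:10]:
        print(stat)
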
> >> >     >     > From what I hear, the overall consensus on this matter is to
> >> >     >     > determine what actually caused the memory consumption bump and
> >> >     >     > how to address it, but that's more of a medium to long term
> >> >     >     > action. In fact, to me this is one of the top priority matters
> >> >     >     > we should talk about at the imminent PTG.
> >> >     >     >
> >> >     >     > For the time being, and to provide relief to the gate, should
> >> >     >     > we want to lock the API_WORKERS to 1? I'll post something for
> >> >     >     > review and see how many people shoot it down :)
> >> >     >
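
For reference, the knob in question is the devstack local.conf setting,
something along these lines:

    [[local|localrc]]
    # Force the API services devstack starts down to one worker process each.
    API_WORKERS=1
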
> >> >     >     I don't think we want to do that. It's going to force down the
> >> >     >     eventlet API workers to being a single process, and it's not
> >> >     >     super clear that eventlet handles backups on the inbound socket
> >> >     >     well. I honestly would expect that creates different hard to
> >> >     >     debug issues, especially with high chatter rates between services.
> >> >     >
> >> >     >
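
To make the concern concrete, this is roughly the shape of a single-process
eventlet WSGI worker (a standalone sketch, not OpenStack code): concurrency
comes only from greenthreads inside the one process, and the listen backlog
is the only buffer once it falls behind:

    import eventlet
    eventlet.monkey_patch()  # make socket/time calls cooperative

    from eventlet import wsgi

    def app(environ, start_response):
        # Placeholder handler standing in for a real API endpoint.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'ok\n']

    # One process; waiting connections queue up in the kernel backlog.
    sock = eventlet.listen(('127.0.0.1', 8080), backlog=128)
    wsgi.server(sock, app)
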
> >> >     > I must admit I share your fear, but out of the tests that I have
> >> >     > executed so far in [1,2,3], the house didn't burn in a fire. I am
> >> >     > looking for other ways to have a substantial memory saving with a
> >> >     > relatively quick and dirty fix, but coming up empty handed thus far.
> >> >     >
> >> >     > [1] https://review.openstack.org/#/c/428303/
> >> >     > [2] https://review.openstack.org/#/c/427919/
> >> >     > [3] https://review.openstack.org/#/c/427921/
> >> >
> >> >     This failure in the first patch -
> >> >     http://logs.openstack.org/03/428303/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/71f42ea/logs/screen-n-api.txt.gz?level=TRACE#_2017-02-02_19_14_11_751
> >> >     - looks exactly like what I would expect from API worker starvation.
> >> >
> >> >
> >> > Not sure I agree on this one: this has been observed multiple times in
> >> > the gate already [1] (though I am not sure there's a bug for it), and I
> >> > don't believe it has anything to do with the number of API workers,
> >> > unless not even two workers are enough.
> >>
> >> There is no guarantee that 2 workers are enough. I'm not surprised if we
> >> see some of that failure today. This was all guesswork on trimming worker
> >> counts to deal with the memory issue in the past. But we're running
> >> tests in parallel, and the services are making calls back to other
> >> services all the time.
> >>
> >> This is one of the reasons to get the wsgi stack off of eventlet and
> >> into a real webserver, as they handle HTTP request backups much much
> >> better.
> >>
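
For context, "a real webserver" here means the kind of Apache mod_wsgi (or
uwsgi) deployment devstack already puts in front of some services, where the
webserver queues and dispatches requests to WSGI processes instead of a bare
eventlet listener absorbing the backlog. A purely illustrative vhost, with
made-up ports and paths:

    Listen 8774
    <VirtualHost *:8774>
        # Two WSGI processes, one thread each; all values are placeholders.
        WSGIDaemonProcess api-server processes=2 threads=1
        WSGIProcessGroup api-server
        WSGIScriptAlias / /usr/local/bin/api-wsgi-entry
        WSGIApplicationGroup %{GLOBAL}
        ErrorLog /var/log/apache2/api-server-error.log
    </VirtualHost>
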
> >> I do understand that people want a quick fix here, but I'm not convinced
> >> that it exists.
> >>
> >>         -Sean
> >>
> >> --
> >> Sean Dague
> >> http://dague.net
> >>