<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 2 February 2017 at 13:34, Ihar Hrachyshka <span dir="ltr"><<a href="mailto:ihrachys@redhat.com" target="_blank">ihrachys@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The BadStatusLine error is well known:<br>

<a href="https://bugs.launchpad.net/nova/+bug/1630664" rel="noreferrer" target="_blank">https://bugs.launchpad.net/<wbr>nova/+bug/1630664</a></blockquote><div><br></div><div>That's the one! I knew I had seen it in the past!</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

Now, it doesn't mean that the root cause of the error message is the<br>

same, and it may as well be that lowering the number of workers<br>

triggered it. All I am saying is we saw that error in the past.<br>

<span class="HOEnZb"><font color="#888888"><br>

Ihar<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

On Thu, Feb 2, 2017 at 1:07 PM, Kevin Benton <kevin@benton.pub> wrote:<br>

> This error seems to be new in the ocata cycle. It's either related to a<br>

> dependency change or the fact that we put Apache in between the services<br>

> now. Handling more concurrent requests than workers wasn't an issue before.<br>

><br>

> It seems that you are suggesting that eventlet can't handle concurrent<br>

> connections, which is the entire purpose of the library, no?<br>

><br>

> On Feb 2, 2017 13:53, "Sean Dague" <<a href="mailto:sean@dague.net">sean@dague.net</a>> wrote:<br>

>><br>

>> On 02/02/2017 03:32 PM, Armando M. wrote:<br>

>> ><br>

>> ><br>

>> > On 2 February 2017 at 12:19, Sean Dague <<a href="mailto:sean@dague.net">sean@dague.net</a>> wrote:<br>

>> ><br>

>> >     On 02/02/2017 02:28 PM, Armando M. wrote:<br>

>> >     ><br>

>> >     ><br>

>> >     > On 2 February 2017 at 10:08, Sean Dague <<a href="mailto:sean@dague.net">sean@dague.net</a>> wrote:<br>

>> >     ><br>

>> >     >     On 02/02/2017 12:49 PM, Armando M. wrote:<br>

>> >     >     ><br>

>> >     >     ><br>

>> >     >     > On 2 February 2017 at 08:40, Sean Dague <<a href="mailto:sean@dague.net">sean@dague.net</a>> wrote:<br>

>> >     >     ><br>

>> >     >     >     On 02/02/2017 11:16 AM, Matthew Treinish wrote:<br>

>> >     >     >     <snip><br>

>> >     >     >     > <oops, forgot to finish my thought><br>

>> >     >     >     ><br>

>> >     >     >     > We definitely aren't saying running a single worker is how we recommend people<br>

>> >     >     >     > run OpenStack by doing this. But it just adds on to the differences between the<br>

>> >     >     >     > gate and what we expect things actually look like.<br>

>> >     >     ><br>

>> >     >     >     I'm all for actually getting to the bottom of this, but honestly real<br>

>> >     >     >     memory profiling is needed here. The growth across projects probably<br>

>> >     >     >     means that some common libraries are some part of this. The ever growing<br>

>> >     >     >     requirements list is demonstrative of that. Code reuse is good, but if<br>

>> >     >     >     we are importing much of a library to get access to a couple of<br>

>> >     >     >     functions, we're going to take a bunch of memory weight on that<br>

>> >     >     >     (especially if that library has friendly auto imports in top level<br>

>> >     >     >     __init__.py so we can't get only the parts we want).<br>

>> >     >     ><br>

>> >     >     >     Changing the worker count is just shuffling around deck chairs.<br>

>> >     >     ><br>

>> >     >     >     I'm not familiar enough with memory profiling tools in python to know<br>

>> >     >     >     the right approach we should take there to get this down to individual<br>

>> >     >     >     libraries / objects that are containing all our memory. Anyone more<br>

>> >     >     >     skilled here able to help lead the way?<br>

>> >     >     ><br>

>> >     >     ><br>

>> >     >     > From what I hear, the overall consensus on this matter is to determine<br>

>> >     >     > what actually caused the memory consumption bump and how to address it,<br>

>> >     >     > but that's more of a medium to long term action. In fact, to me this is<br>

>> >     >     > one of the top priority matters we should talk about at the imminent PTG.<br>

>> >     >     ><br>

>> >     >     > For the time being, and to provide relief to the gate, should we want to<br>

>> >     >     > lock the API_WORKERS to 1? I'll post something for review and see how<br>

>> >     >     > many people shoot it down :)<br>

>> >     ><br>

>> >     >     I don't think we want to do that. It's going to force down the eventlet<br>

>> >     >     API workers to being a single process, and it's not super clear that<br>

>> >     >     eventlet handles backups on the inbound socket well. I honestly would<br>

>> >     >     expect that creates different hard to debug issues, especially with high<br>

>> >     >     chatter rates between services.<br>

>> >     ><br>

>> >     ><br>

>> >     > I must admit I share your fear, but out of the tests that I have<br>

>> >     > executed so far in [1,2,3], the house didn't burn in a fire. I am<br>

>> >     > looking for other ways to have a substantial memory saving with a<br>

>> >     > relatively quick and dirty fix, but coming up empty handed thus far.<br>

>> >     ><br>

>> >     > [1] <a href="https://review.openstack.org/#/c/428303/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/428303/</a><br>

>> >     > [2] <a href="https://review.openstack.org/#/c/427919/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/427919/</a><br>

>> >     > [3] <a href="https://review.openstack.org/#/c/427921/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/427921/</a><br>

>> ><br>

>> >     This failure in the first patch -<br>

>> >     <a href="http://logs.openstack.org/03/428303/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/71f42ea/logs/screen-n-api.txt.gz?level=TRACE#_2017-02-02_19_14_11_751" rel="noreferrer" target="_blank">http://logs.openstack.org/03/<wbr>428303/1/check/gate-tempest-<wbr>dsvm-neutron-full-ubuntu-<wbr>xenial/71f42ea/logs/screen-n-<wbr>api.txt.gz?level=TRACE#_2017-<wbr>02-02_19_14_11_751</a><br>

>> >     looks exactly like I would expect by API Worker starvation.<br>

>> ><br>

>> ><br>

>> > Not sure I agree on this one, this has been observed multiple times in<br>

>> > the gate already [1] (though I am not sure there's a bug for it), and I<br>

>> > don't believe it has anything to do with the number of API workers,<br>

>> > unless not even two workers are enough.<br>

>><br>

>> There is no guarantee that 2 workers are enough. I'm not surprised if we<br>

>> see that failure some today. This was all guess work on trimming worker<br>

>> counts to deal with the memory issue in the past. But we're running<br>

>> tests in parallel, and the services are making calls back to other<br>

>> services all the time.<br>

>><br>

>> This is one of the reasons to get the wsgi stack off of eventlet and<br>

>> into a real webserver, as they handle HTTP request backups much much<br>

>> better.<br>

>><br>

>> I do understand that people want a quick fix here, but I'm not convinced<br>

>> that it exists.<br>

>><br>

>>         -Sean<br>

>><br>

>> --<br>

>> Sean Dague<br>

>> <a href="http://dague.net" rel="noreferrer" target="_blank">http://dague.net</a><br>

>><br>

>> ______________________________<wbr>______________________________<wbr>______________<br>

>> OpenStack Development Mailing List (not for usage questions)<br>

>> Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.<wbr>openstack.org?subject:<wbr>unsubscribe</a><br>

>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-dev</a><br>

><br>

><br>


><br>

<br>


</div></div></blockquote></div><br></div></div>