[openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers
Chris Friesen
chris.friesen at windriver.com
Fri Jan 8 10:43:58 UTC 2016
On 01/07/2016 06:55 PM, Mike Bayer wrote:
>
>
> On 01/07/2016 11:02 AM, Sean Dague wrote:
>> On 01/07/2016 09:56 AM, Brant Knudson wrote:
>>>
>>>
>>> On Thu, Jan 7, 2016 at 6:39 AM, Clayton O'Neill <clayton at oneill.net> wrote:
>>>
>>> On Thu, Jan 7, 2016 at 2:49 AM, Roman Podoliaka <rpodolyaka at mirantis.com> wrote:
>>> >
>>> > Linux gurus please correct me here, but my understanding is that Linux
>>> > kernel queues up to $backlog number of connections *per socket*. In
>>> > our case child processes inherited the FD of the socket, so they will
>>> > accept() connections from the same queue in the kernel, i.e. the
>>> > backlog value is for *all* child processes, not *per* process.
>>>
>>>
>>> Yes, it will be shared across all children.
>>>
>>> >
>>> > In each child process eventlet WSGI server calls accept() in a loop to
>>> > get a client socket from the kernel and then puts it into a greenlet from
>>> > a pool for processing:
>>>
>>> It’s worse than that. What I’ve seen (via strace) is that eventlet
>>> actually converts the socket into a non-blocking socket, then converts
>>> that accept() into an epoll()/accept() pair in every child. Then when a
>>> connection comes in, every child process wakes up out of poll and races
>>> to try to accept on the non-blocking socket, and all but one of them
>>> fail.
>>>
>>> This means that every time there is a request, every child process is
>>> woken up, scheduled on CPU and then put back to sleep. This is one of
>>> the reasons we’re (slowly) moving to uWSGI.
>>>
>>>
>>> I just want to note that I've got a change proposed to devstack that
>>> adds a config option to run keystone in uwsgi (rather than under
>>> eventlet or in apache httpd mod_wsgi), see
>>> https://review.openstack.org/#/c/257571/ . It's specific to keystone
>>> since I didn't think other projects were moving away from eventlet, too.
>>
>> I feel like this is a confused point that keeps being brought up.
>>
>> The preferred long term direction of all API services is to be deployed
>> on a real web server platform. It's a natural fit for those services as
>> they are accepting HTTP requests and doing things with them.
>>
>> Most OpenStack projects have worker services beyond just an HTTP server.
>> (Keystone is one of the very few exceptions here). Nova has nearly a
>> dozen of these worker services. These don't naturally fit as wsgi apps,
>> they are more traditional daemons, which accept requests over the
>> network, but also have periodic jobs internally and self initiate
>> actions. They are not just call / response. There is no long term
>> direction for these to move off of eventlet.
>
> This is totally speaking as an outsider without taking into account all
> the history of these decisions, but the notion of "Python + we're a
> daemon" == "we must use eventlet" seems a little bit rigid. Also, the
> notion of "we have background tasks" == "we can't run in a web server"
> is also not clear. If a service intends to serve HTTP requests, that
> portion of that service should be deployed in a web server; if the
> system has other "background tasks", ideally those are in a separate
> daemon altogether, but also even if you're under something like
> mod_wsgi, you can spawn a child process or worker thread regardless.
> You always have a Python interpreter running and all the things it can do.
In the case of nova, at least, most (all?) of these separate worker services do
not process HTTP requests, but rather RPC requests.
It might make sense for nova-api to run under a web server even if
nova-compute/nova-conductor/nova-scheduler/etc don't.
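
As an aside on the shared-backlog point quoted above, here is a minimal sketch (not eventlet's or nova's actual code) of forked children inheriting one listening socket and accepting from the same kernel queue. The function names, the loopback address, and the backlog of 128 are assumptions made for illustration. It uses plain blocking accept(), so it only shows that the queue is shared; it does not reproduce the epoll()/accept() wake-up race that Clayton describes.

```python
import os
import signal
import socket


def ask_for_worker_pid(port):
    """Connect once and read the PID of whichever child accepted us."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        chunks = []
        while True:
            chunk = client.recv(64)
            if not chunk:  # child closed the connection
                break
            chunks.append(chunk)
    return int(b"".join(chunks))


def shared_backlog_demo(num_children=3, num_requests=6):
    """Fork workers that all accept() on one inherited listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(128)             # one backlog, set once, on the one socket
    port = listener.getsockname()[1]

    children = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: the listening FD was inherited across fork(), so this
            # accept() pulls from the same kernel queue as every sibling.
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)  # never fall through into the parent's code
        children.append(pid)

    try:
        # Parent: act as the client; it never calls accept() itself.
        answered = [ask_for_worker_pid(port) for _ in range(num_requests)]
    finally:
        for pid in children:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        listener.close()
    return answered, children


if __name__ == "__main__":
    answered, children = shared_backlog_demo()
    print("children:", children)
    print("answered by:", answered)
```

Every connection is answered by one of the forked children, whichever happened to win accept(), even though listen() was only called once in the parent. This needs a Unix-like system, since it relies on os.fork().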
Chris