[openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers
rpodolyaka at mirantis.com
Thu Jan 7 07:49:26 UTC 2016
Actually we already do that in the parent process. The parent process:
1) starts and creates a socket
2) binds the socket and calls listen() on it passing the backlog value
3) passes the socket to the eventlet WSGI server
4) forks $*_workers times (child processes inherit all open file
descriptors including the socket one)
5) child processes call accept() in a loop
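The five steps above can be sketched in plain Python (this is an illustrative stand-in, not the actual Nova/eventlet code; names and the backlog value are assumptions):

```python
import os
import socket

def make_listener(backlog=128):
    """Parent-side setup: create the socket, bind it, and listen()
    with the backlog value (steps 1-3)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 0))
    s.listen(backlog)
    return s

def fork_workers(listener, n_workers=2):
    """Step 4: fork $*_workers times; each child inherits the
    listening FD. For brevity each child here handles a single
    connection instead of looping forever (step 5)."""
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: accept() from the *shared* kernel queue.
            client, _ = listener.accept()
            client.sendall(b"pid:%d" % os.getpid())
            client.close()
            os._exit(0)
        pids.append(pid)
    return pids
```

Because all children accept() on the same inherited FD, connections are dequeued from one shared kernel queue, whichever child gets there first.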
Linux gurus please correct me here, but my understanding is that the
Linux kernel queues up to $backlog connections *per socket*. In our
case the child processes inherited the FD of the socket, so they will
accept() connections from the same queue in the kernel, i.e. the
backlog value applies to *all* child processes, not *per* process.
>>> E.g. all workers are saturated, it will place a waiting connection onto a random greenlet which then has to wait?
In each child process the eventlet WSGI server calls accept() in a
loop to get a client socket from the kernel and then hands it to a
greenlet from a pool for processing:
The "saturation" point for a child process in our case is when we run
out of available greenlets in the pool: the pool.spawn_n() call will
block and the server won't call accept() again until one or more
greenlets finish processing their previous requests.
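That saturation behaviour can be modelled with the standard library (a sketch only; eventlet's GreenPool uses green threads, and `BoundedPool` here is an invented stand-in using OS threads and a semaphore):

```python
import threading

class BoundedPool:
    """Stdlib analogue of a greenlet pool for illustration: spawn()
    blocks the caller when all slots are busy, which is exactly why
    the accept() loop stalls once the pool is saturated."""
    def __init__(self, size):
        self._slots = threading.Semaphore(size)

    def spawn(self, fn, *args):
        self._slots.acquire()      # blocks when the pool is saturated
        def run():
            try:
                fn(*args)
            finally:
                self._slots.release()
        t = threading.Thread(target=run)
        t.start()
        return t
```

With a pool of size N, the (N+1)-th spawn() blocks until some worker finishes, so the caller, here standing in for the accept() loop, makes no further progress.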
Or, a particular greenlet can make a blocking call that does not
yield execution back to the event loop, so the eventlet WSGI server's
green thread never gets a chance to run and call accept() (e.g. a
call to MySQL-Python without tpool).
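The same hazard exists in asyncio, which makes for a self-contained illustration (an analogue, not eventlet itself): a plain time.sleep() stalls the whole loop just like a direct MySQL-Python call stalls the eventlet hub, while run_in_executor() offloads the blocking work the way tpool does:

```python
import asyncio
import time

async def good_handler(loop):
    # Offload the blocking call to a thread pool -- the asyncio
    # counterpart of eventlet.tpool. Other coroutines (including an
    # accept() loop) keep running meanwhile. Calling time.sleep(0.2)
    # directly here would freeze the entire event loop instead.
    await loop.run_in_executor(None, time.sleep, 0.2)

async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    async def ticker():
        # Stands in for the server's accept() loop: it should keep
        # ticking while the "blocking" work runs in the executor.
        for _ in range(10):
            ticks.append(time.monotonic())
            await asyncio.sleep(0.02)
    await asyncio.gather(good_handler(loop), ticker())
    return ticks
```

If good_handler called time.sleep() directly, the ticker would be starved for the full 0.2 s, which is precisely the accept()-starvation described above.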
The kernel will queue up to $backlog connections for us until we call
accept() in one of the child processes.
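This kernel-side queueing is easy to observe (a minimal sketch, assuming loopback TCP on Linux): clients can complete their TCP handshakes even though nobody has called accept() yet.

```python
import socket

def queued_connections_demo(backlog=2, n_clients=2):
    """Listen with a small backlog and connect without ever calling
    accept() first: the handshakes still complete because the kernel
    queues the connections until someone accepts them."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(backlog)
    addr = listener.getsockname()
    clients = []
    for _ in range(n_clients):
        # connect() succeeds even though accept() was never called:
        # the connection sits in the kernel's accept queue.
        clients.append(socket.create_connection(addr, timeout=1))
    # Only now drain the kernel queue.
    accepted = [listener.accept()[0] for _ in clients]
    for s in clients + accepted + [listener]:
        s.close()
    return len(accepted)
```

Note that on Linux the effective queue length is an approximation (historically backlog+1), so treat $backlog as a rough cap rather than an exact limit.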
On Thu, Jan 7, 2016 at 12:02 AM, Mike Bayer <mbayer at redhat.com> wrote:
> On 01/06/2016 09:11 AM, Roman Podoliaka wrote:
>> Hi Mike,
>> Thank you for this brilliant analysis! We've been seeing such timeout
>> errors downstream periodically, and this is the first time someone
>> has analysed the root cause thoroughly.
>> On Fri, Dec 18, 2015 at 10:33 PM, Mike Bayer <mbayer at redhat.com> wrote:
>>> But if we only have a super low number of greenlets and only a few dozen
>>> workers, what happens if we have more than 240 requests come in at once,
>>> aren't those connections going to get rejected? No way! eventlet's
>>> networking system is better than that, those connection requests just
>>> get queued up in any case, waiting for a greenlet to be available. Play
>>> with the script and its settings to see.
>> Right, it must be controlled by the backlog argument value here:
> oh wow, totally missed that! But, how does backlog here interact with
> multiple processes? E.g. all workers are saturated, it will place a
> waiting connection onto a random greenlet which then has to wait? It
> would be better if the "backlog" were pushed up to the parent process,
> not sure if that's possible?