[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
mbayer at redhat.com
Mon Feb 22 17:15:07 UTC 2016
On 02/22/2016 11:30 AM, Chris Friesen wrote:
> On 02/22/2016 11:17 AM, Jay Pipes wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> Hi all,
>>> We've recently run into some interesting behaviour that I thought I
>>> should bring up to see if we want to do anything about it.
>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>> from the main thread, and if it blocks then it can block all of
>>> nova-compute (since all eventlets will be blocked). Examples that we've
>>> found include glance image download, file renaming, instance directory
>>> creation, opening the instance xml file, etc. We've seen nova-compute
>>> block for upwards of 50 seconds.
>>> Now the specific case where we hit this is not a production
>>> environment. It's only got one spinning disk shared by all the guests,
>>> the guests were hammering on the disk pretty hard, and the IO scheduler
>>> for the instance disk was CFQ, which seems to be buggy in our kernel.
>>> But the fact remains that nova-compute is doing disk I/O from the main
>>> thread, and if the guests push that disk hard enough then nova-compute
>>> is going to suffer.
>>> Given the above...would it make sense to use eventlet.tpool or similar
>>> to perform all disk access in a separate OS thread? There'd likely be a
>>> bit of a performance hit, but at least it would isolate the main thread
>>> from IO blocking.
>> This is probably a good idea, but will require quite a bit of code
>> change. I think in the past we've taken the expedient route of just
>> exec'ing code in a greenthread using utils.spawn().
> I'm not an expert on eventlet, but from what I've seen this isn't
> sufficient to deal with disk access in a robust way.
> It's my understanding that utils.spawn() will result in the code running
> in the same OS thread, but in a separate eventlet greenthread. If that
> code tries to access the disk via a potentially blocking call, the
> eventlet subsystem will not jump to another greenthread. Because of
> this it can potentially block the whole OS thread (and thus all other
> greenthreads running in that OS thread).
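Chris's point about a blocking call stalling every greenthread can be reproduced without eventlet. The sketch below is an analogy, not nova code: it uses Python's standard asyncio, whose event loop (like eventlet's hub) runs all coroutines cooperatively in a single OS thread. blocking_disk_op() is a hypothetical stand-in for a disk syscall stuck behind a busy disk, simulated with time.sleep(), and a ticker coroutine records the longest gap between its wake-ups.

```python
import asyncio
import time


def blocking_disk_op(seconds):
    # Hypothetical stand-in for a disk syscall stuck behind a busy disk.
    # time.sleep() blocks the whole OS thread, as a real blocking read/write would.
    time.sleep(seconds)


async def ticker(stop, gaps):
    # Asks to wake every 10 ms and records the actual gap between wake-ups.
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def main():
    stop = asyncio.Event()
    gaps = []
    tick_task = asyncio.create_task(ticker(stop, gaps))
    await asyncio.sleep(0.05)  # let the ticker run normally for a while
    blocking_disk_op(0.5)      # called directly from a coroutine: the loop stalls
    stop.set()
    await tick_task
    return max(gaps)


if __name__ == "__main__":
    print(f"longest ticker stall: {asyncio.run(main()):.2f}s")
```

The reported stall is at least half a second: the ticker gets no chance to run at all while the blocking call holds the event loop's only OS thread.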
Not sure what utils.spawn() does, but if it is in fact an "exec" (or if
Jay is suggesting that an exec() be used within), then the code would be
in a different process entirely, and communicating with it becomes an
issue of pipe IO over unix sockets, which IIRC can be made non-blocking.
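A minimal sketch of the separate-process idea, assuming a POSIX host and that the disk work can be handed to a child process whose result comes back over a pipe. CHILD_CODE and run_disk_work_in_child() are hypothetical names, and this is not how nova's utils.spawn() or any nova process helper is implemented. The parent marks its end of the pipe non-blocking with os.set_blocking() and only reads when selectors reports data ready, so the slow disk IO (write plus fsync) happens entirely in the child.

```python
import os
import selectors
import subprocess
import sys
import tempfile

# Hypothetical disk work run in the child: write 1 MiB, fsync it, report the size.
CHILD_CODE = r"""
import os, sys
path = sys.argv[1]
with open(path, "wb") as f:
    f.write(b"x" * 1024 * 1024)
    f.flush()
    os.fsync(f.fileno())
print(os.path.getsize(path))
"""


def run_disk_work_in_child(path, timeout=30.0):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, path],
        stdout=subprocess.PIPE,
    )
    fd = proc.stdout.fileno()
    os.set_blocking(fd, False)  # reads on our end of the pipe never block
    sel = selectors.DefaultSelector()
    sel.register(fd, selectors.EVENT_READ)
    chunks = []
    try:
        while True:
            # In a green/async program this wait would be a yield point;
            # here it is a plain select() with a timeout.
            if not sel.select(timeout):
                proc.kill()
                raise TimeoutError("child did not finish in time")
            try:
                data = os.read(fd, 4096)
            except BlockingIOError:
                continue  # spurious wake-up: nothing to read yet
            if not data:  # EOF: the child closed its end of the pipe
                break
            chunks.append(data)
    finally:
        sel.close()
        proc.stdout.close()
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with status {proc.returncode}")
    return int(b"".join(chunks).decode().strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_disk_work_in_child(os.path.join(d, "disk.img")))
```

The cost is a process spawn per call plus serializing results through the pipe, which is the overhead the reply below weighs against tpool.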
> I think we need to use eventlet.tpool for disk IO (or else fork a whole
> separate process). Basically we need to ensure that the main OS thread
> never issues a potentially-blocking syscall.
tpool would probably be easier (and more performant because no socket
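For comparison with eventlet.tpool, here is the same offload-to-an-OS-thread idea in standard asyncio, again only as an analogy: asyncio.to_thread() (Python 3.9+) runs a blocking function in a worker thread while the event loop keeps scheduling other coroutines. In nova the equivalent would presumably be a call through eventlet.tpool.execute(); blocking_disk_op() is a hypothetical stand-in that sleeps instead of touching the disk.

```python
import asyncio
import time


def blocking_disk_op(seconds):
    # Hypothetical stand-in for a slow disk syscall; blocks only its own OS thread.
    time.sleep(seconds)
    return "done"


async def ticker(stop, gaps):
    # Asks to wake every 10 ms and records the actual gap between wake-ups.
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def main():
    stop = asyncio.Event()
    gaps = []
    tick_task = asyncio.create_task(ticker(stop, gaps))
    # The blocking call runs in a worker thread; the event loop thread stays free.
    result = await asyncio.to_thread(blocking_disk_op, 0.5)
    stop.set()
    await tick_task
    return result, max(gaps)


if __name__ == "__main__":
    result, worst = asyncio.run(main())
    print(f"{result}, longest ticker gap: {worst:.3f}s")
```

The longest gap should stay near the 10 ms tick, because the sleeping worker thread releases the GIL. The performance hit Chris mentions still applies: every offloaded call pays for a hand-off to another thread.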