[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

Chris Friesen chris.friesen at windriver.com
Mon Feb 22 16:30:50 UTC 2016


On 02/22/2016 11:17 AM, Jay Pipes wrote:
> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>> Hi all,
>>
>> We've recently run into some interesting behaviour that I thought I
>> should bring up to see if we want to do anything about it.
>>
>> Basically the problem seems to be that nova-compute is doing disk I/O
>> from the main thread, and if it blocks then it can block all of
>> nova-compute (since all eventlets will be blocked).  Examples that we've
>> found include glance image download, file renaming, instance directory
>> creation, opening the instance xml file, etc.  We've seen nova-compute
>> block for upwards of 50 seconds.
>>
>> Now the specific case where we hit this is not a production
>> environment.  It's only got one spinning disk shared by all the guests,
>> the guests were hammering on the disk pretty hard, and the IO scheduler
>> for the instance disk was CFQ, which seems to be buggy in our kernel.
>>
>> But the fact remains that nova-compute is doing disk I/O from the main
>> thread, and if the guests push that disk hard enough then nova-compute
>> is going to suffer.
>>
>> Given the above...would it make sense to use eventlet.tpool or similar
>> to perform all disk access in a separate OS thread?  There'd likely be a
>> bit of a performance hit, but at least it would isolate the main thread
>> from IO blocking.
>
> This is probably a good idea, but will require quite a bit of code change. I
> think in the past we've taken the expedient route of just exec'ing problematic
> code in a greenthread using utils.spawn().

I'm not an expert on eventlet, but from what I've seen this isn't sufficient to 
deal with disk access in a robust way.

It's my understanding that utils.spawn() will result in the code running in the
same OS thread, but in a separate eventlet greenthread.  If that code tries to
access the disk via a potentially-blocking call, the eventlet subsystem will not
switch to another greenthread while the call is in progress.  Because of this it
can block the whole OS thread (and thus all other greenthreads running in that
OS thread).

I think we need to use eventlet.tpool for disk IO (or else fork a whole separate
process).  Basically we need to ensure that the main OS thread never issues a
potentially-blocking syscall.
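
To make the difference concrete, here is a rough sketch of the effect.  It does
not use eventlet or any nova code: it uses asyncio from the Python standard
library as a stand-in cooperative scheduler, with run_in_executor() playing the
role that eventlet.tpool would play.  The names blocking_disk_io, heartbeat and
max_stall are made up for illustration, and time.sleep() is only a placeholder
for a slow disk call.

```python
import asyncio
import time


def blocking_disk_io(seconds):
    """Hypothetical stand-in for a slow, blocking disk call."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, interval=0.05):
    """Record a timestamp every time the scheduler lets this task run."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def _run(offload, io_seconds):
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.1)  # let the heartbeat get going first

    if offload:
        # Run the blocking call in a worker OS thread; the scheduler's
        # thread stays free to run other tasks in the meantime.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_disk_io, io_seconds)
    else:
        # Run the blocking call directly in the scheduler's thread;
        # nothing else can be scheduled until it returns.
        blocking_disk_io(io_seconds)

    await asyncio.sleep(0.1)  # give the heartbeat a chance to tick again
    hb.cancel()
    # Largest interval during which the heartbeat was not scheduled.
    return max(b - a for a, b in zip(ticks, ticks[1:]))


def max_stall(offload, io_seconds=0.5):
    """Return the longest heartbeat gap, in seconds, for one run."""
    return asyncio.run(_run(offload, io_seconds))


if __name__ == "__main__":
    print("direct call, longest stall:    %.2fs" % max_stall(offload=False))
    print("worker thread, longest stall:  %.2fs" % max_stall(offload=True))
```

With the direct call, the heartbeat should stall for at least as long as the
"disk" call takes; with the worker thread it should keep ticking at roughly its
normal interval.  In eventlet terms the offloaded case would correspond to
something like eventlet.tpool.execute(func, *args), but I haven't tried that
against the nova code paths mentioned above.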

Chris


