[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
sean at dague.net
Mon Feb 22 17:07:37 UTC 2016
On 02/22/2016 10:43 AM, Chris Friesen wrote:
> Hi all,
> We've recently run into some interesting behaviour that I thought I
> should bring up to see if we want to do anything about it.
> Basically the problem seems to be that nova-compute is doing disk I/O
> from the main thread, and if it blocks then it can block all of
> nova-compute (since all eventlets will be blocked). Examples that we've
> found include glance image download, file renaming, instance directory
> creation, opening the instance XML file, etc. We've seen nova-compute
> block for upwards of 50 seconds.
> Now the specific case where we hit this is not a production
> environment. It's only got one spinning disk shared by all the guests,
> the guests were hammering on the disk pretty hard, and the IO scheduler
> for the instance disk was CFQ, which seems to be buggy in our kernel.
> But the fact remains that nova-compute is doing disk I/O from the main
> thread, and if the guests push that disk hard enough then nova-compute
> is going to suffer.
> Given the above...would it make sense to use eventlet.tpool or similar
> to perform all disk access in a separate OS thread? There'd likely be a
> bit of a performance hit, but at least it would isolate the main thread
> from IO blocking.
Making nova-compute more robust is fine, though the reality is that once
you starve a system of IO, a lot of stuff is going to fall over in weird
ways. So the complexity of any new code has to be weighed against what
it gains. I think individual patches should be evaluated on that basis,
or via a spec if this is going to get really invasive.