[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

Sean Dague sean at dague.net
Mon Feb 22 17:38:21 UTC 2016

On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>> Hi all,
>>> We've recently run into some interesting behaviour that I thought I
>>> should bring up to see if we want to do anything about it.
>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>> from the main thread, and if it blocks then it can block all of
>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>> found include glance image download, file renaming, instance directory
>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>> block for upwards of 50 seconds.
>>> Now the specific case where we hit this is not a production
>>> environment.  It's only got one spinning disk shared by all the guests,
>>> the guests were hammering on the disk pretty hard, and the IO scheduler
>>> for the instance disk was CFQ, which seems to be buggy in our kernel.
>>> But the fact remains that nova-compute is doing disk I/O from the main
>>> thread, and if the guests push that disk hard enough then nova-compute
>>> is going to suffer.
>>> Given the above...would it make sense to use eventlet.tpool or similar
>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>> bit of a performance hit, but at least it would isolate the main thread
>>> from IO blocking.
>> Making nova-compute more robust is fine, though the reality is once you
>> IO starve a system, a lot of stuff is going to fall over weird.
>> So there has to be a tradeoff of the complexity of any new code vs. what
>> it gains. I think individual patches should be evaluated as such, or a
>> spec if this is going to get really invasive.
> There are OS level mechanisms (eg cgroups blkio controller) for doing
> I/O prioritization that you could use to give Nova higher priority over
> the VMs, to reduce (if not eliminate) the possibility that a busy VM
> can inflict a denial of service on the mgmt layer.  Of course figuring
> out how to use that mechanism correctly is not entirely trivial.
> I think it is probably worth focusing effort in that area, before jumping
> into making all the I/O related code in Nova more complicated. eg have
> someone investigate & write up recommendation in Nova docs for how to
> configure the host OS & Nova such that VMs cannot inflict an I/O denial
> of service attack on the mgmt service.

+1 that would be much nicer.

We've got a set of bugs in the tracker right now which are basically
"after the compute node being at loadavg of 11 for an hour, nova-compute
starts failing". Having some basic methodology to use Linux
prioritization on the worker process would mitigate those quite a bit,
and could be used by all users immediately, vs. complex nova-compute
changes which would only apply to new / upgraded deploys.
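
As a rough illustration of the kind of "basic methodology" meant here, the
sketch below builds an ionice(1) command line that would move an already
running process (for example the nova-compute worker) to the top of the
best-effort I/O class. This is only a sketch under assumptions: the helper
names and the PID are hypothetical, nothing like this exists in Nova, and
the priority only has an effect if the host's I/O scheduler honours
per-process I/O priorities (CFQ does). It is not a substitute for the
cgroups blkio setup suggested above.

```python
import shutil
import subprocess

# Scheduling classes as documented in ionice(1).
IOCLASS_REALTIME = 1
IOCLASS_BEST_EFFORT = 2
IOCLASS_IDLE = 3


def ionice_argv(pid, ioclass=IOCLASS_BEST_EFFORT, level=0):
    """Build the ionice command line for an existing process.

    Hypothetical helper for illustration only. `level` runs from 0
    (highest priority) to 7 (lowest) and is ignored for the idle class.
    """
    if ioclass not in (IOCLASS_REALTIME, IOCLASS_BEST_EFFORT, IOCLASS_IDLE):
        raise ValueError("unknown I/O scheduling class: %r" % (ioclass,))
    argv = ["ionice", "-c", str(ioclass)]
    if ioclass != IOCLASS_IDLE:
        if not 0 <= level <= 7:
            raise ValueError("level must be between 0 and 7")
        argv += ["-n", str(level)]
    argv += ["-p", str(pid)]
    return argv


def raise_io_priority(pid, dry_run=True):
    """Return the command, and run it unless dry_run is set.

    Falls back to a dry run when ionice is not installed on the host.
    """
    argv = ionice_argv(pid)
    if dry_run or shutil.which("ionice") is None:
        return argv
    subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # 1234 is a placeholder PID, not a real nova-compute process.
    print(" ".join(raise_io_priority(1234, dry_run=True)))
```

The dry-run default is deliberate: an operator can check the command that
would be issued before applying it on a production compute node.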


