[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

Tim Bell Tim.Bell at cern.ch
Mon Feb 22 18:38:15 UTC 2016

On 22/02/16 19:07, "John Garbutt" <john at johngarbutt.com> wrote:

>On 22 February 2016 at 17:38, Sean Dague <sean at dague.net> wrote:
>> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>>> Hi all,
>>>>> We've recently run into some interesting behaviour that I thought I
>>>>> should bring up to see if we want to do anything about it.
>>>>> Basically the problem seems to be that nova-compute is doing disk I/O
>>>>> from the main thread, and if it blocks then it can block all of
>>>>> nova-compute (since all eventlets will be blocked).  Examples that we've
>>>>> found include glance image download, file renaming, instance directory
>>>>> creation, opening the instance xml file, etc.  We've seen nova-compute
>>>>> block for upwards of 50 seconds.
>>>>> Now the specific case where we hit this is not a production
>>>>> environment.  It's only got one spinning disk shared by all the guests,
>>>>> the guests were hammering on the disk pretty hard, and the IO scheduler
>>>>> for the instance disk was CFQ, which seems to be buggy in our kernel.
>>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>>> is going to suffer.
>>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>>>> bit of a performance hit, but at least it would isolate the main thread
>>>>> from IO blocking.
>>>> Making nova-compute more robust is fine, though the reality is once you
>>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>>> it gains. I think individual patches should be evaluated as such, or a
>>>> spec if this is going to get really invasive.
>>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>>> I/O prioritization that you could use to give Nova higher priority over
>>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>>> can inflict a denial of service on the mgmt layer.  Of course figuring
>>> out how to use that mechanism correctly is not entirely trivial.
>>> I think it is probably worth focusing effort in that area, before jumping
>>> into making all the I/O related code in Nova more complicated. eg have
>>> someone investigate & write up recommendation in Nova docs for how to
>>> configure the host OS & Nova such that VMs cannot inflict an I/O denial
>>> of service attack on the mgmt service.
>> +1 that would be much nicer.
>> We've got some set of bugs in the tracker right now which are basically
>> "after the compute node being at loadavg of 11 for an hour, nova-compute
>> starts failing". Having some basic methodology to use Linux
>> prioritization on the worker process would mitigate those quite a bit,
>> and could be used by all users immediately, vs. complex nova-compute
>> changes which would only apply to new / upgraded deploys.
>Does that turn into improved deployment docs that cover how you do
>that on various platforms?
>Maybe some tools to help with that also go in here?

And some easy configuration in the puppet/ansible/chef standard recipes would also help.
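
As a rough illustration of the kind of setting such recipes could manage, here is a minimal Python sketch that raises a cgroup's block I/O weight under the cgroup v1 blkio controller mentioned earlier in the thread. The helper name set_blkio_weight and any cgroup path passed to it are hypothetical. The blkio.weight file and its 10-1000 range are taken from the kernel's cgroup v1 blkio documentation, and the weight only has an effect with the CFQ scheduler. A real deployment would more likely set this through systemd or the configuration management tool itself.

```python
import os

# Range accepted by the cgroup v1 blkio.weight file (CFQ proportional weight).
BLKIO_WEIGHT_MIN = 10
BLKIO_WEIGHT_MAX = 1000


def set_blkio_weight(cgroup_dir, weight):
    """Write a proportional I/O weight into <cgroup_dir>/blkio.weight.

    cgroup_dir is assumed to be an existing blkio cgroup directory that the
    nova-compute process has already been placed in (hypothetical setup).
    Returns the path that was written.
    """
    if not BLKIO_WEIGHT_MIN <= weight <= BLKIO_WEIGHT_MAX:
        raise ValueError(
            "blkio weight must be between %d and %d, got %r"
            % (BLKIO_WEIGHT_MIN, BLKIO_WEIGHT_MAX, weight)
        )
    path = os.path.join(cgroup_dir, "blkio.weight")
    with open(path, "w") as f:
        f.write(str(weight))
    return path
```

On a real host this would need root and would be called with something like a /sys/fs/cgroup/blkio/<group> directory, where the group name is whatever the deployment chose; the guests' cgroups would keep a lower weight.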

>FWIW, how xenapi runs nova-compute in a VM has a similar outcome, albeit
>in a more heavy handed way.
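
For reference, the blocking behaviour described at the top of the thread, and the effect of moving disk access to a separate OS thread, can be sketched with the standard library alone. This is not Nova code and does not use eventlet: asyncio stands in for the cooperative scheduler, time.sleep stands in for a stalled disk call, and all function names are made up for the example. With eventlet, the analogous offload call would be eventlet.tpool.execute(slow_disk_io, io_seconds).

```python
import asyncio
import time


def slow_disk_io(seconds):
    """Stand-in for a blocking disk call (e.g. a rename on a saturated disk)."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, interval=0.01, count=5):
    """Cooperative task that records each time it gets to run."""
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(offload, io_seconds=0.2):
    """Run the heartbeat alongside one slow I/O call.

    Returns the I/O result and the longest gap between heartbeat ticks.
    """
    ticks = []
    hb = asyncio.ensure_future(heartbeat(ticks))
    await asyncio.sleep(0)  # yield once so the heartbeat records its first tick
    loop = asyncio.get_running_loop()
    if offload:
        # Blocking call runs in a worker OS thread; the loop keeps scheduling.
        result = await loop.run_in_executor(None, slow_disk_io, io_seconds)
    else:
        # Blocking call runs in the main thread; every other task is stalled.
        result = slow_disk_io(io_seconds)
    await hb
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return result, max(gaps)


if __name__ == "__main__":
    for offload in (False, True):
        result, worst_gap = asyncio.run(run(offload))
        print("offload=%s result=%s worst heartbeat gap=%.3fs"
              % (offload, result, worst_gap))
```

Without the offload, the heartbeat's worst gap is roughly the full duration of the blocking call; with it, the gap stays close to the heartbeat interval. The cost is the thread handoff per call, which is the performance hit the original proposal anticipated.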