[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

Sam Matzek matzeksam at gmail.com
Thu Feb 25 12:24:29 UTC 2016

For what it's worth, Glance API also has to deal with file I/O blocking
all greenthreads, and it has CooperativeReaders/Writers that yield around
the file I/O to mitigate starvation.  A while ago I hit an issue where
5 concurrent VM snapshots starved nova-compute eventlets due to the
excessive file I/O of reading the snapshot files for upload.  This could
be remedied by taking the cooperative reader from Glance API and using
it in glanceclient [1].  It's not perfect, but something similar could
help with the glance image download issues without needing to tweak the

[1] https://bugs.launchpad.net/python-glanceclient/+bug/1327248
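Sketched very roughly, the cooperative-reader idea looks like this.  The
`yield_fn` hook here is a hypothetical stand-in for the `eventlet.sleep(0)`
call the real Glance code makes between chunks, which reschedules the
current greenthread so starved eventlets get a chance to run; Glance's
actual implementation differs in detail:

```python
import io


class CooperativeReader:
    """Sketch of a file-like reader that cooperates with other greenthreads.

    After every chunk read it invokes yield_fn.  In an eventlet-based
    service that hook would be eventlet.sleep(0), which yields the hub
    to other greenthreads; the no-op default keeps this sketch
    self-contained.  (yield_fn is an illustrative hook, not Glance's
    real interface.)
    """

    def __init__(self, fd, chunk_size=64 * 1024, yield_fn=lambda: None):
        self.fd = fd
        self.chunk_size = chunk_size
        self.yield_fn = yield_fn

    def read(self, size=None):
        data = self.fd.read(size if size is not None else self.chunk_size)
        self.yield_fn()  # let other greenthreads run between chunks
        return data

    def __iter__(self):
        while True:
            chunk = self.read(self.chunk_size)
            if not chunk:
                return
            yield chunk
```

Wrapping the snapshot file in something like this before handing it to
glanceclient's upload path is roughly what the bug above proposes.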

On Mon, Feb 22, 2016 at 1:13 PM, Chris Friesen
<chris.friesen at windriver.com> wrote:
> On 02/22/2016 11:20 AM, Daniel P. Berrange wrote:
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
>>>> But the fact remains that nova-compute is doing disk I/O from the main
>>>> thread, and if the guests push that disk hard enough then nova-compute
>>>> is going to suffer.
>>>> Given the above...would it make sense to use eventlet.tpool or similar
>>>> to perform all disk access in a separate OS thread?  There'd likely be a
>>>> bit of a performance hit, but at least it would isolate the main thread
>>>> from IO blocking.
>>> Making nova-compute more robust is fine, though the reality is that once
>>> you I/O-starve a system, a lot of stuff is going to fall over in weird ways.
>>> So there has to be a tradeoff between the complexity of any new code and
>>> what it gains. I think individual patches should be evaluated as such, or a
>>> spec if this is going to get really invasive.
>> There are OS-level mechanisms (e.g. the cgroups blkio controller) for doing
>> I/O prioritization that you could use to give Nova higher priority than
>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>> can inflict a denial of service on the mgmt layer.  Of course, figuring
>> out how to use that mechanism correctly is not entirely trivial.
> The 50+ second delays were with CFQ as the disk scheduler.  (No cgroups
> though, just CFQ with equal priorities on nova-compute and the guests.)
> That was on a 3.10 kernel, so maybe CFQ behaves better on newer
> kernels.
> If you put nova-compute at high priority then glance image downloads,
> qemu-img format conversions, and volume clearing will also run at the higher
> priority, potentially impacting running VMs.
> In an ideal world we'd have per-VM cgroups and all activity on behalf of a
> particular VM would be done in the context of that VM's cgroup.
> Chris
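The eventlet.tpool suggestion in Chris's message amounts to pushing each
blocking disk operation onto a native OS thread and having the caller wait
only on the result.  In actual nova-compute code this would be
`eventlet.tpool.execute`, which proxies the call into eventlet's native
thread pool; the following is a rough stdlib analogue using
`concurrent.futures`, just to show the shape of the idea:

```python
import concurrent.futures

# Small native-thread pool standing in for eventlet's tpool.  With
# eventlet, tpool.execute(fn, *args) would be used instead, and only
# the calling greenthread (not the whole hub) would block while the
# native thread sits in the kernel on disk I/O.
_io_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def blocking_to_thread(fn, *args, **kwargs):
    """Run a blocking call in a native thread and wait for its result."""
    return _io_pool.submit(fn, *args, **kwargs).result()


def read_file(path):
    # An ordinary blocking read; routed through blocking_to_thread it
    # no longer stalls the caller's event loop.
    with open(path, "rb") as f:
        return f.read()
```

So instead of calling `read_file(path)` directly from the main thread, the
service would call `blocking_to_thread(read_file, path)`; that is the
isolation-from-IO-blocking trade being discussed above.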
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
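For the cgroup mechanism Daniel mentions, the cgroup v1 blkio controller's
weight interface is the relevant knob.  Below is a minimal dry-run sketch of
the writes involved, assuming the controller is mounted at the usual
/sys/fs/cgroup/blkio; the group names and weights are made up for
illustration, and actually applying them needs root and a mounted blkio
controller:

```python
import os

BLKIO_ROOT = "/sys/fs/cgroup/blkio"  # typical cgroup v1 mount point


def blkio_weight_writes(group, weight):
    """Return the (path, value) pairs that give a cgroup a relative I/O
    weight.  CFQ honours weights in roughly the 100-1000 range; a higher
    weight wins a larger share of the disk under contention."""
    if not 100 <= weight <= 1000:
        raise ValueError("blkio weight must be in 100..1000")
    return [(os.path.join(BLKIO_ROOT, group, "blkio.weight"), str(weight))]


# Hypothetical policy: favour the management plane over guest VMs.
plan = (blkio_weight_writes("nova", 800)
        + blkio_weight_writes("machine/guests", 200))


def apply(plan):
    # Shown only for shape; requires root and a v1 blkio hierarchy.
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)
```

As the thread notes, the catch is that everything nova-compute runs in its
cgroup (glance downloads, qemu-img conversions, volume clearing) inherits
that higher weight, which is why per-VM cgroups would be the cleaner answer.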
