[openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO
andrew at lascii.com
Mon Feb 22 17:27:54 UTC 2016
On Mon, Feb 22, 2016, at 12:15 PM, Mike Bayer wrote:
> On 02/22/2016 11:30 AM, Chris Friesen wrote:
> > On 02/22/2016 11:17 AM, Jay Pipes wrote:
> >> On 02/22/2016 10:43 AM, Chris Friesen wrote:
> >>> Hi all,
> >>> We've recently run into some interesting behaviour that I thought I
> >>> should bring up to see if we want to do anything about it.
> >>> Basically the problem seems to be that nova-compute is doing disk I/O
> >>> from the main thread, and if it blocks then it can block all of
> >>> nova-compute (since all eventlets will be blocked). Examples that we've
> >>> found include glance image download, file renaming, instance directory
> >>> creation, opening the instance xml file, etc. We've seen nova-compute
> >>> block for upwards of 50 seconds.
> >>> Now the specific case where we hit this is not a production
> >>> environment. It's only got one spinning disk shared by all the guests,
> >>> the guests were hammering on the disk pretty hard, and the IO scheduler
> >>> for the instance disk was CFQ, which seems to be buggy in our kernel.
> >>> But the fact remains that nova-compute is doing disk I/O from the main
> >>> thread, and if the guests push that disk hard enough then nova-compute
> >>> is going to suffer.
> >>> Given the above...would it make sense to use eventlet.tpool or similar
> >>> to perform all disk access in a separate OS thread? There'd likely be a
> >>> bit of a performance hit, but at least it would isolate the main thread
> >>> from IO blocking.
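The effect Chris describes, and the thread-offload fix he proposes, can be illustrated outside nova. The sketch below is an analogy, not nova or eventlet code. It uses the standard library's asyncio event loop as a stand-in for the eventlet hub and asyncio.to_thread() as a stand-in for eventlet.tpool.execute(). blocking_disk_op, heartbeat and run are hypothetical names, and time.sleep() stands in for a disk syscall that stalls in the kernel. The sketch counts how often a "heartbeat" task gets to run while the blocking call is in progress.

```python
import asyncio
import time

# Hypothetical stand-in for a disk operation stuck in the kernel (slow
# rename, open, write on a saturated disk). time.sleep() blocks the calling
# OS thread the same way a stalled syscall would.
def blocking_disk_op(seconds: float) -> str:
    time.sleep(seconds)
    return "done"

async def heartbeat(ticks: list, interval: float, stop: asyncio.Event) -> None:
    # Stand-in for the other green threads (RPC handling, periodic tasks).
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)

async def run(offload: bool) -> int:
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, 0.01, stop))
    await asyncio.sleep(0)  # let the heartbeat start
    if offload:
        # Analogous to eventlet.tpool.execute(): the call runs in a worker
        # OS thread, so the event loop keeps scheduling other tasks.
        await asyncio.to_thread(blocking_disk_op, 0.3)
    else:
        # Called directly on the loop's own thread: nothing else can run.
        blocking_disk_op(0.3)
    stop.set()
    await hb
    return len(ticks)

if __name__ == "__main__":
    print("ticks while blocking inline:", asyncio.run(run(offload=False)))
    print("ticks while offloaded:      ", asyncio.run(run(offload=True)))
```

With the inline call the heartbeat gets a single tick in 0.3 seconds. With the offloaded call it keeps ticking roughly every 10 ms, at the cost of a thread hand-off per call (the "bit of a performance hit" mentioned above).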
> >> This is probably a good idea, but will require quite a bit of code
> >> change. I think in the past we've taken the expedient route of just
> >> exec'ing problematic code in a greenthread using utils.spawn().
> > I'm not an expert on eventlet, but from what I've seen this isn't
> > sufficient to deal with disk access in a robust way.
> > It's my understanding that utils.spawn() will result in the code running
> > in the same OS thread, but in a separate eventlet greenthread. If that
> > code tries to access the disk via a potentially-blocking call the
> > eventlet subsystem will not jump to another greenthread. Because of
> > this it can potentially block the whole OS thread (and thus all other
> > greenthreads running in that OS thread).
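Chris's point, that moving the work into a separate greenthread does not help when that greenthread makes a blocking call, can be shown with the same asyncio analogy. Here asyncio.create_task() plays the role of eventlet.spawn() (which utils.spawn() wraps). blocking_disk_op, worker and max_heartbeat_gap are hypothetical names, and time.sleep() again stands in for a stalled disk syscall. The function measures the longest pause the "main" task experiences.

```python
import asyncio
import time

def blocking_disk_op(seconds: float) -> None:
    # Hypothetical stand-in for a disk syscall that stalls; blocks the OS thread.
    time.sleep(seconds)

async def worker(seconds: float) -> None:
    # Runs in its own task, but still on the event loop's single OS thread.
    blocking_disk_op(seconds)

async def max_heartbeat_gap(spawned_seconds: float) -> float:
    gaps = []
    last = time.monotonic()
    # Analogous to utils.spawn()/eventlet.spawn(): schedule the work as a
    # separate cooperative task instead of calling it inline.
    task = asyncio.create_task(worker(spawned_seconds))
    for _ in range(5):
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now
    await task
    return max(gaps)

if __name__ == "__main__":
    gap = asyncio.run(max_heartbeat_gap(0.3))
    print(f"longest pause seen by the main task: {gap:.2f}s")
```

The main task asks to sleep for 10 ms but is not resumed for about 0.3 s, because the spawned task never yields while it is blocked.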
> Not sure what utils.spawn() does, but if it is in fact an "exec" (or if
> Jay is suggesting that an exec() be used within), then the code would be
> in a different process entirely, and communicating with it becomes a
> matter of pipe IO over unix sockets, which IIRC can be done non-blocking.
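Mike's separate-process alternative can be sketched the same way. This is an illustration under assumptions, not how nova does it. asyncio.create_subprocess_exec() stands in for whatever process mechanism would actually be used, the child program (CHILD) is a hypothetical one-liner whose time.sleep() stands in for slow disk I/O, and run_in_child and main are hypothetical names. The parent reads the child's stdout through a pipe that the event loop polls without blocking, so a heartbeat task keeps running meanwhile.

```python
import asyncio
import sys

# Hypothetical child program: does the blocking work in its own process
# and reports the result on stdout. time.sleep() stands in for disk I/O.
CHILD = "import time; time.sleep(0.3); print('done')"

async def run_in_child() -> str:
    # The child's stdout is a pipe that asyncio reads without blocking,
    # so the event loop stays free while the child works.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()

async def main() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        # Stand-in for the other green threads in the service.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    result = await run_in_child()
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return result, ticks

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Compared with a thread pool, this pays for process startup and for serializing results over the pipe, which is the overhead alluded to in the reply below.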
utils.spawn() is just a wrapper around eventlet.spawn(), mostly there to
be stubbed out in testing.
> > I think we need to use eventlet.tpool for disk IO (or else fork a whole
> > separate process). Basically we need to ensure that the main OS thread
> > never issues a potentially-blocking syscall.
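One way to approach "the main OS thread never issues a potentially-blocking syscall" systematically is to route every disk helper through a single wrapper rather than patching call sites one by one. The sketch below is hypothetical. It uses a decorator named offload_io built on asyncio.to_thread() as a stand-in for wrapping helpers with eventlet.tpool. The helpers (make_instance_dir, write_file, rename_file) and the file names are illustrative, not nova's. Each helper returns the name of the thread it ran on so the routing can be checked.

```python
import asyncio
import functools
import os
import shutil
import tempfile
import threading

def offload_io(func):
    # Hypothetical decorator: every call to the wrapped disk helper runs in a
    # worker OS thread (the role eventlet.tpool would play under eventlet),
    # so the thread running the event loop never makes the blocking syscall.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        return await asyncio.to_thread(func, *args, **kwargs)
    return wrapper

@offload_io
def make_instance_dir(path: str) -> str:
    os.makedirs(path, exist_ok=True)
    return threading.current_thread().name

@offload_io
def write_file(path: str, data: str) -> str:
    with open(path, "w") as f:
        f.write(data)
    return threading.current_thread().name

@offload_io
def rename_file(src: str, dst: str) -> str:
    os.rename(src, dst)
    return threading.current_thread().name

async def main() -> list:
    # Temp-dir setup and cleanup are offloaded too, for consistency.
    base = await asyncio.to_thread(tempfile.mkdtemp)
    try:
        inst = os.path.join(base, "instance")
        tmp = os.path.join(inst, "instance.xml.tmp")
        return [
            await make_instance_dir(inst),
            await write_file(tmp, "<domain/>"),
            await rename_file(tmp, os.path.join(inst, "instance.xml")),
        ]
    finally:
        await asyncio.to_thread(shutil.rmtree, base)

if __name__ == "__main__":
    print(asyncio.run(main()))  # worker thread names, never "MainThread"
```

Because callers must await the wrapped helpers, any code path that still calls a raw disk function stands out in review, which is one argument for a single chokepoint over scattered tpool calls.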
> tpool would probably be easier (and more performant because no socket
> > Chris
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev