[Openstack] eventlet weirdness

Adam Young ayoung at redhat.com
Tue Mar 6 01:30:15 UTC 2012


On 03/05/2012 05:08 PM, Yun Mao wrote:
> Hi Phil,
>
> My understanding is that, (forget Nova for a second) in a perfect
> eventlet world, a green thread is either doing CPU intensive
> computing, or waiting in system calls that are IO related. In the latter
> case, the eventlet scheduler will suspend the green thread and switch
> to another green thread that is ready to run.
>
> Back to reality, as you mentioned this is broken - some IO bound
> activity won't cause an eventlet switch. To me the only way that can
> happen is for the same reason those MySQL calls are blocking - we
> are using C-based modules that don't respect monkey patching and never
> yield. I suspect that all libvirt based calls also belong to this
> category.

Agreed. I expect that to be the case for any native library. Monkey
patching only replaces the Python side of the call (the socket, time
and similar modules); a native library makes its own system calls, so
by the time it runs there is nothing left for eventlet to redirect.
>
> Now if those blocking calls can finish in a very short time (as we
> assume for DB calls), then I think inserting a sleep(0) after every
> blocking call should be a quick fix to the problem.
Nope. The blocking call still blocks every greenthread in the process;
only after it returns does the code hit the sleep and let something else
be scheduled. The only option is to wrap the call with a thread pool.
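
A minimal sketch of the difference, using only the standard library.
Assumptions: asyncio's event loop stands in for the eventlet hub,
native_call is a hypothetical stand-in for a blocking C-library call, and
the 0.2 second duration is arbitrary. With eventlet itself the analogous
wrapper should be eventlet.tpool.execute(native_call, 0.2).

```python
import asyncio
import time


def native_call(seconds):
    """Hypothetical blocking native call: it never yields to the loop."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, stop):
    # Cooperative task that wants to run roughly every 10 ms.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_then_yield():
    result = native_call(0.2)   # blocks the whole loop for 0.2 s
    await asyncio.sleep(0)      # yields, but only after the damage is done
    return result


async def in_thread_pool():
    loop = asyncio.get_running_loop()
    # A second OS thread does the blocking; the loop keeps scheduling.
    return await loop.run_in_executor(None, native_call, 0.2)


async def max_gap(worker):
    """Run worker next to a heartbeat; return the longest heartbeat stall."""
    ticks = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0.03)   # let the heartbeat tick a few times first
    await worker()
    await asyncio.sleep(0.03)
    stop.set()
    await hb
    return max(b - a for a, b in zip(ticks, ticks[1:]))


gap_blocking = asyncio.run(max_gap(blocking_then_yield))
gap_pooled = asyncio.run(max_gap(in_thread_pool))
print(f"stall with sleep(0) after the call: {gap_blocking:.3f}s")
print(f"stall with the thread pool:         {gap_pooled:.3f}s")
```

The heartbeat stalls for the full length of the blocking call in the
first case, and keeps ticking in the second.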

From an OS perspective, there are no such things as greenthreads. The
same task_struct in the Linux kernel (representing a POSIX thread) that
runs the body of the web application is also used to process the IO.
That Linux thread goes into a sleep state until the IO comes back, and
the kernel scheduler will schedule another OS process or task. In order
to get both the IO to complete and the greenthread scheduler to process
another greenthread, you need to have two POSIX threads.
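
One way to see this from Python, as a rough illustration only: asyncio
coroutines stand in for greenthreads here (an assumption, since they are
a different library), and threading.get_ident() reports which OS-level
thread is running the code.

```python
import asyncio
import threading


async def current_thread_id():
    # Every cooperative task on the loop runs on the same OS thread.
    return threading.get_ident()


async def main():
    loop = asyncio.get_running_loop()
    first, second = await asyncio.gather(current_thread_id(),
                                         current_thread_id())
    # Work handed to the executor runs on a second OS thread.
    pooled = await loop.run_in_executor(None, threading.get_ident)
    return first, second, pooled


first, second, pooled = asyncio.run(main())
print("cooperative tasks share a thread:", first == second)
print("pooled work uses another thread: ", pooled != first)
```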

If the libvirt API (or other native API) has an async mode, what you
can do is provide a synchronous, Python-based wrapper that does the
following:

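register_request_callback()   # completion callback wakes this greenthread
async_call()                  # start the native operation; returns at once
sleep()                       # yield to the scheduler until the callback fires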
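
The three steps above can be sketched as follows. This is not libvirt's
API: native_async_call is a hypothetical native function that returns
immediately and invokes a callback from its own thread, and asyncio
again stands in for the greenthread scheduler.

```python
import asyncio
import threading


def native_async_call(arg, on_done):
    """Hypothetical async native API: returns at once, calls back later."""
    threading.Timer(0.05, on_done, args=(arg * 2,)).start()


async def sync_style_wrapper(arg):
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def on_done(result):
        # Step 1, the registered callback. It runs on the native thread,
        # so the result is handed back to the loop thread safely.
        loop.call_soon_threadsafe(done.set_result, result)

    native_async_call(arg, on_done)   # step 2: start the call
    return await done                 # step 3: yield until the callback


async def main():
    progress = []

    async def other_task():
        # Another cooperative task that keeps running meanwhile.
        for i in range(3):
            progress.append(i)
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(sync_style_wrapper(21), other_task())
    return result, progress


result, other_progress = asyncio.run(main())
print(result, other_progress)
```

To the caller the wrapper looks synchronous, but the scheduler is free
to run other tasks while the native operation is in flight.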

The only time sleep() called from Python code is going to help you is
if you have a long-running stretch of Python code and you sleep() in
the middle of it.
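
A small sketch of that one useful case, again with asyncio as a stand-in
for eventlet (an assumption) and hypothetical task names. The long job
is pure Python; yielding part way through lets the short task run before
the job has finished.

```python
import asyncio


async def long_python_job(log, yield_every):
    for i in range(6):
        log.append(("job", i))
        # The equivalent of sleep(0) in the middle of long Python code.
        if yield_every and (i + 1) % yield_every == 0:
            await asyncio.sleep(0)


async def short_task(log):
    log.append(("short", 0))


async def run(yield_every):
    log = []
    await asyncio.gather(long_python_job(log, yield_every), short_task(log))
    return log


without_yield = asyncio.run(run(0))   # job never yields
with_yield = asyncio.run(run(2))      # job yields every second step
print(without_yield.index(("short", 0)), with_yield.index(("short", 0)))
```

Without the yield the short task only runs after the whole job is done.
The total amount of work is the same either way; only the ordering
changes.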




> But if it's a long
> blocking call like the snapshot case, we are probably screwed anyway
> and need OS thread level parallelism or multiprocessing to make it
> truly non-blocking.. Thanks,

Yep.
>
> Yun
>
> On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil<philip.day at hp.com>  wrote:
>> Hi Yun,
>>
>> The point of the sleep(0) is to explicitly yield from a long running eventlet so that other eventlets aren't blocked for a long period. Depending on how you look at it, that either means we're making an explicit judgement on priority, or trying to provide a more equal sharing of run-time across eventlets.
>>
>> It's not that things are CPU bound as such - more just that eventlets have very few pre-emption points. Even an IO bound activity like creating a snapshot won't cause an eventlet switch.
>>
>> So in terms of priority we're trying to get to the state where:
>>   - Important periodic events (such as service status) run when expected (if these take a long time we're stuffed anyway)
>>   - User initiated actions don't get blocked by background system eventlets (such as refreshing power-state)
>>   - Slow actions from one user don't block actions from other users (the first user will expect their snapshot to take X seconds, the second one won't expect their VM creation to take X + Y seconds).
>>
>> It almost feels like the right level of concurrency would be to have a task/process running for each VM, so that there is concurrency across un-related VMs, but serialisation for each VM.
>>
>> Phil
>>
>> -----Original Message-----
>> From: Yun Mao [mailto:yunmao at gmail.com]
>> Sent: 02 March 2012 20:32
>> To: Day, Phil
>> Cc: Chris Behrens; Joshua Harlow; openstack
>> Subject: Re: [Openstack] eventlet weirdness
>>
>> Hi Phil, I'm a little confused. To what extent does sleep(0) help?
>>
>> It only gives the greenlet scheduler a chance to switch to another green thread. If we are having a CPU bound issue, sleep(0) won't give us access to any more CPU cores. So the total time to finish should be the same no matter what. It may improve the fairness among different green threads but shouldn't help the throughput. The only apparent gain to me is a situation where there is 1 green thread with long CPU time and many other green threads with small CPU time.
>> The total finish time will be the same with or without sleep(0), but with sleep in the first thread, the others should be much more responsive.
>>
>> However, it's unclear to me which part of Nova is very CPU intensive.
>> It seems that most work here is IO bound, including the snapshot. Do we have other blocking calls besides mysql access? I feel like I'm missing something but couldn't figure out what.
>>
>> Thanks,
>>
>> Yun
>>
>>
>> On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil<philip.day at hp.com>  wrote:
>>> I didn't say it was pretty - Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places, and it would be really cool if something managed the threads that were started so that if a second conflicting request was received it did some proper tidy up or blocking rather than just leaving the race condition to work itself out (then we wouldn't have to try and control it by checking vm_state).
>>>
>>> However ... In the current code base, where we only have user space based eventlets with no pre-emption, and some activities that need to be prioritised, forcing pre-emption with a sleep(0) seems a pretty small bit of untidy. And it works now without a major code refactor.
>>>
>>> Always open to other approaches ...
>>>
>>> Phil
>>>
>>>
>>> -----Original Message-----
>>> From: openstack-bounces+philip.day=hp.com at lists.launchpad.net
>>> [mailto:openstack-bounces+philip.day=hp.com at lists.launchpad.net] On
>>> Behalf Of Chris Behrens
>>> Sent: 02 March 2012 19:00
>>> To: Joshua Harlow
>>> Cc: openstack; Chris Behrens
>>> Subject: Re: [Openstack] eventlet weirdness
>>>
>>> It's not just you
>>>
>>>
>>> On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:
>>>
>>>> Does anyone else feel that the following seems really "dirty", or is it just me?
>>>>
>>>> "adding a few sleep(0) calls in various places in the Nova codebase
>>>> (as was recently added in the _sync_power_states() periodic task) is
>>>> an easy and simple win with pretty much no ill side-effects. :)"
>>>>
>>>> Dirty in that it feels like there is something wrong from a design point of view.
>>>> Sprinkling "sleep(0)" seems like it's a band-aid on a larger problem imho.
>>>> But that's just my gut feeling.
>>>>
>>>> :-(
>>>>
>>>> On 3/2/12 8:26 AM, "Armando Migliaccio"<Armando.Migliaccio at eu.citrix.com>  wrote:
>>>>
>>>> I knew you'd say that :P
>>>>
>>>> There you go: https://bugs.launchpad.net/nova/+bug/944145
>>>>
>>>> Cheers,
>>>> Armando
>>>>
>>>>> -----Original Message-----
>>>>> From: Jay Pipes [mailto:jaypipes at gmail.com]
>>>>> Sent: 02 March 2012 16:22
>>>>> To: Armando Migliaccio
>>>>> Cc: openstack at lists.launchpad.net
>>>>> Subject: Re: [Openstack] eventlet weirdness
>>>>>
>>>>> On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
>>>>>> I'd be cautious to say that no ill side-effects were introduced.
>>>>>> I found a
>>>>> race condition right in the middle of sync_power_states, which I
>>>>> assume was exposed by "breaking" the task deliberately.
>>>>>
>>>>> Such a party-pooper! ;)
>>>>>
>>>>> Got a link to the bug report for me?
>>>>>
>>>>> Thanks!
>>>>> -jay
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~openstack Post to     :
>>>> openstack at lists.launchpad.net Unsubscribe :
>>>> https://launchpad.net/~openstack More help   :
>>>> https://help.launchpad.net/ListHelp




