[nova][dev] Fixing eventlet monkey patching in Nova

Monty Taylor mordred at inaugust.com
Fri Mar 15 21:41:20 UTC 2019



On 3/15/19 8:03 PM, Sean Mooney wrote:
> On Fri, 2019-03-15 at 12:15 -0700, melanie witt wrote:
>> On Fri, 15 Mar 2019 18:53:35 +0000 (GMT), Chris Dent
>> <cdent+os at anticdent.org> wrote:
>>> The mod_wsgi/uwsgi side of things strived to be eventlet free as it
>>> makes for weirdness, and at some point I did some work to be sure it
>>> never showed up in placement [1] while it was still in nova. A side
>>> effect of that work was that it also didn't need to show up in the
>>> nova-api unless you used the cmd line script version of it. At the
>>> same time I also explored not importing the world, and was able to
>>> get some improvements (mostly by moving things out of __init__.py in
>>> packages that had deeply nested members) but not as much as I would
>>> have liked.
>>>
>>> However, we later (as mentioned elsewhere in the thread) made the
>>> gathering of cell mappings parallel, bringing back the need for
>>> eventlet, it seems.
>>
>> Is there an alternative we could use for threading in the API that is
>> compatible with python 2.7? I'd be happy to convert the cells
>> scatter-gather code to use it, if so.
> It's a little heavyweight, but Python multiprocessing or explicit threads would work.
> 
> Taking the multiprocessing example:
> https://docs.python.org/2/library/multiprocessing.html
> 
> from multiprocessing import Pool
> 
> def f(x):
>      return x*x
> 
> if __name__ == '__main__':
>      p = Pool(5)
>      print(p.map(f, [1, 2, 3]))
> 
> You basically could create a pool of X processes and map the function across the pool.
> X could either be a fixed parallelism factor or the total number of cells.
> 
> If f is a function that takes a cell and lists its instances, then
> p.map(f, ["cell1", "cell2", ...]) returns the set of results from each of the concurrent executions,
> but it also blocks until they are all completed.
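> 
> As a rough sketch (list_instances here is a placeholder for whatever
> the real per-cell query would be):
> 
> from multiprocessing import Pool
> 
> def list_instances(cell):
>     # placeholder for the real per-cell instance listing
>     return (cell, ["instance-a", "instance-b"])
> 
> if __name__ == '__main__':
>     p = Pool(5)
>     # blocks here until every cell has been queried
>     print(p.map(list_instances, ["cell1", "cell2", "cell3"]))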
> 
> Eventlet gives you concurrency, which means we interleave the requests but only one request is executing at
> any one time. Using multiprocessing will give real parallelism, but p.map blocks until all the
> parallel requests are completed.
> 
> You can kind of get the best of both worlds by submitting the requests asynchronously,
> as is shown in the later example:
> https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
> 
> import os
> from multiprocessing import Pool
> pool = Pool(processes=4)
> # launching multiple evaluations asynchronously *may* use more processes
> multiple_results = [pool.apply_async(os.getpid, ()) for i in range(4)]
> print([res.get(timeout=1) for res in multiple_results])
> 
> This allows you to submit multiple requests to the pool in parallel,
> then retrieve them with a timeout after all requests are submitted, allowing us
> to limit the time we wait if there is a slow or down cell.
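> 
> As a rough sketch of how that could look for the cell case (list_instances
> and the cell names are again just placeholders):
> 
> from multiprocessing import Pool, TimeoutError
> 
> def list_instances(cell):
>     # placeholder for the real per-cell query
>     return (cell, ["instance-a", "instance-b"])
> 
> if __name__ == '__main__':
>     pool = Pool(processes=4)
>     # submit one request per cell without blocking
>     results = [pool.apply_async(list_instances, (cell,))
>                for cell in ["cell1", "cell2", "cell3"]]
>     for res in results:
>         try:
>             # bound how long we wait on any slow or down cell
>             print(res.get(timeout=1))
>         except TimeoutError:
>             print("cell did not respond in time")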

FWIW, we use explicit threads in Zuul and Nodepool and have been very 
happy with them, since they're explicit and don't require weird 
monkeypatching.
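A minimal sketch of that pattern, in case it's useful here (the
list_instances worker and cell names are invented for illustration):

   import threading

   def list_instances(cell, results):
       # hypothetical per-cell worker; list.append is atomic in CPython
       results.append((cell, ["instance-a"]))

   results = []
   threads = [threading.Thread(target=list_instances, args=(cell, results))
              for cell in ["cell1", "cell2"]]
   for t in threads:
       t.start()
   for t in threads:
       # join with a timeout so a stuck cell can't hang us forever
       t.join(timeout=1)
   print(results)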

In sdk, there are a few things that need to be done in the background 
(when uploading a very large swift object, we split it into chunks and 
upload them concurrently). In that case we used 
concurrent.futures.ThreadPoolExecutor, which creates a pool of threads 
and allows you to submit jobs into it ... very similar to the 
using-pool-of-workers example above.

   import concurrent.futures

   executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
   job_future = executor.submit(some_callable, some_arg, other=arg)
   for completed in concurrent.futures.as_completed([job_future]):
     result = completed.result()

https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/object_store/v1/_proxy.py#L510-L526

https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/cloud/openstackcloud.py#L7332-L7366
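For the slow-or-down-cell concern, as_completed also takes a timeout
that bounds the total wait across all of the futures (on Python 2.7 this
needs the futures backport from PyPI). A rough sketch, with
list_instances and the cell names invented for illustration:

   import concurrent.futures

   def list_instances(cell):
       # hypothetical per-cell query
       return (cell, ["instance-a"])

   executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
   futures = [executor.submit(list_instances, cell)
              for cell in ["cell1", "cell2", "cell3"]]
   try:
       # raises TimeoutError if everything isn't done within 5 seconds
       for completed in concurrent.futures.as_completed(futures, timeout=5):
           print(completed.result())
   except concurrent.futures.TimeoutError:
       print("some cells did not respond in time")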


> 
> WSGI also provides an additional layer of parallelism, as each instance of the API should be serving only one
> request, and that parallelism is managed by uwsgi or Apache if using mod_wsgi.
> 
> I'm not sure if multiprocessing is warranted in this case, but if we did use it we should probably create the pool
> once and reuse it rather than creating it inline in the scatter-gather function.
> 
> Anyway, that is how I would personally approach this if it was identified as a performance issue, with
> a config argument to control the number of processes in the pool, but it would definitely be a Train thing
> as it's a non-trivial change in how this currently works.
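> 
> A rough sketch of the create-once pattern (the pool size standing in
> for that config argument, and list_instances again a placeholder):
> 
> from multiprocessing import Pool
> 
> def list_instances(cell):
>     # placeholder for the real per-cell query
>     return (cell, [])
> 
> # create the pool once, at module level, and reuse it across calls;
> # the size here would come from the config option
> _pool = Pool(processes=4)
> 
> def scatter_gather(cells):
>     # hypothetical helper: fan out one request per cell, gather results
>     return _pool.map(list_instances, cells)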
> 
> 
>>
>> -melanie
>>
> 
> 
> 


