[nova][dev] Fixing eventlet monkey patching in Nova

Ben Nemec openstack at nemebean.com
Fri Mar 15 18:41:55 UTC 2019

*shakes fist at eventlet*

*writes more useful things inline*

On 3/15/19 12:47 PM, Matthew Booth wrote:
> We're hitting an issue with eventlet monkey patching in OSP15, which
> uses python 3.6. The issue manifests as infinite recursion when a nova
> service connects to another openstack service over SSL. The immediate
> cause is that we import urllib3 before calling
> eventlet.monkey_patch(). I can't find the precise details now, but the
> reason for this relates to initialisation done by either the ssl or
> urllib3 libraries at import time which is therefore not patched by
> eventlet.
> The detail isn't that important, in any case. The issue may just be
> specific to a particular combination of library versions we're using
> in OSP, or possibly that our default configuration exercises something
> we're not doing in the gate. Regardless, unless we change where we
> monkey patch we're going to hit this again in the future with other
> library quirks. Our own eventlet best practises say: "When using
> eventlet.monkey_patch, do it first or not at all."[1] We're not doing
> that for both types of service:
> * non-wsgi
> For non-wsgi services we monkey patch in nova/cmd/__init__.py with the
> call utils.monkey_patch(). At first glance that appears to be very
> early, but note that this required importing nova.utils first, which
> in turn imports the world, and all before monkey patching.
> I believe this was regressed in Stein by change
> Ie7bf5d012e2ccbcd63c262ddaf739782afcdaf56[2], which was attempting to
> fix the wsgi case.
> * wsgi
> For wsgi services our entry point is nova.api.openstack.wsgi_app,
> which calls utils.monkey_patch(). This suffers the same problem as the
> non-wsgi case, but additionally suffers from having loaded
> nova/api/openstack/__init__.py first, which also imports the world.
> Incidentally, as noted in change
> Ie7bf5d012e2ccbcd63c262ddaf739782afcdaf56 we *do* currently require
> eventlet in at least the nova api service, even when running as a wsgi
> service. This has not regressed the wsgi case, though, as that would
> already have been broken due to the libraries imported by
> nova/api/openstack/__init__.py.
> Lee Yarwood originally posted (and I took over from him)
> https://review.openstack.org/#/c/626952/ with the intention of fixing
> the wsgi case. We semi-abandoned it because, while the fix seemed to
> make sense, it didn't fix the wsgi case when tested. I now realise
> that this is due to the libraries loaded by
> nova/api/openstack/__init__.py. I have resubmitted this patch as it
> fixes the non-wsgi case, and also updated the commit message to be
> accurate according to my current understanding.
> However, fixing the non-wsgi case is more complicated. We can't do the
> monkey patching in nova.api.openstack.wsgi_app because
> nova/api/openstack/__init__.py imports the world.
> I believe we have 3 options:
> 1. Don't import the world in nova/api/openstack/__init__.py
> 2. Change the wsgi entry point to, for eg, nova.wsgi_app
> 3. Do monkey patching in nova/__init__.py
> Of the above, I haven't investigated either 1 or 2 in any great
> detail. I suspect (1) would be quite invasive, and (2) would
> presumably have a deployment impact. I believe that the reason we
> haven't previously done (3) has been the desire not to monkey patch in
> certain circumstances. However, as it stands today we always need
> eventlet, so (3) seems like an expedient route. The code change
> required is very small.

According to [0], the monkey-patching wasn't done in nova/__init__.py 
because it was supposed to be able to run without eventlet under WSGI. 
If that's not the case, then it's probably fine to do it there.


"Monkey patching should also be done in a way that allows services to 
run without it, such as when an API service runs under Apache. This is 
the reason for Nova not simply monkey patching in nova/__init__.py."

> In testing the above I threw up 2 follow on patches:
> * https://review.openstack.org/643579
> This makes a couple of cleanups, but significantly adds an assertion
> that problem libraries haven't been imported before monkey patching.
> As expected, this kills nova-api (wsgi):
> http://logs.openstack.org/79/643579/1/check/tempest-full-py3/94b0156/controller/logs/screen-n-api.txt.gz
> but not nova-compute (non-wsgi):
> http://logs.openstack.org/79/643579/1/check/tempest-full-py3/94b0156/controller/logs/screen-n-cpu.txt.gz
> Interestingly, I also bumped the minimum eventlet version in that
> patch to remove some no-longer-required cruft from oslo_service. If
> I'm reading this failure correctly, we're pinning eventlet to version
> 0.18.2 in the gate. Why such an old eventlet? I can't see where this
> pin is coming from, but my (unsubstantiated) guess is that this is the
> principal difference from OSP, where we're currently installing
> eventlet 0.24.1:
> http://logs.openstack.org/79/643579/1/check/requirements-check/fc9f093/job-output.txt.gz

I left a couple of comments on the review about the requirements issues.

> * https://review.openstack.org/#/c/643581/
> Here I just switched to top-level monkey patching. This is still
> running, but at first glance the tempest runs don't seem to have
> failed early, which they would have given that the patch is stacked on
> top of the the assertions from the prior patch.
> TL;DR I want to move monkey patching to nova/__init__.py because, for
> now at least, we always require it anyway. When we eventually remove
> the requirement for wsgi services to monkey patch we can move monkey
> patching back to nova/cmd/__init__.py.

This sounds fine to me, but it's been a loooong time since I looked at 
this and I'm not familiar with some of the more recent changes you 
talked about, so take my opinion with a grain of salt.

> Matt
> [1] https://specs.openstack.org/openstack/openstack-specs/specs/eventlet-best-practices.html
> [2] https://review.openstack.org/#/c/592285/

More information about the openstack-discuss mailing list