We're hitting an issue with eventlet monkey patching in OSP15, which uses python 3.6. The issue manifests as infinite recursion when a nova service connects to another openstack service over SSL. The immediate cause is that we import urllib3 before calling eventlet.monkey_patch(). I can't find the precise details now, but the reason for this relates to initialisation done by either the ssl or urllib3 libraries at import time which is therefore not patched by eventlet. The detail isn't that important, in any case. The issue may just be specific to a particular combination of library versions we're using in OSP, or possibly that our default configuration exercises something we're not doing in the gate. Regardless, unless we change where we monkey patch we're going to hit this again in the future with other library quirks. Our own eventlet best practises say: "When using eventlet.monkey_patch, do it first or not at all."[1] We're not doing that for both types of service: * non-wsgi For non-wsgi services we monkey patch in nova/cmd/__init__.py with the call utils.monkey_patch(). At first glance that appears to be very early, but note that this required importing nova.utils first, which in turn imports the world, and all before monkey patching. I believe this was regressed in Stein by change Ie7bf5d012e2ccbcd63c262ddaf739782afcdaf56[2], which was attempting to fix the wsgi case. * wsgi For wsgi services our entry point is nova.api.openstack.wsgi_app, which calls utils.monkey_patch(). This suffers the same problem as the non-wsgi case, but additionally suffers from having loaded nova/api/openstack/__init__.py first, which also imports the world. Incidentally, as noted in change Ie7bf5d012e2ccbcd63c262ddaf739782afcdaf56 we *do* currently require eventlet in at least the nova api service, even when running as a wsgi service. This has not regressed the wsgi case, though, as that would already have been broken due to the libraries imported by nova/api/openstack/__init__.py. Lee Yarwood originally posted (and I took over from him) https://review.openstack.org/#/c/626952/ with the intention of fixing the wsgi case. We semi-abandoned it because, while the fix seemed to make sense, it didn't fix the wsgi case when tested. I now realise that this is due to the libraries loaded by nova/api/openstack/__init__.py. I have resubmitted this patch as it fixes the non-wsgi case, and also updated the commit message to be accurate according to my current understanding. However, fixing the non-wsgi case is more complicated. We can't do the monkey patching in nova.api.openstack.wsgi_app because nova/api/openstack/__init__.py imports the world. I believe we have 3 options: 1. Don't import the world in nova/api/openstack/__init__.py 2. Change the wsgi entry point to, for eg, nova.wsgi_app 3. Do monkey patching in nova/__init__.py Of the above, I haven't investigated either 1 or 2 in any great detail. I suspect (1) would be quite invasive, and (2) would presumably have a deployment impact. I believe that the reason we haven't previously done (3) has been the desire not to monkey patch in certain circumstances. However, as it stands today we always need eventlet, so (3) seems like an expedient route. The code change required is very small. In testing the above I threw up 2 follow on patches: * https://review.openstack.org/643579 This makes a couple of cleanups, but significantly adds an assertion that problem libraries haven't been imported before monkey patching. As expected, this kills nova-api (wsgi): http://logs.openstack.org/79/643579/1/check/tempest-full-py3/94b0156/control... but not nova-compute (non-wsgi): http://logs.openstack.org/79/643579/1/check/tempest-full-py3/94b0156/control... Interestingly, I also bumped the minimum eventlet version in that patch to remove some no-longer-required cruft from oslo_service. If I'm reading this failure correctly, we're pinning eventlet to version 0.18.2 in the gate. Why such an old eventlet? I can't see where this pin is coming from, but my (unsubstantiated) guess is that this is the principal difference from OSP, where we're currently installing eventlet 0.24.1: http://logs.openstack.org/79/643579/1/check/requirements-check/fc9f093/job-o... * https://review.openstack.org/#/c/643581/ Here I just switched to top-level monkey patching. This is still running, but at first glance the tempest runs don't seem to have failed early, which they would have given that the patch is stacked on top of the the assertions from the prior patch. TL;DR I want to move monkey patching to nova/__init__.py because, for now at least, we always require it anyway. When we eventually remove the requirement for wsgi services to monkey patch we can move monkey patching back to nova/cmd/__init__.py. Matt [1] https://specs.openstack.org/openstack/openstack-specs/specs/eventlet-best-pr... [2] https://review.openstack.org/#/c/592285/ -- Matthew Booth Red Hat OpenStack Engineer, Compute DFG Phone: +442070094448 (UK)