[oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI
melanie witt
melwittt at gmail.com
Wed May 8 08:15:34 UTC 2019
On Tue, 7 May 2019 15:22:36 -0700, Iain Macdonnell
<iain.macdonnell at oracle.com> wrote:
>
>
> On 5/7/19 2:45 PM, Ben Nemec wrote:
>>
>>
>> On 5/4/19 4:14 PM, Damien Ciabrini wrote:
>>>
>>>
>>> On Fri, May 3, 2019 at 7:59 PM Michele Baldessari
>>> <michele at acksyn.org> wrote:
>>>
>>> On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote:
>>> >
>>> >
>>> > On 4/22/19 12:53 PM, Alex Schultz wrote:
>>> > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec
>>> > > <openstack at nemebean.com> wrote:
>>> > > >
>>> > > >
>>> > > >
>>> > > > On 4/20/19 1:38 AM, Michele Baldessari wrote:
>>> > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700,
>>> > > > > iain.macdonnell at oracle.com wrote:
>>> > > > > >
>>> > > > > > Today I discovered that this problem appears to be caused
>>> by eventlet
>>> > > > > > monkey-patching. I've created a bug for it:
>>> > > > > >
>>> > > > > >
https://bugs.launchpad.net/nova/+bug/1825584
>>>
>>> > > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > Just for completeness, we see this very same issue with
>>> > > > > mistral as well (actually it was the first service where we
>>> > > > > noticed the missed heartbeats). IIRC Alex Schultz mentioned
>>> > > > > seeing it in ironic too, although I have not personally
>>> > > > > observed it there yet.
>>> > > >
>>> > > > Is Mistral also mixing eventlet monkeypatching and WSGI?
>>> > > >
>>> > >
>>> > > Looks like there is monkey patching; however, we noticed it
>>> > > with the engine/executor, so it's likely not just wsgi. I think
>>> > > I also saw it in the ironic-conductor, though I'd have to try
>>> > > it out again. I'll spin up an undercloud today and see if I can
>>> > > get a more complete list of affected services. It was pretty
>>> > > easy to reproduce.
>>> >
>>> > Okay, I asked because if there's no WSGI/Eventlet combination
>>> > then this may be different from the Nova issue that prompted this
>>> > thread. It sounds like that was being caused by a bad interaction
>>> > between WSGI and some Eventlet timers. If there's no WSGI
>>> > involved then I wouldn't expect that to happen.
>>> >
>>> > I guess we'll see what further investigation turns up, but based
>>> > on the preliminary information there may be two bugs here.
>>>
>>> So, just to get some closure on this error that we have seen around
>>> the mistral executor and tripleo with python3: it was due to the
>>> ansible action calling subprocess, which has a different
>>> implementation in python3, so the monkey patching needed to be
>>> adapted.
>>>
>>> The review which fixes it for us is here:
>>>
>>> https://review.opendev.org/#/c/656901/
>>>
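For illustration only (this is not the actual fix in that review), eventlet
also ships a "green" subprocess module that can be used explicitly instead
of relying on monkey patching, which sidesteps the python2/python3
implementation difference. A minimal sketch:

    # Use eventlet's cooperative subprocess implementation directly,
    # so behaviour doesn't depend on what was monkey patched.
    from eventlet.green import subprocess

    proc = subprocess.Popen(['echo', 'hello'], stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # yields to other greenthreads while waiting
    print(out)
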
>>>
>>> Damien and I think the nova_api/eventlet/mod_wsgi issue has a
>>> separate root cause (although we have not spent all that much time
>>> on that one yet).
>>>
>>>
>>> Right, after further investigation, it appears that the problem we saw
>>> under mod_wsgi was due to monkey patching, as Iain originally
>>> reported. It has nothing to do with our work on healthchecks.
>>>
>>> It turns out that running the AMQP heartbeat thread under mod_wsgi
>>> doesn't work when the threading library is monkey_patched, because the
>>> thread waits on a data structure [1] that has been monkey patched [2],
>>> which makes it yield its execution instead of sleeping for 15s.
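
To make that concrete, here's a small standalone sketch (my own
illustration, not oslo.messaging code) of how a plain threading wait
behaves once eventlet has monkey patched the threading library:

    import eventlet
    eventlet.monkey_patch()  # threading/time now use green implementations

    import threading
    import time

    def heartbeat():
        event = threading.Event()  # actually an eventlet-backed event now
        while True:
            start = time.monotonic()
            # Under monkey patching this doesn't block in the OS; it
            # yields to the eventlet hub and only wakes on time if the
            # hub gets a chance to run.
            event.wait(15)
            print("heartbeat after %.1fs" % (time.monotonic() - start))

    threading.Thread(target=heartbeat, daemon=True).start()  # a greenthread in disguise
    time.sleep(60)  # run standalone, the hub is driven and the waits complete

Run standalone, the waits complete on schedule; the point is that they
depend entirely on the hub running, which mod_wsgi does not guarantee.
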
>>>
>>> Because mod_wsgi stops the execution of its embedded interpreter, the
>>> AMQP heartbeat thread can't be resumed until there's a message to be
>>> processed in the mod_wsgi queue, which would resume the python
>>> interpreter and make eventlet resume the thread.
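
In other words, the heartbeat only looks like a native thread. Whether a
service is in this situation can be checked at runtime; an illustrative
snippet (not something oslo.messaging does today):

    import eventlet.patcher

    # If 'thread' was monkey patched, "threads" are greenthreads and
    # only make progress while the eventlet hub runs -- something an
    # idle mod_wsgi interpreter doesn't guarantee between requests.
    if eventlet.patcher.is_monkey_patched('thread'):
        print("threading is green: heartbeats depend on the hub running")
    else:
        print("threading is native: heartbeats run on their own")
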
>>>
>>> Disabling monkey-patching in nova_api makes the scheduling issue go
>>> away.
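
A rough sketch of what that could look like in a WSGI entry point (the
environment variable and guard here are illustrative assumptions, not
nova's actual code):

    import os

    # Hypothetical opt-out: deployments running under mod_wsgi/uWSGI
    # set this to skip eventlet patching, while standalone services
    # keep the eventlet behaviour they depend on.
    _disabled = os.environ.get('DISABLE_EVENTLET_PATCHING', '').lower()
    if _disabled not in ('1', 'true', 'yes'):
        import eventlet
        eventlet.monkey_patch()

    # ... then build and expose the WSGI application as usual.
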
>>
>> This sounds like the right long-term solution, but it seems unlikely to
>> be backportable to the existing releases. As I understand it some
>> nova-api functionality has an actual dependency on monkey-patching. Is
>> there a workaround? Maybe periodically poking the API to wake up the
>> wsgi interpreter?
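
Something along these lines might serve as a stopgap (an untested sketch;
the port and interval are assumptions, and as the next reply notes, one
poke isn't guaranteed to reach every worker process):

    import time
    import urllib.request

    # Stopgap: periodically hit the API so mod_wsgi resumes its embedded
    # interpreter, giving eventlet a chance to run the heartbeat thread.
    while True:
        try:
            urllib.request.urlopen('http://127.0.0.1:8774/', timeout=5)
        except Exception:
            pass  # any response (or error) is fine; the request is the point
        time.sleep(30)  # comfortably under the AMQP heartbeat timeout
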
>
> I've been pondering things like that ... but if I have multiple WSGI
> processes, can I be sure that an API-poke will hit the one(s) that need it?
>
> This is a roadblock for me in upgrading to Stein. I really don't want
> to have to go back to running nova-api standalone, but that's
> increasingly looking like the only "safe" option :/
FWIW, I have a patch series that aims to eliminate the eventlet
dependency in nova-api again:

https://review.opendev.org/657750 (top patch)

in case you're able to give it a try. If it helps, then maybe we could
backport it to Stein, if folks are in support.

-melanie
>
>
>>> Note: other services like heat-api do not use monkey patching and
>>> aren't affected, so this seems to confirm that monkey patching
>>> shouldn't happen in nova_api running under mod_wsgi in the first
>>> place.
>>>
>>> [1]
>>> https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L904
>>>
>>> [2]
>>> https://github.com/openstack/oslo.utils/blob/master/oslo_utils/eventletutils.py#L182
>>>
>>
>