[oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI
Ben Nemec
openstack at nemebean.com
Tue May 7 21:45:38 UTC 2019
On 5/4/19 4:14 PM, Damien Ciabrini wrote:
>
>
> On Fri, May 3, 2019 at 7:59 PM Michele Baldessari <michele at acksyn.org> wrote:
>
> On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote:
> >
> >
> > On 4/22/19 12:53 PM, Alex Schultz wrote:
> > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec <openstack at nemebean.com> wrote:
> > > >
> > > >
> > > >
> > > > On 4/20/19 1:38 AM, Michele Baldessari wrote:
> > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, iain.macdonnell at oracle.com wrote:
> > > > > >
> > > > > > > Today I discovered that this problem appears to be caused by
> > > > > > > eventlet monkey-patching. I've created a bug for it:
> > > > > >
> > > > > > https://bugs.launchpad.net/nova/+bug/1825584
> > > > >
> > > > > Hi,
> > > > >
> > > > > > just for completeness we see this very same issue also with
> > > > > > mistral (actually it was the first service where we noticed the
> > > > > > missed heartbeats). iirc Alex Schultz mentioned seeing it in ironic
> > > > > > as well, although I have not personally observed it there yet.
> > > >
> > > > Is Mistral also mixing eventlet monkeypatching and WSGI?
> > > >
> > >
> > > > Looks like there is monkey patching, however we noticed it with the
> > > > engine/executor. So it's likely not just wsgi. I think I also saw it
> > > > in the ironic-conductor, though I'd have to try it out again. I'll
> > > > spin up an undercloud today and see if I can get a more complete list
> > > > of affected services. It was pretty easy to reproduce.
> >
> > Okay, I asked because if there's no WSGI/Eventlet combination then this
> > may be different from the Nova issue that prompted this thread. It sounds
> > like that was being caused by a bad interaction between WSGI and some
> > Eventlet timers. If there's no WSGI involved then I wouldn't expect that
> > to happen.
> >
> > I guess we'll see what further investigation turns up, but based on the
> > preliminary information there may be two bugs here.
>
> So just to get some closure on this error that we have seen around
> mistral executor and tripleo with python3: this was due to the ansible
> action that called subprocess which has a different implementation in
> python3 and so the monkeypatching needs to be adapted.
>
> Review which fixes it for us is here:
> https://review.opendev.org/#/c/656901/
>
> Damien and I think the nova_api/eventlet/mod_wsgi has a separate root-cause
> (although we have not spent all too much time on that one yet)
>
>
> Right, after further investigation, it appears that the problem we saw
> under mod_wsgi was due to monkey patching, as Iain originally
> reported. It has nothing to do with our work on healthchecks.
>
> It turns out that running the AMQP heartbeat thread under mod_wsgi
> doesn't work when the threading library is monkey_patched, because the
> thread waits on a data structure [1] that has been monkey patched [2],
> which makes it yield its execution instead of sleeping for 15s.
>
> Because mod_wsgi stops the execution of its embedded interpreter, the
> AMQP heartbeat thread can't be resumed until there's a message to be
> processed in the mod_wsgi queue, which would resume the python
> interpreter and make eventlet resume the thread.
>
> Disabling monkey-patching in nova_api makes the scheduling issue go
> away.
This sounds like the right long-term solution, but it seems unlikely to
be backportable to the existing releases. As I understand it, some
nova-api functionality actually depends on monkey-patching. Is there a
workaround? Maybe periodically poking the API to wake up the wsgi
interpreter?
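
A rough sketch of what such periodic poking could look like, run as a
separate process outside the monkey-patched API. This is untested against
a real deployment: the function name, the URL and the interval are
assumptions for illustration, and a poke presumably only wakes whichever
mod_wsgi process happens to serve it, so it may not cover every worker.

```python
import threading
import urllib.request


def start_api_poker(url, interval=10.0, fetch=None):
    """Request `url` every `interval` seconds from a daemon thread.

    Hypothetical helper, not part of nova or oslo.messaging.
    Returns a threading.Event; set it to stop the poker.
    """
    if fetch is None:
        # Default: plain HTTP GET, response body discarded.
        def fetch(u):
            urllib.request.urlopen(u, timeout=5).close()

    stop = threading.Event()

    def _loop():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            try:
                fetch(url)
            except OSError:
                # Covers URLError/HTTPError and socket errors; the API may
                # be restarting, so just try again on the next round.
                pass

    threading.Thread(target=_loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    # Assumed endpoint; adjust to wherever nova-api actually listens.
    stopper = start_api_poker("http://localhost:8774/", interval=10.0)
    try:
        threading.Event().wait()  # block until interrupted
    except KeyboardInterrupt:
        stopper.set()
```

The interval would need to stay below whatever heartbeat timeout is
configured for this to have any chance of helping.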
>
> Note: other services like heat-api do not use monkey patching and
> aren't affected, so this seems to confirm that monkey-patching
> shouldn't happen in nova_api running under mod_wsgi in the first
> place.
>
> [1]
> https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L904
> [2]
> https://github.com/openstack/oslo.utils/blob/master/oslo_utils/eventletutils.py#L182
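
For anyone following along, the pattern at issue is roughly the loop
below. This is a minimal sketch with plain stdlib threading and
hypothetical names, not the actual oslo.messaging code. With unpatched
threading the timed wait blocks in the OS and wakes up on its own; per
the explanation above, once the primitive is monkey patched the same
wait becomes a cooperative yield, which only resumes when the
interpreter gets to run again.

```python
import threading
import time


def run_heartbeat(send, interval, stop):
    """Call send() every `interval` seconds until `stop` is set."""
    # wait() returns False when the timeout expires, True when stop is set.
    while not stop.wait(timeout=interval):
        send()


beats = []
stop = threading.Event()
t = threading.Thread(
    target=run_heartbeat,
    args=(lambda: beats.append(time.monotonic()), 0.05, stop),
    daemon=True,
)
t.start()
time.sleep(0.3)   # let a few heartbeats fire
stop.set()        # ask the loop to exit
t.join()
print("heartbeats sent:", len(beats))
```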