<div dir="ltr">Oops - didn't reply all <br><div><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Ken Giusti</b> <span dir="ltr"><<a href="mailto:kgiusti@gmail.com">kgiusti@gmail.com</a>></span><br>Date: Tue, Aug 1, 2017 at 12:51 PM<br>Subject: Re: [openstack-dev] [oslo][barbican][sahara] start RPC service before launcher wait?<br>To: Adam Spiers <<a href="mailto:aspiers@suse.com">aspiers@suse.com</a>><br><br><br><div dir="ltr"><div><div><div><div><div>Hi Adam,<br><br></div>I think there are a couple of problems here.<br><br></div>Regardless of worker count, the service.wait() is called before service.start().  And from looking at the oslo.service code, the 'wait()' method is called after start(), then again after stop().  This doesn't match up with the intended use of oslo.messaging.server.wait(), which should only be called after .stop().<br><br></div>Perhaps a bigger issue is that in the multi-threaded case all threads appear to be calling start, wait, and stop on the same instance of the service (oslo.messaging rpc server).  
At least that's what I'm seeing in my muchly reduced test code:<br><br><a href="https://paste.fedoraproject.org/paste/-73zskccaQvpSVwRJD11cA" target="_blank">https://paste.fedoraproject.<wbr>org/paste/-<wbr>73zskccaQvpSVwRJD11cA</a><br><br></div>The log trace shows multiple calls to start, wait, stop via different threads to the same TaskServer instance:<br><br><a href="https://paste.fedoraproject.org/paste/dyPq~lr26sQZtMzHn5w~Vg" target="_blank">https://paste.fedoraproject.<wbr>org/paste/dyPq~lr26sQZtMzHn5w~<wbr>Vg</a><br><br></div>Is that expected?<br></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">On Mon, Jul 31, 2017 at 9:32 PM, Adam Spiers <span dir="ltr"><<a href="mailto:aspiers@suse.com" target="_blank">aspiers@suse.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ken Giusti <<a href="mailto:kgiusti@gmail.com" target="_blank">kgiusti@gmail.com</a>> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>

On Mon, Jul 31, 2017 at 10:01 AM, Adam Spiers <<a href="mailto:aspiers@suse.com" target="_blank">aspiers@suse.com</a>> wrote:<br>

</span><div><div class="m_3558050355365189517h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I recently discovered a bug where barbican-worker would hang on<br>

shutdown if queue.asynchronous_workers was changed from 1 to 2:<br>

<br>

   <a href="https://bugs.launchpad.net/barbican/+bug/1705543" rel="noreferrer" target="_blank">https://bugs.launchpad.net/ba<wbr>rbican/+bug/1705543</a><br>

<br>

resulting in a warning like this:<br>

<br>

   WARNING oslo_messaging.server [-] Possible hang: stop is waiting for<br>

start to complete<br>

<br>

I found a similar bug in Sahara:<br>

<br>

   <a href="https://bugs.launchpad.net/sahara/+bug/1546119" rel="noreferrer" target="_blank">https://bugs.launchpad.net/sa<wbr>hara/+bug/1546119</a><br>

<br>

where the fix was to call start() on the RPC service before making the<br>

launcher wait() on it, so I ported the fix to Barbican, and it seems<br>

to work fine:<br>

<br>

   <a href="https://review.openstack.org/#/c/485755" rel="noreferrer" target="_blank">https://review.openstack.org/<wbr>#/c/485755</a><br>

<br>

I noticed that both projects use ProcessLauncher; barbican uses<br>

oslo_service.service.launch() which has:<br>

<br>

   if workers is None or workers == 1:<br>

       launcher = ServiceLauncher(conf, restart_method=restart_method)<br>

   else:<br>

       launcher = ProcessLauncher(conf, restart_method=restart_method)<br>

<br>

However, I'm not an expert in oslo.service or oslo.messaging, and one<br>

of Barbican's core reviewers (thanks Kaitlin!) noted that not many<br>

other projects start the task before calling wait() on the launcher,<br>

so I thought I'd check here whether that is the correct fix, or<br>

whether there's something else odd going on.<br>

<br>

Any oslo gurus able to shed light on this?<br>

</blockquote>

<br></div></div><span>

As far as an oslo.messaging server is concerned, the order of operations is:<br>

<br>

server.start()<br>

# do stuff until ready to stop the server...<br>

server.stop()<br>

server.wait()<br>

<br>

The final wait blocks until all requests that are in progress when stop()<br>

is called finish and clean up.<br>
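<br>
To illustrate the ordering above, here is a minimal sketch using a toy stand-in class built only on the Python standard library. ToyServer, submit() and started_request are hypothetical names made up for illustration and are not part of oslo.messaging; only the start(), stop(), wait() call order mirrors what is described above.<br>
<pre><code class="language-python">import queue
import threading
import time


class ToyServer:
    """Hypothetical stand-in for a message server; NOT the oslo.messaging API."""

    def __init__(self):
        self._requests = queue.Queue()
        self._stopping = threading.Event()
        self._worker = None
        # Set once the worker has begun handling a request (illustration only).
        self.started_request = threading.Event()
        self.handled = []

    def start(self):
        # Begin consuming requests on a background thread.
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def submit(self, item):
        self._requests.put(item)

    def stop(self):
        # Ask the worker to stop taking new requests; returns immediately.
        self._stopping.set()

    def wait(self):
        # Only meaningful after stop(): block until in-progress work is done.
        if not self._stopping.is_set():
            raise RuntimeError("wait() called before stop()")
        self._worker.join()

    def _run(self):
        while not self._stopping.is_set():
            try:
                item = self._requests.get(timeout=0.05)
            except queue.Empty:
                continue
            self.started_request.set()
            time.sleep(0.1)  # simulate a request that takes a while
            self.handled.append(item)


server = ToyServer()
server.start()
server.submit("request-1")
server.started_request.wait()  # the request is now in progress
server.stop()                  # returns while the request is still running
server.wait()                  # blocks until that request has finished
print(server.handled)
</code></pre>
In this sketch, calling wait() before stop() raises an error rather than blocking, which is one way to make a wrong call order visible early. That is a design choice of the toy class, not a statement about how the real server behaves.<br>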

</span></blockquote>

<br>

Thanks - that makes sense.  So the question is, why would<br>

barbican-worker only hang on shutdown when there are multiple workers?<br>

Maybe the real bug is somewhere in oslo_service.service.ProcessLauncher<br>

and it's not calling start() correctly?<br>

</blockquote></div><br><br clear="all"><br></div></div><span class="HOEnZb"><font color="#888888">-- <br><div class="m_3558050355365189517gmail_signature" data-smartmail="gmail_signature">Ken Giusti  (<a href="mailto:kgiusti@gmail.com" target="_blank">kgiusti@gmail.com</a>)</div>

</font></span></div>

</div>

</div></div>