[openstack-dev] [oslo][barbican][sahara] start RPC service before launcher wait?

Ken Giusti kgiusti at gmail.com
Wed Aug 2 14:02:09 UTC 2017


Oops - didn't reply all
---------- Forwarded message ----------
From: Ken Giusti <kgiusti at gmail.com>
Date: Tue, Aug 1, 2017 at 12:51 PM
Subject: Re: [openstack-dev] [oslo][barbican][sahara] start RPC service
before launcher wait?
To: Adam Spiers <aspiers at suse.com>


Hi Adam,

I think there are a couple of problems here.

Regardless of worker count, service.wait() is called before
service.start().  And from looking at the oslo.service code, the 'wait()'
method is called after start(), then again after stop().  This doesn't match
up with the intended use of oslo.messaging.server.wait(), which should only
be called after .stop().
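To make the expected ordering concrete, here is a minimal sketch. `RecordingServer` and `check_order()` are hypothetical helpers (not part of oslo.messaging or oslo.service); they only record calls and flag any wait() that is not preceded by stop(), assuming the start/stop/wait contract described in this thread.

```python
class RecordingServer:
    """Hypothetical stand-in for an oslo.messaging RPC server; it only records calls."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def wait(self):
        self.calls.append("wait")


def check_order(calls):
    """Return a list of problems found in a start/stop/wait call sequence.

    Assumed contract: start() once, then stop(), and wait() only after stop().
    """
    problems = []
    started = stopped = False
    for i, name in enumerate(calls):
        if name == "start":
            if started:
                problems.append(f"call {i}: start() called more than once")
            started = True
        elif name == "stop":
            if not started:
                problems.append(f"call {i}: stop() before start()")
            stopped = True
        elif name == "wait":
            if not stopped:
                problems.append(f"call {i}: wait() before stop()")
    return problems


# Intended oslo.messaging usage: nothing is flagged.
good = RecordingServer()
good.start()
good.stop()
good.wait()
print(check_order(good.calls))  # []

# The sequence described above: wait() after start(), then again after stop().
svc = RecordingServer()
svc.start()
svc.wait()
svc.stop()
svc.wait()
print(check_order(svc.calls))  # ['call 1: wait() before stop()']
```

Only the first wait() is flagged; the second one, after stop(), is the call oslo.messaging actually expects.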

Perhaps a bigger issue is that in the multi-threaded case all threads
appear to be calling start, wait, and stop on the same instance of the
service (oslo.messaging rpc server).  At least that's what I'm seeing in my
much-reduced test code:

https://paste.fedoraproject.org/paste/-73zskccaQvpSVwRJD11cA

The log trace shows multiple calls to start, wait, and stop from different
threads on the same TaskServer instance:

https://paste.fedoraproject.org/paste/dyPq~lr26sQZtMzHn5w~Vg

Is that expected?
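A rough sketch of the difference between sharing one server instance across workers and giving each worker its own. Plain threads stand in for workers here; that is an assumption for illustration and does not model how oslo.service actually spawns workers. `CountingServer` and `run_workers()` are hypothetical names, not oslo APIs.

```python
import threading


class CountingServer:
    """Hypothetical server stand-in that counts lifecycle calls (thread-safe)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = {"start": 0, "stop": 0, "wait": 0}

    def _record(self, name):
        with self._lock:
            self.counts[name] += 1

    def start(self):
        self._record("start")

    def stop(self):
        self._record("stop")

    def wait(self):
        self._record("wait")


def run_workers(get_server, workers):
    """Run one start/stop/wait lifecycle per worker thread.

    get_server() supplies the server each worker uses; returns those servers.
    """
    servers = []
    servers_lock = threading.Lock()

    def worker():
        server = get_server()
        with servers_lock:
            servers.append(server)
        server.start()
        server.stop()
        server.wait()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return servers


# Pattern seen in the trace: every worker drives the same instance.
shared = CountingServer()
run_workers(lambda: shared, workers=2)
print(shared.counts)  # {'start': 2, 'stop': 2, 'wait': 2}

# Alternative: each worker builds its own instance and sees exactly one lifecycle.
per_worker = run_workers(CountingServer, workers=2)
print([s.counts for s in per_worker])
```

With a shared instance, a server that assumes a single start() per stop()/wait() pair sees its lifecycle methods invoked once per worker, which is the pattern in the log trace.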

On Mon, Jul 31, 2017 at 9:32 PM, Adam Spiers <aspiers at suse.com> wrote:

> Ken Giusti <kgiusti at gmail.com> wrote:
>
>> On Mon, Jul 31, 2017 at 10:01 AM, Adam Spiers <aspiers at suse.com> wrote:
>>
>>> I recently discovered a bug where barbican-worker would hang on
>>> shutdown if queue.asynchronous_workers was changed from 1 to 2:
>>>
>>>    https://bugs.launchpad.net/barbican/+bug/1705543
>>>
>>> resulting in a warning like this:
>>>
>>>    WARNING oslo_messaging.server [-] Possible hang: stop is waiting for
>>> start to complete
>>>
>>> I found a similar bug in Sahara:
>>>
>>>    https://bugs.launchpad.net/sahara/+bug/1546119
>>>
>>> where the fix was to call start() on the RPC service before making the
>>> launcher wait() on it, so I ported the fix to Barbican, and it seems
>>> to work fine:
>>>
>>>    https://review.openstack.org/#/c/485755
>>>
>>> I noticed that both projects use ProcessLauncher; barbican uses
>>> oslo_service.service.launch() which has:
>>>
>>>    if workers is None or workers == 1:
>>>        launcher = ServiceLauncher(conf, restart_method=restart_method)
>>>    else:
>>>        launcher = ProcessLauncher(conf, restart_method=restart_method)
>>>
>>> However, I'm not an expert in oslo.service or oslo.messaging, and one
>>> of Barbican's core reviewers (thanks Kaitlin!) noted that not many
>>> other projects start the task before calling wait() on the launcher,
>>> so I thought I'd check here whether that is the correct fix, or
>>> whether there's something else odd going on.
>>>
>>> Any oslo gurus able to shed light on this?
>>>
>>
>> As far as an oslo.messaging server is concerned, the order of operations
>> is:
>>
>> server.start()
>> # do stuff until ready to stop the server...
>> server.stop()
>> server.wait()
>>
>> The final wait blocks until all requests that are in progress when stop()
>> is called finish and cleanup.
>>
>
> Thanks - that makes sense.  So the question is, why would
> barbican-worker only hang on shutdown when there are multiple workers?
> Maybe the real bug is somewhere in oslo_service.service.ProcessLauncher
> and it's not calling start() correctly?
>
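One way to picture the "Possible hang: stop is waiting for start to complete" warning quoted above: a server whose stop() waits for start() to have completed on the same instance. This is a hypothetical sketch that only mimics the symptom; it is not oslo.messaging's implementation, and the timeout exists only so the example returns instead of blocking forever.

```python
import logging
import threading

LOG = logging.getLogger("sketch")


class GuardedServer:
    """Hypothetical server whose stop() waits for start() to finish first."""

    def __init__(self, stop_timeout=0.1):
        self._started = threading.Event()
        self._stop_timeout = stop_timeout

    def start(self):
        # A real server would begin consuming from the message bus here.
        self._started.set()

    def stop(self):
        """Return True if stop can proceed, False if start() never completed."""
        if not self._started.wait(timeout=self._stop_timeout):
            LOG.warning("Possible hang: stop is waiting for start to complete")
            return False
        return True


server = GuardedServer()
print(server.stop())  # logs the warning, prints False
server.start()
print(server.stop())  # True
```

Under this assumption, calling start() on the service before the launcher's wait() (as in the Sahara and Barbican fixes) ensures stop() never finds an unstarted instance.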



-- 
Ken Giusti  (kgiusti at gmail.com)


