[openstack-dev] [Oslo][Neutron] Fork() safety and oslo.messaging

Ken Giusti kgiusti at gmail.com
Tue Nov 25 15:47:23 UTC 2014


Hi Mehdi

On Tue, Nov 25, 2014 at 5:38 AM, Mehdi Abaakouk <sileht at sileht.net> wrote:
>
> Hi,
>
> I think the main issue is the behavior of the API
> of oslo-incubator/openstack/common/service.py, specifically:
>
>  * ProcessLauncher.launch_service(MyService())
>
> And then MyService has this behavior:
>
> class MyService:
>    def __init__(self):
>        # CODE DONE BEFORE os.fork()
>
>    def start(self):
>        # CODE DONE AFTER os.fork()
>
> So if an application creates an FD inside MyService.__init__ or before ProcessLauncher.launch_service, it will be shared between
> processes and we get these kinds of issues...
>
> For the rabbitmq/qpid driver, the first connection is created when the rpc server is started or when the first rpc call/cast/... is done.
>
> So if the application doesn't do that inside MyService.__init__ or before ProcessLauncher.launch_service everything works as expected.
>
> But when the issue does arise, I think it is an application issue (rpc stuff done before the os.fork()).
>
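The contract Mehdi describes can be sketched with the stdlib alone (launch_service here is a hypothetical stand-in for ProcessLauncher.launch_service, not the oslo-incubator code): anything created in __init__ runs in the parent before the fork and is inherited by every worker, while start() runs in each worker after the fork.

```python
import os

class MyService:
    def __init__(self):
        # Runs once in the parent, BEFORE os.fork(): any fd or broker
        # connection created here is inherited by every worker process.
        self.conn = None

    def start(self):
        # Runs in each worker, AFTER os.fork(): every process creates
        # its own private connection, so nothing is shared.
        self.conn = "connection-for-pid-%d" % os.getpid()

def launch_service(service, workers=2):
    # Hypothetical stand-in for ProcessLauncher.launch_service().
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            service.start()
            os._exit(0 if service.conn is not None else 1)
        pids.append(pid)
    # Reap the workers and report whether each one started cleanly.
    return [os.WEXITSTATUS(os.waitpid(p, 0)[1]) for p in pids]
```

As long as nothing touches the broker before launch_service() forks, each worker's start() builds its own connection and no fd is shared.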

Mmmm... I don't think it's that clear (re: an application issue).  I
mean, yes - the application is doing the os.fork() at the 'wrong'
time, but where is this made clear in the oslo.messaging API
documentation?

I think this is the real issue here:  what is the "official" guidance
for using os.fork() and its interaction with oslo libraries?

In the case of oslo.messaging, I can't find any mention of os.fork()
in the API docs (I may have missed it - please correct me if so).
That would imply - at least to me - that there are _no_ restrictions
on using os.fork() together with oslo.messaging.

But in the case of qpid, that is definitely _not_ the case.

The legacy qpid driver - impl_qpid - imports a 3rd party library, the
qpid.messaging API.  That library uses threading.Thread internally;
we (consumers of the library) have no control over how that thread is
managed.  So for impl_qpid, os.fork()'ing after the driver is loaded
can't be guaranteed to work.  In fact, I'd say os.fork() and
impl_qpid will not work together - full stop.
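A quick way to see why a library-owned thread breaks fork(): POSIX fork() duplicates only the calling thread, so a background I/O thread simply does not exist in the child, even though the state it was mutating (locks, buffers, sockets) was copied mid-flight.  This sketch uses the stdlib directly, not qpid.messaging:

```python
import os
import threading

def thread_count_in_child():
    # Start a background thread, as a messaging driver would for I/O.
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    pid = os.fork()
    if pid == 0:
        # Child: fork() copied only the forking thread.  The I/O
        # thread is gone, but any locks or buffers it held were
        # duplicated mid-operation -- the root cause of most
        # fork-after-connect hangs and corruption.
        os._exit(threading.active_count())
    _, status = os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return os.WEXITSTATUS(status)
```

The child reports a single live thread no matter how many the library had running, so any protocol work that thread was responsible for silently stops in the child.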

> For the amqp1 driver case, I think this is the same thing; it seems to do lazy creation of the connection too.
>

We have more flexibility here, since the driver directly controls when
the thread is spawned.  But the very fact that the thread is used
places a restriction on how oslo.messaging and os.fork() can be used
together, which isn't made clear in the documentation for the library.

I'm not familiar with the rabbit driver, but I've seen some patches
for heartbeating in rabbit introduce threading, so there may be a
similar implication there as well.


> I will take a look at the neutron code to see if I can find any rpc
> usage before the os.fork().
>

I've done some tracing of neutron-server's behavior in this case - you
may want to take a look at

 https://bugs.launchpad.net/neutron/+bug/1330199/comments/8

>
> Personally, I don't like this API, because the behavior difference between
> '__init__' and 'start' is too implicit.
>

That's true, but I'd say the implicit behavior around os.fork() needs
to be made explicit at the library level as well.

thanks,

-K

> Cheers,
>
> ---
> Mehdi Abaakouk
> mail: sileht at sileht.net
> irc: sileht
>
>
> Le 2014-11-24 20:27, Ken Giusti a écrit :
>
>> Hi all,
>>
>> As far as oslo.messaging is concerned, should it be possible for the
>> main application to safely os.fork() when there is already an active
>> connection to a messaging broker?
>>
>> I ask because I'm hitting what appears to be fork-related issues with
>> the new AMQP 1.0 driver.  I think the same problems have been seen
>> with the older impl_qpid driver as well [0].
>>
>> Both drivers utilize a background threading.Thread that handles all
>> async socket I/O and protocol timers.
>>
>> In the particular case I'm trying to debug, rpc_workers is set to 4 in
>> neutron.conf.  As far as I can tell, this causes neutron.service to
>> os.fork() four workers, but does so after it has created a listener
>> (and therefore a connection to the broker).
>>
>> This results in multiple processes all select()'ing the same set of
>> network sockets, and stuff breaks :(
>>
>> Even without the background thread, wouldn't this use still result in
>> sockets being shared across the parent/child processes?   Seems
>> dangerous.
>>
>> Thoughts?
>>
>> [0] https://bugs.launchpad.net/oslo.messaging/+bug/1330199
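The symptom described above can be reproduced with nothing but the stdlib: after fork(), parent and child hold the same underlying socket, so a message intended for one can be consumed by the other.  Here socketpair() stands in for the pre-fork broker connection:

```python
import os
import socket

def child_can_steal_reply():
    # socketpair() stands in for a connection opened before fork().
    broker_side, client_side = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: it never opened this connection, but the inherited
        # fd lets it read a reply the parent is waiting for.
        data = client_side.recv(64)
        os._exit(0 if data == b"reply-for-parent" else 1)
    # Parent: the "broker" sends one reply; the child consumes it,
    # so the parent would block forever if it tried to recv() here.
    broker_side.sendall(b"reply-for-parent")
    _, status = os.waitpid(pid, 0)
    broker_side.close()
    client_side.close()
    return os.WEXITSTATUS(status) == 0
```

With four forked rpc_workers select()'ing the same fd, which process wins any given wakeup is a race - exactly the "stuff breaks" behavior in the bug report.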




-- 
Ken Giusti  (kgiusti at gmail.com)


