[openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

Ray Pekowski pekowski at gmail.com
Thu Jun 6 17:03:18 UTC 2013


I was thinking about the use of RPC consume_in_thread() in the single reply
queue changes I made.  If the thread it creates were to crash, every later
RPC call (not cast) would fail with a timeout.  Probably the worst part is
that the calls would still be sent and perhaps successfully executed, but
the responses would never be received, so the caller would time out.

There is a similar issue with the consume_in_thread() done in the RPC
dispatcher that receives RPC requests, but in that case the RPCs are not
executed at all.

I looked into consume_in_thread() and realized that the base thread it
creates does not in fact do the equivalent of a catch-all (except
Exception).  This means an unexpected exception could kill the consumer
thread.  I was wondering if we should add an "except Exception" to the
consumer's infinite loop that perhaps sleeps for a time and then carries
on.  Of course, we would have to make sure that the same loop still catches
the exception for a valid kill of the thread.  Currently that is
"except greenlet.GreenletExit".  On the other hand, if the thread died for
some unexpected reason, it might just die again for the same reason, which
is why I suggest adding a sleep to the loop: to prevent a 100% CPU
situation.
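
A minimal sketch of the loop described above, not the actual
consume_in_thread() code.  The names consume_forever, consume_once,
retry_delay and ConsumerExit are hypothetical; ConsumerExit stands in for
greenlet.GreenletExit so the example runs with only the standard library.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class ConsumerExit(Exception):
    """Hypothetical stand-in for greenlet.GreenletExit (a deliberate kill)."""


def consume_forever(consume_once, retry_delay=1.0, sleep=time.sleep):
    """Call consume_once() in a loop, surviving unexpected exceptions.

    Returns only when consume_once() raises ConsumerExit.
    """
    while True:
        try:
            consume_once()
        except ConsumerExit:
            # A valid kill of the thread: stop quietly.
            return
        except Exception:
            # Unexpected failure: log it, pause so that a repeating error
            # cannot spin the CPU, then go back to consuming.
            LOG.exception("consumer failed; retrying in %s seconds",
                          retry_delay)
            sleep(retry_delay)
```

Because the stand-in derives from Exception, the kill handler has to come
before the catch-all; otherwise a deliberate kill would be swallowed and
retried like any other error.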

At a minimum, we should check that the consumer thread is still running
before sending an RPC call.
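
Such a check might look roughly like the sketch below.  It uses a plain
threading.Thread for illustration, whereas the RPC code runs the consumer
in an eventlet green thread, so the real liveness test would differ.
ConsumerDeadError, call_with_liveness_check and send_call are hypothetical
names.

```python
import threading


class ConsumerDeadError(RuntimeError):
    """Hypothetical error raised when the reply consumer is not running."""


def call_with_liveness_check(consumer_thread, send_call):
    """Fail fast instead of sending a call whose reply cannot be received."""
    if not consumer_thread.is_alive():
        raise ConsumerDeadError("reply consumer thread is not running")
    return send_call()
```

The caller then gets an immediate, specific error rather than waiting out
the full RPC timeout for a reply that nobody is listening for.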

Maybe I'm reading the code wrong.  If not, this affects all RPC
implementations: kombu, qpid and zmq.  For zmq, I think the problem only
exists on the server side.

Any thoughts?

Ray