[openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
Mike Wilson
geekinutah at gmail.com
Thu Jun 6 17:56:00 UTC 2013
Hey Ray,
I can confirm that this problem affects at least the qpid implementation. I
think catching all exceptions and retrying with a backoff is a great idea.
I have noticed that we don't seem to do a blanket catch-all for either SQL
or AMQP connections. I'm assuming that was a deliberate decision rather
than a mistake, so I'm interested to hear feedback on this also.
Anyway, +1 from me on doing what you are suggesting.
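To make the idea concrete, here is a rough sketch of the loop as I understand
the proposal. It is not the actual consume_in_thread() code: consume_forever,
consume_once, StopConsumer and the delay parameters are all made-up names for
illustration, and StopConsumer only stands in for greenlet.GreenletExit so
the example runs without eventlet installed.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class StopConsumer(Exception):
    """Hypothetical stand-in for greenlet.GreenletExit (a deliberate kill)."""


def consume_forever(consume_once, sleep=time.sleep,
                    initial_delay=1.0, max_delay=30.0):
    """Call consume_once() in a loop, surviving unexpected exceptions.

    Returns the number of unexpected errors seen before a clean stop.
    """
    delay = initial_delay
    errors = 0
    while True:
        try:
            consume_once()
            delay = initial_delay  # healthy again, so reset the backoff
        except StopConsumer:
            return errors  # valid kill of the thread: exit quietly
        except Exception:
            errors += 1
            LOG.exception("Consumer failed; retrying in %.1f seconds", delay)
            sleep(delay)  # back off so a persistent failure cannot spin the CPU
            delay = min(delay * 2, max_delay)
```

The deliberate-stop handler comes before the catch-all so that a valid kill
is never swallowed and retried. The doubling delay with a cap is just one
possible policy; a fixed sleep would also avoid the 100% CPU case.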
-Mike Wilson
On Thu, Jun 6, 2013 at 11:03 AM, Ray Pekowski <pekowski at gmail.com> wrote:
> I was thinking about the use of RPC consume_in_thread() in the single
> reply queue changes I made and how the crashing of the created thread would
> result in RPC call timeout failures in all future attempts to issue RPC
> calls (not casts). Probably, the worst part is that the RPC calls would be
> sent and perhaps successfully executed, but the responses would not be
> received. The RPC would time out.
>
> There is a similar issue with the consume_in_thread() being done in the
> RPC dispatcher that receives RPC requests, but in this case the RPCs are
> not executed.
>
> I looked into consume_in_thread() and realized that the base thread it
> creates does not in fact do the equivalent of a catch-all (except
> Exception). This means an unexpected exception could kill the consumer
> thread. I was wondering if we should add an "except Exception" in the
> consumer's infinite loop that perhaps sleeps for a time and then carries
> on. Of course, we would have to make sure that the same loop still
> catches the exception for a valid kill of the thread. Currently that is
> "except greenlet.GreenletExit". On the other hand, if the thread died for
> some unexpected reason, it might just die again for the same reason, which
> is why I suggest adding a sleep to the loop: to prevent a 100% CPU
> situation.
>
> At minimum, a check that the consumer thread is still running should be
> made before sending an RPC call.
>
> Maybe I'm reading the code wrong. If not, this affects all RPC
> implementations: kombu, qpid and zmq. For zmq, I think the problem only
> exists on the server side.
>
> Any thoughts?
>
> Ray
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>