Open Stack

Thu Jun 6 17:56:00 UTC 2013

Hey Ray,

I can confirm that this problem affects at least the qpid implementation. I
think catching all exceptions and retrying with backoff is a great idea. I
have noticed that we don't seem to do a blanket catch-all for either SQL or
AMQP connections. I'm assuming that was a decision, not a mistake, so I'm
interested to hear feedback on this as well.
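
For illustration, here is a rough sketch of the kind of consumer loop being
discussed: catch everything, back off, keep going. It is not the actual RPC
code. ConsumerThread, consume_once, base_delay and max_delay are made-up
names, and it uses a plain threading.Thread with an Event as a stand-in for
the green thread, so the deliberate-stop path here plays the role that
"except greenlet.GreenletExit" plays in the real loop.

```python
import threading
import time


class ConsumerThread(object):
    """Hypothetical stand-in for the thread consume_in_thread() starts."""

    def __init__(self, consume_once, base_delay=0.01, max_delay=1.0):
        # consume_once: callable that handles one message. It is assumed
        # to block while waiting for work, as a real consumer would.
        self._consume_once = consume_once
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self.failures = 0

    def start(self):
        self._thread.start()

    def stop(self):
        # Deliberate shutdown: the analogue of a valid kill of the thread.
        self._stop_event.set()
        self._thread.join()

    def is_alive(self):
        return self._thread.is_alive()

    def _run(self):
        delay = self._base_delay
        while not self._stop_event.is_set():
            try:
                self._consume_once()
                delay = self._base_delay  # a success resets the backoff
            except Exception:
                # Unexpected failure: count it, wait, then keep consuming.
                self.failures += 1
                # Sleeps for `delay`, but wakes early if stop() is called.
                self._stop_event.wait(delay)
                delay = min(delay * 2, self._max_delay)
```

The growing delay is what keeps a repeating failure from turning into a
100% CPU loop, and something like is_alive() could back the "is the
consumer still running" check before an RPC call is sent.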

Anyway, +1 from me on doing what you are suggesting.

-Mike Wilson

On Thu, Jun 6, 2013 at 11:03 AM, Ray Pekowski <pekowski at gmail.com> wrote:

> I was thinking about the use of RPC consume_in_thread() in the single
> reply queue changes I made and how a crash of the created thread would
> cause every future RPC call (not cast) to fail with a timeout.  Probably
> the worst part is that the RPC calls would be sent and perhaps successfully
> executed, but the responses would never be received.  The RPC would time
> out.
>
> There is a similar issue with the consume_in_thread() being done in the
> RPC dispatcher that receives RPC requests, but in this case the RPCs are
> not executed.
>
> I looked into consume_in_thread() and realized that the base thread it
> creates does not in fact do the equivalent of a catch-all (except
> Exception).  This means some unexpected exception could result in the death
> of the consumer thread.  I was wondering if we should add an "except
> Exception" in the consumer's infinite loop that perhaps sleeps for a time
> and goes on.  Of course, we would have to make sure the same loop still
> catches the exception for a valid kill of the thread.  Currently that is
> "except greenlet.GreenletExit".  On the other hand, if the thread died for
> some unexpected reason, it might just die again for the same reason, which
> is why I suggest adding a sleep to the loop: to prevent a 100% CPU
> situation.
>
> At minimum, a check that the consumer thread is still running should be
> made before sending an RPC call.
>
> Maybe I'm reading the code wrong.  If not, this affects all RPC
> implementations: kombu, qpid and zmq.  For zmq, I think the problem only
> exists on the server side.
>
> Any thoughts?
>
> Ray

[openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
