[openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
Vishvananda Ishaya
vishvananda at gmail.com
Thu Jun 6 18:12:45 UTC 2013
On Jun 6, 2013, at 10:03 AM, Ray Pekowski <pekowski at gmail.com> wrote:
> I was thinking about the use of RPC consume_in_thread() in the single reply queue changes I made, and how a crash of the created thread would result in RPC call timeout failures in all future attempts to issue RPC calls (not casts). Probably the worst part is that the RPC calls would be sent and perhaps successfully executed, but the responses would never be received. The RPC would time out.
>
> There is a similar issue with the consume_in_thread() call made in the RPC dispatcher that receives RPC requests, but in that case the RPCs are simply not executed.
>
> I looked into consume_in_thread() and realized that the base thread it creates does not in fact have a catch-all (except Exception). This means some unexpected exception could result in the death of the consumer thread. I was wondering if we should add an "except Exception" to the consumer's infinite loop that perhaps sleeps for a time and carries on. Of course, we would have to make sure the same loop still catches the exception used for a valid kill of the thread, which is currently "except greenlet.GreenletExit". On the other hand, if the thread died for some unexpected reason, it might just die again for the same reason, but that is why I suggest adding a sleep to the loop: to prevent a 100% CPU situation.
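
A rough, purely illustrative sketch of the kind of guarded loop being described; connection.consume() and the surrounding structure are placeholders, not the actual openstack.common.rpc code:

    import logging
    import time

    import greenlet

    LOG = logging.getLogger(__name__)

    def _consumer_loop(connection):
        """Keep consuming even if an unexpected exception escapes."""
        while True:
            try:
                # Blocks while dispatching incoming messages/replies
                # (placeholder for whatever the driver's consume step is).
                connection.consume()
            except greenlet.GreenletExit:
                # Deliberate kill of the consumer thread: let it exit.
                raise
            except Exception:
                # Unexpected failure: log it and keep the consumer alive,
                # sleeping briefly so a repeating error can't spin the CPU.
                LOG.exception('Unexpected error in consumer thread; retrying')
                time.sleep(1)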
>
> At a minimum, some check that the consumer thread is still running should be made before sending an RPC call.
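
For illustration only, assuming the handle returned by eventlet.spawn() (whose underlying greenlet exposes a dead flag), such a check might look like the following; the names here are hypothetical:

    def _reply_consumer_alive(consumer_thread):
        """Return True if the greenthread consuming replies is still running."""
        return consumer_thread is not None and not consumer_thread.dead

    # e.g. before call()/multicall() puts a message on the wire:
    # if not _reply_consumer_alive(self._consumer_thread):
    #     raise RuntimeError('reply consumer is not running; the call would '
    #                        'time out waiting for its response')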
>
> Maybe I'm reading the code wrong. If not, this affects all RPC implementations: kombu, qpid and zmq. For zmq, I think the problem only exists on the server side.
>
> Any thoughts?
+1. Consumer greenthreads have died in the past, and while I haven't seen this issue recently, it is a particularly nasty one to debug because often the exception doesn't show up in the logs.
Vish