[openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

Ray Pekowski pekowski at gmail.com
Thu Jun 6 20:46:14 UTC 2013


On Thu, Jun 6, 2013 at 3:22 PM, Chris Behrens <cbehrens at codestud.com> wrote:

>
> ?  There's a try/except in reconnect() in impl_kombu around the piece that
> can raise… unless we want to think something like LOG.* calls could fail.
>

Sure, there is a try/except in reconnect(), but its catch-all re-raises the
exception unless it is a timeout.  Here is the code:

            except Exception as e:
                # NOTE(comstud): Unfortunately it's possible for amqplib
                # to return an error not covered by its transport
                # connection_errors in the case of a timeout waiting for
                # a protocol response.  (See paste link in LP888621)
                # So, we check all exceptions for 'timeout' in them
                # and try to reconnect in this case.
                if 'timeout' not in str(e):
                    raise

Preceding the above is a catch for a few other exception types:

            except (IOError, self.connection_errors) as e:
                pass

but anything else simply propagates out of reconnect() into unprotected
code, because ensure() calls reconnect() outside its try block:

    def ensure(self, error_callback, method, *args, **kwargs):
        while True:
            try:
                return method(*args, **kwargs)
            except (self.connection_errors, socket.timeout, IOError) as e:
                if error_callback:
                    error_callback(e)
            except Exception as e:
                # NOTE(comstud): Unfortunately it's possible for amqplib
                # to return an error not covered by its transport
                # connection_errors in the case of a timeout waiting for
                # a protocol response.  (See paste link in LP888621)
                # So, we check all exceptions for 'timeout' in them
                # and try to reconnect in this case.
                if 'timeout' not in str(e):
                    raise
                if error_callback:
                    error_callback(e)
            self.reconnect()

See the last line above.

One open question is whether reconnect() can raise anything other than
IOError, self.connection_errors, and timeouts.  Maybe, maybe not; that is
why I give this code some credit.  Still, it wouldn't hurt to plug this
hole in a way that would prevent future bugs, in other words, as close to
the base of the thread as possible.
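
One way to plug the hole at the base of the thread, as a sketch under
assumptions: protected_loop and start_protected_consumer are hypothetical
helpers, not the actual consume_in_thread() implementation, and the sketch
uses stdlib threading rather than the eventlet greenthread the RPC code
spawns.  The consume argument stands in for the blocking consume loop.

```python
import logging
import threading

LOG = logging.getLogger(__name__)


def protected_loop(consume, stop_event, retry_delay=1.0):
    """Run consume() until it returns or stop_event is set.

    Any Exception is logged and consume() is restarted after retry_delay
    seconds, so an unexpected error cannot silently kill the thread.
    """
    while not stop_event.is_set():
        try:
            consume()
            return  # normal exit: the consumer finished on its own
        except Exception:
            # Last line of defence at the base of the thread.
            LOG.exception('Consumer failed; restarting in %.2fs',
                          retry_delay)
            stop_event.wait(retry_delay)


def start_protected_consumer(consume, retry_delay=1.0):
    """Start protected_loop in a daemon thread; return (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(target=protected_loop,
                              args=(consume, stop_event, retry_delay),
                              daemon=True)
    thread.start()
    return thread, stop_event


attempts = []


def unreliable_consume():
    # Fails twice with an error nobody anticipated, then finishes.
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError('unexpected error from reconnect()')


thread, stop = start_protected_consumer(unreliable_consume, retry_delay=0.01)
thread.join(timeout=5)
print(len(attempts))  # 3
```

Catching Exception rather than BaseException is deliberate: exit signals
such as SystemExit and KeyboardInterrupt still end the thread, while
ordinary bugs are logged and retried instead of stopping consumption.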

Ray