[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover
Chris Friesen
chris.friesen at windriver.com
Mon Mar 24 17:40:33 UTC 2014
On 03/24/2014 11:31 AM, Chris Friesen wrote:
> It looks like we're raising
>
> RecoverableConnectionError: connection already closed
>
> down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but
> nothing handles it.
>
> It looks like the most likely place that should be handling it is
> nova.openstack.common.rpc.impl_kombu.Connection.ensure().
>
>
> In the current oslo.messaging code the ensure() routine explicitly
> handles connection errors (which RecoverableConnectionError is) and
> socket timeouts--the ensure() routine in Havana doesn't do this.
I misread the code: ensure() in Havana does in fact handle socket
timeouts, but it doesn't handle connection errors.
It looks like support for handling connection errors was added to
oslo.messaging only recently, in git commit 0400cbf. The commit
message talks about clustered rabbit nodes and mirrored queues, which
don't apply to our scenario, but I suspect it would fix the problem
that we're seeing as well.
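For illustration, here is a minimal sketch of the kind of handling described above: a retry loop that treats both connection errors and socket timeouts as reasons to reconnect. This is not the Havana or oslo.messaging code. The RecoverableConnectionError class is a local stand-in for the amqp exception of the same name, and the ensure() signature (method, reconnect, max_retries, delay) is hypothetical.

```python
import socket
import time


class RecoverableConnectionError(Exception):
    """Local stand-in for amqp's exception of the same name (assumption)."""


def ensure(method, reconnect, max_retries=3, delay=0.0):
    """Call method(); on a connection error or socket timeout, reconnect and retry.

    All parameter names are hypothetical and only serve this sketch.
    Re-raises the last error once max_retries reconnect attempts are used up.
    """
    attempt = 0
    while True:
        try:
            return method()
        except (RecoverableConnectionError, socket.timeout):
            attempt += 1
            if attempt > max_retries:
                # Give up and let the caller see the original error.
                raise
            time.sleep(delay)
            # Re-establish the connection before trying the call again.
            reconnect()
```

Catching only socket.timeout here would correspond to the behaviour described for Havana: the RecoverableConnectionError would propagate out of the loop with nothing to handle it.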
Chris