[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

Chris Friesen chris.friesen at windriver.com
Mon Mar 24 17:40:33 UTC 2014


On 03/24/2014 11:31 AM, Chris Friesen wrote:

> It looks like we're raising
>
> RecoverableConnectionError: connection already closed
>
> down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but
> nothing handles it.
>
> It looks like the most likely place that should be handling it is
> nova.openstack.common.rpc.impl_kombu.Connection.ensure().
>
>
> In the current oslo.messaging code the ensure() routine explicitly
> handles connection errors (which RecoverableConnectionError is) and
> socket timeouts--the ensure() routine in Havana doesn't do this.

I misread the code. ensure() in Havana does in fact handle socket
timeouts, but it doesn't handle connection errors.

It looks like support for handling connection errors was added to
oslo.messaging just recently in git commit 0400cbf.  The commit message
talks about clustered rabbit nodes and mirrored queues, which don't
apply to our scenario, but I suspect it would fix the problem that
we're seeing as well.
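
To illustrate the kind of handling I mean, here is a rough sketch. It is
not the actual nova or oslo.messaging code: the exception class is a local
stand-in for the amqp one, and the ensure() signature, the reconnect
callback and max_retries are all made up for the example. The only point
is that connection errors get caught in the same place as socket timeouts
and trigger a reconnect before retrying.

```python
import socket


class RecoverableConnectionError(Exception):
    """Local stand-in for the amqp exception of the same name (assumption)."""


def ensure(method, reconnect, max_retries=3):
    """Call method(), reconnecting and retrying on recoverable failures.

    Hypothetical helper: 'reconnect' is any callable that re-establishes
    the broker connection; 'max_retries' bounds the number of retries.
    """
    attempt = 0
    while True:
        try:
            return method()
        except (socket.timeout, RecoverableConnectionError):
            # Handle connection errors the same way as socket timeouts:
            # give up once the retry budget is spent, otherwise reconnect
            # and try the call again.
            attempt += 1
            if attempt > max_retries:
                raise
            reconnect()
```

With something like this in place, a "connection already closed" error
raised from the channel would cause a reconnect rather than propagating
up unhandled.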

Chris


