[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover
cbehrens at codestud.com
Tue Mar 25 01:45:26 UTC 2014
Do you have some sort of network device, like a firewall, between your compute node and rabbit, or did you fail over from one rabbit to another? The only cases where I've seen this happen are when the compute-side OS doesn't detect a closed connection, for various reasons. I'm on my phone and didn't check your logs, but thought I'd throw it out there. If the OS (Linux) doesn't know the connection is dead, then obviously the userland software won't know either. You can run netstat on both sides of the connection to see if something is out of whack.
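If the compute node's kernel never learns that the peer went away (for example because a firewall silently dropped the flow), an idle connection can stay in ESTABLISHED indefinitely. One general Linux mechanism for surfacing that is TCP keepalive. The following is a minimal sketch under stated assumptions, not something proposed in this thread: the helper name `enable_tcp_keepalive` and the timer values are hypothetical, and the per-socket `TCP_KEEP*` options are platform-specific, so they are applied only where Python exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, count=5):
    """Hypothetical helper: ask the kernel to probe an idle TCP connection
    so that a silently vanished peer is eventually reported as an error.

    The idle/interval/count values are illustrative, not recommendations.
    """
    # Turn keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (Linux-specific option).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the kernel declares the connection dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

With these illustrative values, an idle connection whose peer has vanished would be declared dead after roughly idle + interval × count seconds (60 + 10 × 5 = 110 s). After that, operations on the socket fail with an error that userland code can act on, instead of blocking on a connection the OS still believes is alive.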
> On Mar 24, 2014, at 10:40 AM, Chris Friesen <chris.friesen at windriver.com> wrote:
>> On 03/24/2014 11:31 AM, Chris Friesen wrote:
>> It looks like we're raising
>> RecoverableConnectionError: connection already closed
>> down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but
>> nothing handles it.
>> It looks like the most likely place that should be handling it is
>> In the current oslo.messaging code the ensure() routine explicitly
>> handles connection errors (which RecoverableConnectionError is) and
>> socket timeouts--the ensure() routine in Havana doesn't do this.
> I misread the code: ensure() in Havana does in fact monitor socket timeouts, but it doesn't handle connection errors.
> It looks like support for handling connection errors was added to oslo.messaging just recently in git commit 0400cbf. The commit message talks about clustered rabbit nodes and mirrored queues, which don't apply to our scenario, but I suspect it would probably fix the problem that we're seeing as well.
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
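The gap described in the quoted analysis is that Havana's ensure() retries on socket timeouts but lets a connection error such as RecoverableConnectionError escape unhandled. The following is a minimal, hypothetical sketch of an ensure()-style retry helper that makes the difference concrete. `ConnectionClosed`, `FlakyChannel`, and this `ensure()` signature are invented for illustration; this is not the Havana code, oslo.messaging's implementation, or the contents of commit 0400cbf.

```python
import socket
import time


class ConnectionClosed(Exception):
    """Hypothetical stand-in for a recoverable connection error, playing
    the role of py-amqp's RecoverableConnectionError in this sketch."""


def ensure(method, reconnect, recoverable=(ConnectionClosed, socket.timeout),
           max_retries=3, delay=0.0):
    """Call method(); on a recoverable error, reconnect and try again.

    Errors not listed in `recoverable` propagate immediately. Once
    max_retries reconnects have been used up, the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return method()
        except recoverable:
            attempt += 1
            if attempt > max_retries:
                raise
            reconnect()
            time.sleep(delay)


class FlakyChannel:
    """Hypothetical channel that fails until it has reconnected twice."""

    def __init__(self):
        self.reconnects = 0

    def publish(self):
        if self.reconnects < 2:
            raise ConnectionClosed("connection already closed")
        return "ok"

    def reconnect(self):
        self.reconnects += 1


if __name__ == "__main__":
    ch = FlakyChannel()
    print(ensure(ch.publish, ch.reconnect), ch.reconnects)  # ok 2
```

Passing `recoverable=(socket.timeout,)` models the Havana behaviour described above: the closed-connection error is not in the tuple, so it escapes on the first call and no reconnect is attempted.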