[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover
john at dewey.ws
Tue Mar 25 03:09:46 UTC 2014
Jay responded to a similar issue some time ago (I swear I saw talk of this last week, but I can’t find the newer thread). Since the referenced posting, we have also found that RabbitMQ 3.2.x with esl erlang helped a ton.
tl;dr It is a client issue. See the thread for further details.
On Monday, March 24, 2014 at 10:40 AM, Chris Friesen wrote:
> On 03/24/2014 11:31 AM, Chris Friesen wrote:
> > It looks like we're raising
> > RecoverableConnectionError: connection already closed
> > down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but
> > nothing handles it.
> > It looks like the most likely place that should be handling it is
> > nova.openstack.common.rpc.impl_kombu.Connection.ensure().
> > In the current oslo.messaging code the ensure() routine explicitly
> > handles connection errors (which RecoverableConnectionError is) and
> > socket timeouts--the ensure() routine in Havana doesn't do this.
> I misread the code: ensure() in Havana does in fact handle socket
> timeouts, but it doesn't handle connection errors.
> It looks like support for handling connection errors was added to
> oslo.messaging just recently, in git commit 0400cbf. The commit
> message talks about clustered rabbit nodes and mirrored queues, which
> don't apply to our scenario, but I suspect it would also fix the
> problem that we're seeing.
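
To make the point about ensure() concrete, here is a minimal sketch of a retry wrapper that treats connection errors the same way as socket timeouts. This is not the Havana or oslo.messaging code: the RecoverableConnectionError class is a local stand-in for the amqp exception, and FakeConnection, reconnect(), publish(), max_retries and interval are all hypothetical names chosen for illustration.

```python
import socket
import time


class RecoverableConnectionError(Exception):
    """Local stand-in for the amqp exception of the same name (assumption)."""


class FakeConnection:
    """Hypothetical broker connection that starts out closed."""

    def __init__(self):
        self.closed = True
        self.reconnects = 0

    def reconnect(self):
        # A real client would re-open the socket and channel here.
        self.reconnects += 1
        self.closed = False

    def publish(self, msg):
        if self.closed:
            raise RecoverableConnectionError("connection already closed")
        return "sent: " + msg


def ensure(conn, method, max_retries=3, interval=0.0):
    """Call method(); on a connection error or socket timeout,
    reconnect and retry, giving up after max_retries reconnects."""
    attempt = 0
    while True:
        try:
            return method()
        except (RecoverableConnectionError, socket.timeout):
            attempt += 1
            if attempt > max_retries:
                raise  # out of retries: surface the original error
            time.sleep(interval)
            conn.reconnect()


if __name__ == "__main__":
    conn = FakeConnection()
    print(ensure(conn, lambda: conn.publish("ping")))
```

The only point of the sketch is that the except clause names both error types; if the connection error is left out, it propagates to the caller unhandled, which matches the behaviour described above.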