Open Stack

Tue Mar 25 03:24:01 UTC 2014

On 03/24/2014 07:45 PM, Chris Behrens wrote:
> Do you have some sort of network device like a firewall between your
> compute and rabbit, or did you fail over from one rabbit to another?

There are two controllers (active/standby) and two computes all hooked 
up to the same switch.

We definitely did a switchover from one controller to the other (which 
would take down the rabbit server and then bring it back up again).

> The only cases where I've seen this happen are when the compute side
> OS doesn't detect a closed connection for various reasons.

We may have done a failover, in which case there wouldn't have been a 
clean shutdown of the socket.  I'd have to check the logs; we've been 
testing both scenarios.

> I'm on my
> phone and didn't check your logs, but thought I'd throw it out there.
> If the OS (linux) doesn't know the connection is dead, then obviously
> the user land software will not, either.  You can netstat on both
> sides of the connection to see if something is out of whack.
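
For reference, the netstat check suggested above could be scripted roughly as 
follows.  This is only a sketch: it assumes Linux net-tools `netstat -tn` 
output and the default AMQP port 5672, and the helper names 
`amqp_connections` and `run_netstat` are made up for illustration.

```python
import subprocess

# Assumption: rabbit listens on the default AMQP port; adjust if configured otherwise.
AMQP_PORT = 5672


def amqp_connections(netstat_output, port=AMQP_PORT):
    """Return (local, remote, state) for each TCP row that touches `port`."""
    suffix = ":%d" % port
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Linux `netstat -tn` rows: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            rows.append((local, remote, state))
    return rows


def run_netstat():
    """Run `netstat -tn` locally and return its text output."""
    return subprocess.check_output(["netstat", "-tn"]).decode()
```

Running `amqp_connections(run_netstat())` on both the compute node and the 
controller and comparing the states would show whether one side still 
believes a connection is ESTABLISHED that the other side has dropped.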

The client does know that the connection is dead: the low-level amqp 
code is raising RecoverableConnectionError('connection already closed').

The problem is that the RPC code in Havana doesn't handle connection 
error exceptions.  The oslo.messaging code used in Icehouse does.

If we ported 
"https://github.com/openstack/oslo.messaging/commit/0400cbf4f83cf8d58076c7e65e08a156ec3508a8" 
to Havana I'd expect that it would catch the exception and reconnect 
rather than sit there spinning forever.
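
To illustrate the behaviour I'd expect, here is a minimal sketch of the 
catch-and-reconnect pattern.  It is not the code from that commit: the 
exception class below is a local stand-in for the amqp library's exception, 
and `call_with_reconnect`, `connect` and `operation` are hypothetical names 
chosen for the example.

```python
import time


class RecoverableConnectionError(Exception):
    """Local stand-in for the amqp library's exception of the same name."""


def call_with_reconnect(connect, operation, max_attempts=3, delay=0.0):
    """Run operation(conn); on a connection error, reconnect and retry.

    `connect` is a callable returning a fresh connection object and
    `operation` is a callable taking that connection.  Both are assumptions
    made for this sketch.
    """
    conn = connect()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(conn)
        except RecoverableConnectionError:
            if attempt == max_attempts:
                raise  # give up instead of spinning forever
            time.sleep(delay)
            conn = connect()  # replace the dead connection before retrying
```

The point is only that the connection error is caught at the RPC layer and 
answered with a fresh connection, with a bounded number of attempts.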

Chris

[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover