[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover
Chris Friesen
chris.friesen at windriver.com
Tue Mar 25 03:24:01 UTC 2014
On 03/24/2014 07:45 PM, Chris Behrens wrote:
> Do you have some sort of network device like a firewall between your
> compute and rabbit or you failed from one rabbit over to another?
There are two controllers (active/standby) and two computes all hooked
up to the same switch.
We definitely did a switchover from one controller to the other (which
would take down the rabbit server and then bring it back up again).
> The only cases where I've seen this happen is when the compute side
> OS doesn't detect a closed connection for various reasons.
We may have done a failover, in which case there wouldn't have been a
clean shutdown of the socket. I'd have to check the logs; we've been
testing both scenarios.
> I'm on my
> phone and didn't check your logs, but thought I'd throw it out there.
> If the OS (linux) doesn't know the connection is dead, then obviously
> the user land software will not, either. You can netstat on both
> sides of the connection to see if something is out of whack.
The client does know that the connection is dead: the low-level amqp
code is raising RecoverableConnectionError('connection already closed').
The problem is that the RPC code in Havana doesn't handle connection
error exceptions. The oslo.messaging code used in Icehouse does.
If we ported
"https://github.com/openstack/oslo.messaging/commit/0400cbf4f83cf8d58076c7e65e08a156ec3508a8"
to Havana I'd expect that it would catch the exception and reconnect
rather than sit there spinning forever.
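
As a rough illustration of the catch-and-reconnect behaviour described
above (this is not the code from the oslo.messaging commit), here is a
minimal sketch. The RecoverableConnectionError class below is a local
stand-in for the one raised by the low-level amqp code, and
call_with_reconnect and FakeConnection are hypothetical names made up
for this example.

```python
import time


class RecoverableConnectionError(Exception):
    """Local stand-in for the error raised by the low-level amqp code."""


def call_with_reconnect(connection, method, max_attempts=5, delay=0.0,
                        sleep=time.sleep):
    """Run method(connection); on a connection error, reconnect and retry.

    Hypothetical helper: gives up and re-raises after max_attempts
    failures instead of retrying forever.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return method(connection)
        except RecoverableConnectionError:
            if attempt >= max_attempts:
                raise
            sleep(delay)
            connection.reconnect()


class FakeConnection:
    """Hypothetical connection that is dead until reconnect() is called."""

    def __init__(self):
        self.alive = False
        self.reconnects = 0

    def reconnect(self):
        self.reconnects += 1
        self.alive = True

    def publish(self, msg):
        if not self.alive:
            raise RecoverableConnectionError('connection already closed')
        return 'sent: ' + msg


conn = FakeConnection()
result = call_with_reconnect(conn, lambda c: c.publish('ping'))
```

The point of the sketch is only that the exception is caught at the RPC
layer and turned into a reconnect attempt, with a bounded number of
retries, instead of propagating or being retried on a dead connection.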
Chris