[openstack-dev] problems with rabbitmq on HA controller failure...anyone seen this?

Chris Friesen chris.friesen at windriver.com
Fri Nov 29 20:22:17 UTC 2013


Hi,

We're currently running Grizzly (moving to Havana soon) and have hit an
issue: if the active controller is killed ungracefully, nova-compute on
the compute node does not properly reconnect to the rabbitmq server on
the newly active controller node.

I saw a bugfix in Folsom (https://bugs.launchpad.net/nova/+bug/718869)
that retries the connection to rabbitmq if it's lost, but it doesn't
seem to handle this case properly.

Interestingly, killing and restarting nova-compute on the compute node 
seems to work, which implies that the retry code is doing something less 
effective than the initial startup.
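
To illustrate the distinction, here is a minimal sketch of a reconnect
path that behaves like a fresh start: every attempt builds a brand-new
connection and cycles through the known controllers with a backoff. It
is not the actual nova or kombu code. The names connect_with_failover
and BrokerUnavailable, the connect callable, and the controller host
names are all hypothetical and used only for illustration.

```python
import itertools
import time


class BrokerUnavailable(Exception):
    """Raised when no broker could be reached within the attempt budget."""


def connect_with_failover(hosts, connect, max_attempts=6,
                          base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Return a fresh connection, cycling through hosts with backoff.

    `connect(host)` is a hypothetical callable that must build a new
    connection from scratch on every call (as a freshly started process
    would) and raise OSError if the host cannot be reached.
    """
    last_error = None
    delay = base_delay
    # Pair each attempt number with the next host in round-robin order.
    for attempt, host in zip(range(max_attempts), itertools.cycle(hosts)):
        try:
            return connect(host)
        except OSError as exc:
            last_error = exc
            # Back off before the next attempt, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(delay)
                delay = min(delay * 2, max_delay)
    raise BrokerUnavailable(
        "no broker reachable after %d attempts" % max_attempts
    ) from last_error
```

If a retry path instead reuses state from the dead connection (an old
socket, channel, or cached host), it can keep failing where a process
restart succeeds, which would match the behaviour described above.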

Has anyone doing HA controller setups run into something similar?

Chris


