[Openstack-operators] Active/passive nova-network failover results in both controllers ARPing for gateway addresses
mismith at overstock.com
Wed Oct 29 04:34:24 UTC 2014
I’ve been running nova-network in VLAN mode as an active/passive cluster resource (corosync + rgmanager) on my OpenStack Havana and Folsom controller pairs for a good long while. This week I found an oddity that I hadn’t noticed before, and I’d like to ask the community about it.
When nova-network starts up, it of course launches a dnsmasq process for each network, which listens on the .1 address of the assigned network and acts as the gateway for that network. When the nova-network service is moved to the passive node, nova-network starts up dnsmasq processes on that node as well, again listening on the .1 addresses. However, since now both nodes have the .1 addresses configured, they basically take turns ARPing for the addresses and stealing the traffic from each other. VMs will route through the “active” node for a minute or so and then suddenly start routing through the “passive” node. Then the cycle repeats. Among other things, this results in only one controller at a time being able to reach the VMs and adds latency to VM traffic when the shift happens.
To stop this, I had to manually remove the VLAN interfaces from the bridges, bring down the bridges, then delete the bridges from the now-passive node. Things then returned to normal, with all traffic flowing through the “active” controller and both controllers being able to reach the VMs.
I have not seen anything in the HA guides about how people are preventing this situation from occurring - nothing about killing off dnsmasq or tearing down these network interfaces to prevent the ARP wars. Anybody else out there experienced this? How are people handling the situation?
I am considering bringing up arptables to block ARP for the gateway addresses when cluster failover happens, or alternatively automating the tear-down of these gateway addresses. Am I missing something here?
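To make the arptables idea concrete, here is a rough sketch that derives the .1 gateway address of each network and builds rules to drop ARP for it on the passive node. The CIDRs and function names are hypothetical, and the choice of one INPUT rule (requests for the gateway) plus one OUTPUT rule (replies or announcements from it) is an assumption about what would be enough, not something I have tested.

```python
import ipaddress


def gateway_address(cidr):
    """Return the first host (.1) address of the network, as used for the gateway."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address + 1)


def arp_block_commands(cidr):
    """Build arptables rules that silence ARP for the gateway on this node."""
    gw = gateway_address(cidr)
    return [
        # Drop incoming ARP requests that ask for the gateway address.
        ["arptables", "-A", "INPUT", "-d", gw, "-j", "DROP"],
        # Drop outgoing ARP replies/announcements sent from the gateway address.
        ["arptables", "-A", "OUTPUT", "-s", gw, "-j", "DROP"],
    ]


if __name__ == "__main__":
    # Hypothetical networks; print the rules rather than applying them.
    for cidr in ["10.0.100.0/24", "10.0.101.0/24"]:
        for cmd in arp_block_commands(cidr):
            print(" ".join(cmd))
```

The same rules would have to be removed again (-D instead of -A) when the node becomes active, otherwise the new active controller would never answer for the gateway.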
Principal Engineer, Website Systems