Cluster fails when 2 controller nodes go down simultaneously | tripleo wallaby
Harald Jensas
hjensas at redhat.com
Thu Nov 3 15:00:08 UTC 2022
On 11/1/22 11:01, Swogat Pradhan wrote:
> Hi,
> Updating the subject.
>
> On Tue, Nov 1, 2022 at 12:26 PM Swogat Pradhan
> <swogatpradhan22 at gmail.com> wrote:
>
> I have configured a 3 node pcs cluster for openstack.
> To test the HA, I issue the following commands:
> iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
> iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
> iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
> iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
> iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
> iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
> iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
> iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
> iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited
>
> When I issue these iptables commands on 1 node, that node is fenced and
> forced to reboot, and the cluster keeps working fine.
> But when I issue them on 2 of the controller nodes, the resource
> bundles fail and don't come back up.
>
>
This is expected behavior.
In a cluster you need a majority quorum to be able to make the decision
to fence a failing node and keep services running on the nodes that hold
the majority quorum.
When you disconnect two nodes from the cluster with firewall rules, none
of the 3 nodes can talk to any other node, i.e. they are all isolated,
with no knowledge of the status of their 2 peer cluster nodes.
Each node can only assume that it is the one that has been isolated and
that the two other nodes are operational. To ensure data integrity, any
isolated node must stop its services immediately.
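As a rough illustration of the majority rule (just a sketch of the
arithmetic, not actual corosync/pacemaker code): with 3 nodes a
partition needs 2 votes to have quorum, so a node that can only see
itself never reaches quorum and has to stop its resources.

    # Illustrative sketch of majority-quorum arithmetic (hypothetical,
    # not corosync/pacemaker code): a partition may keep running only
    # if it holds a strict majority of the configured votes.
    def has_quorum(visible_nodes, total_nodes=3):
        needed = total_nodes // 2 + 1   # 3 nodes -> 2 votes needed
        return visible_nodes >= needed

    print(has_quorum(3))  # True  - healthy cluster
    print(has_quorum(2))  # True  - one node fenced, majority survives
    print(has_quorum(1))  # False - isolated node must stop services

With your test, all three nodes end up in the has_quorum(1) case at the
same time, which is why nothing stays up.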
Imagine all three nodes isolated from each other but still reachable
from the loadbalancer. Requests would keep coming in, and each node
would continue to service them and write data. With each node servicing
~1/3 of the requests, the result would be inconsistent data stores on
all three nodes, a situation that would be practically impossible to
recover from.
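A toy sketch of that split-brain outcome (hypothetical, not OpenStack
code): three isolated replicas each accept ~1/3 of the writes and end
up with different contents.

    # Three isolated data stores, each taking every third write.
    replicas = [{}, {}, {}]
    writes = [("vm-%d" % i, "active") for i in range(9)]

    for i, (key, value) in enumerate(writes):
        replicas[i % 3][key] = value      # loadbalancer spreads requests

    print(replicas[0] == replicas[1] == replicas[2])  # False - diverged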
--
Harald