I have configured a 3-node pcs cluster for OpenStack.
To test HA, I run the following commands:
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited
When I run these iptables commands on one node, that node is fenced and forced to reboot, and the cluster keeps working fine.
But when I run them on two of the controller nodes, the resource bundles fail and don't come back up.
[root@overcloud-controller-1 ~]# pcs status
Cluster name: tripleo_cluster
Cluster Summary:
* Stack: corosync
* Current DC: overcloud-controller-1 (version 2.1.2-4.el8-ada5c3b36e2) - partition WITHOUT quorum
* Last updated: Sat Oct 29 03:15:29 2022
* Last change: Sat Oct 29 03:12:26 2022 by root via crm_resource on overcloud-controller-1
* 19 nodes configured
* 68 resource instances configured
Node List:
* Node overcloud-controller-0: UNCLEAN (offline)
* Node overcloud-controller-2: UNCLEAN (offline)
* Online: [ overcloud-controller-1 ]
Full List of Resources:
* ip-172.25.201.91 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (UNCLEAN)
* ip-172.25.201.150 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (UNCLEAN)
* ip-172.25.201.206 (ocf::heartbeat:IPaddr2): Stopped
* ip-172.25.201.250 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (UNCLEAN)
* ip-172.25.202.50 (ocf::heartbeat:IPaddr2): Stopped
* ip-172.25.202.90 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (UNCLEAN)
* Container bundle set: haproxy-bundle [172.25.201.68:8787/tripleomaster/openstack-haproxy:pcmklatest]:
* haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 (UNCLEAN)
* haproxy-bundle-podman-1 (ocf::heartbeat:podman): Stopped
* haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started overcloud-controller-2 (UNCLEAN)
* haproxy-bundle-podman-3 (ocf::heartbeat:podman): Stopped
* Container bundle set: galera-bundle [172.25.201.68:8787/tripleomaster/openstack-mariadb:pcmklatest]:
* galera-bundle-0 (ocf::heartbeat:galera): Stopped overcloud-controller-0 (UNCLEAN)
* galera-bundle-1 (ocf::heartbeat:galera): Stopped
* galera-bundle-2 (ocf::heartbeat:galera): Stopped overcloud-controller-2 (UNCLEAN)
* galera-bundle-3 (ocf::heartbeat:galera): Stopped
* Container bundle set: redis-bundle [172.25.201.68:8787/tripleomaster/openstack-redis:pcmklatest]:
* redis-bundle-0 (ocf::heartbeat:redis): Stopped
* redis-bundle-1 (ocf::heartbeat:redis): Stopped overcloud-controller-2 (UNCLEAN)
* redis-bundle-2 (ocf::heartbeat:redis): Stopped overcloud-controller-0 (UNCLEAN)
* redis-bundle-3 (ocf::heartbeat:redis): Stopped
* Container bundle set: ovn-dbs-bundle [172.25.201.68:8787/tripleomaster/openstack-ovn-northd:pcmklatest]:
* ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Stopped overcloud-controller-2 (UNCLEAN)
* ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped overcloud-controller-0 (UNCLEAN)
* ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Stopped
* ovn-dbs-bundle-3 (ocf::ovn:ovndb-servers): Stopped
* ip-172.25.201.208 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (UNCLEAN)
* Container bundle: openstack-cinder-backup [172.25.201.68:8787/tripleomaster/openstack-cinder-backup:pcmklatest]:
* openstack-cinder-backup-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 (UNCLEAN)
* Container bundle: openstack-cinder-volume [172.25.201.68:8787/tripleomaster/openstack-cinder-volume:pcmklatest]:
* openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Stopped
* Container bundle set: rabbitmq-bundle [172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]:
* rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-2 (UNCLEAN)
* rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0 (UNCLEAN)
* rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped
* rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Stopped
* ip-172.25.204.250 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (UNCLEAN)
* ceph-nfs (systemd:ceph-nfs@pacemaker): Started overcloud-controller-0 (UNCLEAN)
* Container bundle: openstack-manila-share [172.25.201.68:8787/tripleomaster/openstack-manila-share:pcmklatest]:
* openstack-manila-share-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 (UNCLEAN)
* stonith-fence_ipmilan-48d539a11820 (stonith:fence_ipmilan): Stopped
* stonith-fence_ipmilan-48d539a1188c (stonith:fence_ipmilan): Started overcloud-controller-2 (UNCLEAN)
* stonith-fence_ipmilan-246e96349068 (stonith:fence_ipmilan): Started overcloud-controller-2 (UNCLEAN)
* stonith-fence_ipmilan-246e96348d30 (stonith:fence_ipmilan): Stopped
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Pacemaker requires more than half of the nodes to be alive (quorum) for the cluster to keep working. To get around this, I set the cluster property no-quorum-policy to ignore (pcs property set no-quorum-policy=ignore).
Now the cluster keeps running even when there is no quorum.
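To make the majority rule concrete, here is a minimal shell sketch of the arithmetic, assuming one vote per node. The function names quorum_needed and has_quorum are made up for illustration; they are not pcs or corosync commands.

```shell
#!/bin/sh
# Hypothetical helpers, only to illustrate the majority rule.
# Assumption: every node carries exactly one vote.

# Votes needed for quorum with N nodes: floor(N/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if LIVE nodes out of TOTAL still form a majority.
has_quorum() {
    live=$1
    total=$2
    [ "$live" -ge "$(quorum_needed "$total")" ]
}

# 3 controllers: one failure keeps quorum, two failures lose it.
for live in 3 2 1; do
    if has_quorum "$live" 3; then
        echo "$live of 3 nodes alive: quorate"
    else
        echo "$live of 3 nodes alive: NOT quorate"
    fi
done
```

With 3 controllers, the loop reports quorate for 3 and 2 live nodes and NOT quorate for 1, which matches the "partition WITHOUT quorum" line in the status output above.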
The issue I have now is that the mariadb bundle becomes a slave and doesn't get promoted to master.
Can you please suggest a proper workaround so that my cloud keeps running when more than half of the nodes go down?
With regards,
Swogat Pradhan