I have configured a 3-node pcs cluster for OpenStack.
To test HA, I issue the following commands:
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited

When I issue these iptables commands on one node, that node is fenced and forced to reboot, and the cluster keeps working fine.
But when I issue them on two of the controller nodes, the resource bundles fail and don't come back up.

[root@overcloud-controller-1 ~]# pcs status
Cluster name: tripleo_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: overcloud-controller-1 (version 2.1.2-4.el8-ada5c3b36e2) - partition WITHOUT quorum
  * Last updated: Sat Oct 29 03:15:29 2022
  * Last change: Sat Oct 29 03:12:26 2022 by root via crm_resource on overcloud-controller-1
  * 19 nodes configured
  * 68 resource instances configured

Node List:
  * Node overcloud-controller-0: UNCLEAN (offline)
  * Node overcloud-controller-2: UNCLEAN (offline)
  * Online: [ overcloud-controller-1 ]

Full List of Resources:
  * ip-172.25.201.91 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (UNCLEAN)
  * ip-172.25.201.150 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (UNCLEAN)
  * ip-172.25.201.206 (ocf::heartbeat:IPaddr2): Stopped
  * ip-172.25.201.250 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (UNCLEAN)
  * ip-172.25.202.50 (ocf::heartbeat:IPaddr2): Stopped
  * ip-172.25.202.90 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (UNCLEAN)
  * Container bundle set: haproxy-bundle [172.25.201.68:8787/tripleomaster/openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 (UNCLEAN)
    * haproxy-bundle-podman-1 (ocf::heartbeat:podman): Stopped
    * haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started overcloud-controller-2 (UNCLEAN)
    * haproxy-bundle-podman-3 (ocf::heartbeat:podman): Stopped
  * Container bundle set: galera-bundle [172.25.201.68:8787/tripleomaster/openstack-mariadb:pcmklatest]:
    * galera-bundle-0 (ocf::heartbeat:galera): Stopped overcloud-controller-0 (UNCLEAN)
    * galera-bundle-1 (ocf::heartbeat:galera): Stopped
    * galera-bundle-2 (ocf::heartbeat:galera): Stopped overcloud-controller-2 (UNCLEAN)
    * galera-bundle-3 (ocf::heartbeat:galera): Stopped
  * Container bundle set: redis-bundle [172.25.201.68:8787/tripleomaster/openstack-redis:pcmklatest]:
    * redis-bundle-0 (ocf::heartbeat:redis): Stopped
    * redis-bundle-1 (ocf::heartbeat:redis): Stopped overcloud-controller-2 (UNCLEAN)
    * redis-bundle-2 (ocf::heartbeat:redis): Stopped overcloud-controller-0 (UNCLEAN)
    * redis-bundle-3 (ocf::heartbeat:redis): Stopped
  * Container bundle set: ovn-dbs-bundle [172.25.201.68:8787/tripleomaster/openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Stopped overcloud-controller-2 (UNCLEAN)
    * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped overcloud-controller-0 (UNCLEAN)
    * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Stopped
    * ovn-dbs-bundle-3 (ocf::ovn:ovndb-servers): Stopped
  * ip-172.25.201.208 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 (UNCLEAN)
  * Container bundle: openstack-cinder-backup [172.25.201.68:8787/tripleomaster/openstack-cinder-backup:pcmklatest]:
    * openstack-cinder-backup-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 (UNCLEAN)
  * Container bundle: openstack-cinder-volume [172.25.201.68:8787/tripleomaster/openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Stopped
  * Container bundle set: rabbitmq-bundle [172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-2 (UNCLEAN)
    * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0 (UNCLEAN)
    * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped
    * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Stopped
  * ip-172.25.204.250 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 (UNCLEAN)
  * ceph-nfs (systemd:ceph-nfs@pacemaker): Started overcloud-controller-0 (UNCLEAN)
  * Container bundle: openstack-manila-share [172.25.201.68:8787/tripleomaster/openstack-manila-share:pcmklatest]:
    * openstack-manila-share-podman-0 (ocf::heartbeat:podman): Started overcloud-controller-0 (UNCLEAN)
  * stonith-fence_ipmilan-48d539a11820 (stonith:fence_ipmilan): Stopped
  * stonith-fence_ipmilan-48d539a1188c (stonith:fence_ipmilan): Started overcloud-controller-2 (UNCLEAN)
  * stonith-fence_ipmilan-246e96349068 (stonith:fence_ipmilan): Started overcloud-controller-2 (UNCLEAN)
  * stonith-fence_ipmilan-246e96348d30 (stonith:fence_ipmilan): Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The cluster requires quorum, i.e. more than half of the nodes must be alive, for it to work. To get past this, I set the following property: pcs property set no-quorum-policy=ignore

Now the pcs cluster keeps running even when there is no quorum.
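
To make the "more than half" rule concrete for my 3 controllers, here is a minimal shell sketch. It assumes one vote per controller and a simple majority (floor(n/2) + 1); the function names quorum_votes and has_quorum are hypothetical helpers I made up for illustration, not pcs or corosync commands.

```shell
#!/bin/sh
# Illustrative only: assumes one vote per node and a simple majority rule.

# quorum_votes TOTAL: votes needed for quorum, floor(TOTAL / 2) + 1.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL ONLINE: succeeds when ONLINE nodes form a majority of TOTAL.
has_quorum() {
    [ "$2" -ge "$(quorum_votes "$1")" ]
}

# Walk through the cases for a 3-node cluster.
for online in 3 2 1; do
    if has_quorum 3 "$online"; then
        echo "$online of 3 online: quorum"
    else
        echo "$online of 3 online: no quorum"
    fi
done
```

With 3 controllers this gives a threshold of 2 votes, which matches what I see: losing one node is tolerated, but with two nodes blocked the remaining node reports "partition WITHOUT quorum".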

The issue I have now is that the mariadb bundle (galera-bundle in the output above) becomes a slave and doesn't get promoted to master.

Can you please suggest a proper workaround so that my cloud keeps running when more than half of the nodes go down?


With regards,

Swogat Pradhan