Struggling to rejoin a node to the RabbitMQ cluster of an OpenStack-Ansible environment
Hi Team,

We are operating an OpenStack environment managed by Ansible, with 3 controller nodes.

OpenStack version: Ussuri (21.2.1)
OS: CentOS 8

Recently, Controller-2 hit a RabbitMQ network partition that we could not resolve, so we removed it from the RabbitMQ cluster and ran rabbitmqctl reset. Since removing the node, we have been unable to rejoin it to the RabbitMQ cluster.

The following error occurs when attempting to join the cluster:

    [Controller-2]# rabbitmqctl join_cluster rabbit@ctrl-1-rabbit-mq-container
    Clustering node rabbit@ctrl-2 with rabbit@ctrl-1
    Error: {:failed_to_cluster_with, [:"rabbit@ctrl-3", :"rabbit@ctrl-1"], 'Mnesia could not connect to any nodes.'}

We have verified that Controller-2 can communicate with the other nodes in the cluster; it responds with 'pong' to:

    rabbitmqctl eval 'net_adm:ping('\''rabbit@ctrl-1'\'').'

There are no additional error logs indicating the root of the problem.

We would appreciate any assistance in resolving this issue.

Regards,
Sudeb Ghosh
7044064878
9332034788
I think you can try a few things:

1. Fully re-create the RabbitMQ container (given it's a containerized deployment). For that you can execute:

    # openstack-ansible playbooks/lxc-containers-destroy.yml --limit ctrl-2-rabbit-mq-container
    # openstack-ansible playbooks/lxc-containers-create.yml --limit ctrl-2-rabbit-mq-container,ctrl-2
    # openstack-ansible playbooks/rabbitmq-install.yml -e rabbitmq_upgrade=true

2. Remove the Mnesia database and try to re-join the cluster manually again. Mnesia is usually located under /var/lib/rabbitmq/mnesia/

Hope this helps.

On Fri, 6 Sep 2024 at 14:04, sudeb ghosh <sudeb_ece@yahoo.co.in> wrote:
> Hi Team,
>
> We are operating an OpenStack environment managed by Ansible, with 3 controller nodes.
>
> OpenStack version: Ussuri (21.2.1)
> OS: CentOS 8
>
> Recently, Controller-2 hit a RabbitMQ network partition that we could not resolve, so we removed it from the RabbitMQ cluster and ran rabbitmqctl reset. Since removing the node, we have been unable to rejoin it to the RabbitMQ cluster.
>
> The following error occurs when attempting to join the cluster:
>
>     [Controller-2]# rabbitmqctl join_cluster rabbit@ctrl-1-rabbit-mq-container
>     Clustering node rabbit@ctrl-2 with rabbit@ctrl-1
>     Error: {:failed_to_cluster_with, [:"rabbit@ctrl-3", :"rabbit@ctrl-1"], 'Mnesia could not connect to any nodes.'}
>
> We have verified that Controller-2 can communicate with the other nodes in the cluster; it responds with 'pong' to:
>
>     rabbitmqctl eval 'net_adm:ping('\''rabbit@ctrl-1'\'').'
>
> There are no additional error logs indicating the root of the problem.
>
> We would appreciate any assistance in resolving this issue.
>
> Regards,
> Sudeb Ghosh
> 7044064878
> 9332034788
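The second option suggested in the reply (clearing the Mnesia directory and re-joining by hand) could be sketched roughly as follows, run inside the Controller-2 RabbitMQ container. This is only an illustration under several assumptions that the thread does not confirm: the service is managed by systemd under the name rabbitmq-server, rabbit@ctrl-1 is a healthy cluster member, and the Mnesia data lives under /var/lib/rabbitmq/mnesia. The DRY_RUN switch and the run and rejoin_cluster helpers are hypothetical names introduced for the example. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: prints (or, with DRY_RUN=0, executes) a manual re-join sequence.
# All defaults below are assumptions; adjust them to the actual deployment.
set -eu

DRY_RUN="${DRY_RUN:-1}"                               # 1 = print commands only
SEED_NODE="${SEED_NODE:-rabbit@ctrl-1}"               # a healthy cluster member (assumed)
MNESIA_DIR="${MNESIA_DIR:-/var/lib/rabbitmq/mnesia}"  # path mentioned in the reply

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rejoin_cluster() {
  # Stop the whole service so nothing holds the Mnesia files open.
  run systemctl stop rabbitmq-server
  # Remove the contents of the Mnesia directory, keeping the directory itself.
  run find "$MNESIA_DIR" -mindepth 1 -delete
  # Start again as a blank standalone node.
  run systemctl start rabbitmq-server
  # Joining requires the RabbitMQ application to be stopped on the joining node.
  run rabbitmqctl stop_app
  run rabbitmqctl join_cluster "$SEED_NODE"
  run rabbitmqctl start_app
  # Confirm that all three nodes are listed as running.
  run rabbitmqctl cluster_status
}

rejoin_cluster
```

Review the printed sequence first; running the script with DRY_RUN=0 executes it for real and permanently deletes the local node's Mnesia data.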
Participants (2): Dmitriy Rabotyagov, sudeb ghosh