On 21/07/2022 11:32, Tan Tran Trong wrote:
Hello, I'm trying to figure out how to configure RabbitMQ to make it highly available. I have 3 controller nodes and 2 compute nodes, deployed with kolla with a mostly default configuration. RabbitMQ is set to ha-all for all queues on all nodes, with amqp_durable_queues = True. My problem is that when I shut down 1 controller node (or 1 RabbitMQ container), whether master or slave, the whole cluster becomes unstable. Some instances cannot be created and get stuck on Scheduling or Block Device Mapping, volumes are not shown or are stuck on creating, compute nodes are randomly reported dead, and so on. I'm looking for documentation on how OpenStack uses RabbitMQ, how OpenStack behaves when a RabbitMQ node goes down, and how to make RabbitMQ HA in a stable way. Do you have any recommendations?
Would it be possible to compare with this approach of running a clustered Rabbit service, but without mirrored (and durable) queues? https://review.opendev.org/c/openstack/kolla-ansible/+/824994 It won't solve all failure scenarios, but we have seen it help with controlled shutdowns. We'd be interested in any failure scenarios you find with those settings.
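When comparing the two setups, it may help to confirm which value of amqp_durable_queues a given service actually ended up with. The following is only a rough sketch: it assumes the option lives in the [oslo_messaging_rabbit] section of the service's config file, the helper name durable_queues_enabled is made up for illustration, and the sample text stands in for a real file such as nova.conf. It does not look at the mirroring policy, which is set on the RabbitMQ side rather than in this file.

```python
import configparser

# Sample text standing in for a service config file (an assumption for
# illustration; in a real deployment you would read the rendered file).
SAMPLE_CONF = """
[oslo_messaging_rabbit]
amqp_durable_queues = False
"""


def durable_queues_enabled(conf_text: str) -> bool:
    """Return True if amqp_durable_queues is enabled in the given config text.

    Hypothetical helper. Falls back to False when the section or option is
    absent, which matches the oslo.messaging default for this option.
    """
    # strict=False tolerates repeated keys, interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean(
        "oslo_messaging_rabbit", "amqp_durable_queues", fallback=False
    )


if __name__ == "__main__":
    print(durable_queues_enabled(SAMPLE_CONF))
```

Running the same check against each service's config before and after the change would show whether the setting was applied everywhere, which is worth ruling out before attributing a failure to RabbitMQ itself.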
TIA, Tan