[ops] [kolla] RabbitMQ High Availability

Bogdan Dobrelya bdobreli at redhat.com
Mon Dec 6 14:18:47 UTC 2021


> I read this with great interest because we are seeing this issue. Questions:
> 
> 1. We are running kolla-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x?
> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like?
> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into?

Note that even with the RabbitMQ HA policies adjusted like that and the
HA replication factor [0] decreased (e.g. to 2), there still might be
high churn caused by a large enough number of replicated durable RPC
topic queues. That might cripple the cloud with the incurred I/O
overhead, because a durable queue requires all messages in it to be
persisted to disk (on all the messaging cluster replicas) before they
are ack'ed by the broker.
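
To make the policy discussion above a bit more concrete, here is a
minimal Python sketch that checks which queue names the quoted pattern
would select for mirroring. The policy definition (ha-mode "exactly"
with ha-params 2) is only my assumption of what a replication factor of
2 could look like, and the queue names are made up for illustration.
Python's regex engine is used as a stand-in for the one in the broker.

```python
import json
import re

# Pattern quoted earlier in this thread: it excludes amq.*, *_fanout_*
# and reply_* queues from the HA policy.
HA_PATTERN = r'^(?!(amq\.)|(.*_fanout_)|(reply_)).*'

# Assumed policy definition: mirror each matching queue to exactly 2 nodes.
HA_DEFINITION = {"ha-mode": "exactly", "ha-params": 2}


def is_mirrored(queue_name):
    """Return True if the HA policy pattern selects this queue name."""
    return re.match(HA_PATTERN, queue_name) is not None


def policy_json():
    """Serialize the assumed policy definition as JSON."""
    return json.dumps(HA_DEFINITION, sort_keys=True)


if __name__ == "__main__":
    # Illustrative queue names only, not taken from a real deployment.
    for name in ("conductor", "compute.host1", "amq.gen-abc",
                 "scheduler_fanout_1234", "reply_abcd"):
        print(name, "mirrored" if is_mirrored(name) else "not mirrored")
    print(policy_json())
```

Only the plain topic queues remain under the policy, and those are
exactly the replicated durable RPC queues that can still cause the
churn.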

That said, oslo.messaging would likely require more granular control
over topic exchanges and the durable queues flag, to tell it to declare
as durable only the most critical paths of a service. A single config
setting and a single control exchange per service might not be
enough.
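
A rough sketch of the difference, in Python. The
[oslo_messaging_rabbit] amqp_durable_queues option is the existing
single switch. The per-topic set and the queue_durable() helper are
purely hypothetical and are not an existing oslo.messaging feature;
they only illustrate what more granular control could mean.

```python
import configparser

# Today: a single flag per service, in the service's config file.
CURRENT_CONF = """
[oslo_messaging_rabbit]
amqp_durable_queues = True
"""

# HYPOTHETICAL: topics a service would consider critical enough to be
# durable. No such option exists in oslo.messaging.
HYPOTHETICAL_DURABLE_TOPICS = {"conductor"}


def global_durable(conf_text):
    """Read the existing all-or-nothing durable queues flag."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("oslo_messaging_rabbit",
                             "amqp_durable_queues", fallback=False)


def queue_durable(topic, conf_text=CURRENT_CONF):
    """Hypothetical rule: durable only if the global flag is set AND
    the topic is on the critical list."""
    return global_durable(conf_text) and topic in HYPOTHETICAL_DURABLE_TOPICS


if __name__ == "__main__":
    # Illustrative topic names only.
    for topic in ("conductor", "compute", "scheduler"):
        print(topic, "durable" if queue_durable(topic) else "transient")
```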

There are also race conditions with durable queues enabled, like [1]. A
solution could be for each service to declare its own dedicated control
exchange with its own configuration.

Finally, OpenStack components should perhaps add a *.next CI job to
test with durable queues, like [2].

[0] https://www.rabbitmq.com/ha.html#replication-factor

[1]
https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt

[2] https://review.opendev.org/c/openstack/nova/+/820523

> 
> Does anyone have a sample set of RMQ config files that they can share?
> 
> It looks like my Outlook has ruined the link; reposting:
> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



