[ops] [kolla] RabbitMQ High Availability

Herve Beraud hberaud at redhat.com
Thu Dec 9 07:45:10 UTC 2021


On Wed, Dec 8, 2021 at 11:48, Bogdan Dobrelya <bdobreli at redhat.com> wrote:

> Please see inline
>
> >> I read this with great interest because we are seeing this issue.
> >> Questions:
> >>
> >> 1. We are running kolla-ansible Train, and our RMQ version is 3.7.23.
> >> Should we be upgrading our Train clusters to use 3.8.x?
> >> 2. Document [2] recommends policy
> >> '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our Ansible
> >> playbooks, nor in any of the config files in the RMQ container. What
> >> would this look like in Ansible, and what should the resulting
> >> container config look like?
> >> 3. It appears that we are not setting "amqp_durable_queues = True".
> >> What does this setting look like in Ansible, and what file does it go
> >> into?
> >
> > Note that even with the rabbit HA policies adjusted like that and the
> > HA replication factor [0] decreased (e.g. to 2), there still might be
> > high churn caused by a large enough number of replicated durable RPC
> > topic queues. And that might cripple the cloud with the incurred I/O
> > overhead, because a durable queue requires all messages in it to be
> > persisted to disk (on all of the messaging cluster replicas) before
> > they are ack'ed by the broker.
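> >
> > For illustration only - kolla-ansible typically applies policies
> > through its own RabbitMQ definitions template rather than via
> > rabbitmqctl - a policy like the one quoted above, with a replication
> > factor of 2, could be applied by hand as follows (the policy name
> > "ha-rpc" is arbitrary):
> >
> >   rabbitmqctl set_policy -p / --apply-to queues --priority 0 \
> >     ha-rpc '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' \
> >     '{"ha-mode": "exactly", "ha-params": 2, "ha-sync-mode": "automatic"}'
> >
> > With "ha-mode": "exactly" and "ha-params": 2, each matching queue is
> > mirrored on exactly two nodes.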
> >
> > That said, Oslo messaging would likely require more granular control
> > over topic exchanges and the durable queues flag - to tell it to
> > declare as durable only the most critical paths of a service. A
> > single config setting and a single control exchange per service might
> > not be enough.
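> >
> > For reference, the only knobs that exist today are global for a whole
> > service; e.g. in nova.conf (values shown for illustration):
> >
> >   [DEFAULT]
> >   # defaults to 'openstack' when unset
> >   control_exchange = nova
> >
> >   [oslo_messaging_rabbit]
> >   amqp_durable_queues = true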
>
> Also note that, as a consequence, amqp_durable_queues=True requires
> dedicated control exchanges configured for each service. Services that
> use 'openstack' as the default cannot turn the feature ON. Changing it
> to a service-specific exchange might also have an upgrade impact, as
> described in the topic [3].
>
>
The same is true for `amqp_auto_delete=True`. It likewise requires
dedicated control exchanges: the feature won't work if each service
defines its own policy on a shared control exchange (e.g. `openstack`)
and those policies differ from each other.
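
On the kolla-ansible side (question 3 above), a minimal sketch, assuming
the standard /etc/kolla/config override mechanism - the exact paths and
filenames depend on your deployment:

  # /etc/kolla/config/nova.conf - merged into the nova services' config
  [DEFAULT]
  control_exchange = nova

  [oslo_messaging_rabbit]
  amqp_durable_queues = true

As discussed above, only enable this together with a dedicated control
exchange per service, and mind the upgrade impact described in the topic
below.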


> [3] https://review.opendev.org/q/topic:scope-config-opts
>
> >
> > There are also race conditions with durable queues enabled, like [1].
> > A solution could be for each service to declare its own dedicated
> > control exchange with its own configuration.
> >
> > Finally, OpenStack components should perhaps add a *.next CI job to
> > test with durable queues, like [2].
> >
> > [0] https://www.rabbitmq.com/ha.html#replication-factor
> >
> > [1]
> > https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt
> >
> > [2] https://review.opendev.org/c/openstack/nova/+/820523
> >
> >>
> >> Does anyone have a sample set of RMQ config files that they can share?
> >>
> >> It looks like my Outlook has ruined the link; reposting:
> >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
> >
> >
> > --
> > Best regards,
> > Bogdan Dobrelya,
> > Irc #bogdando

-- 
Hervé Beraud
Senior Software Engineer at Red Hat
irc: hberaud
https://github.com/4383/
https://twitter.com/4383hberaud