[ops] [kolla] RabbitMQ High Availability

Mark Goddard mark at stackhpc.com
Wed Jan 5 09:23:17 UTC 2022


On Tue, 4 Jan 2022 at 14:08, Albert Braden <ozzzo at yahoo.com> wrote:
>
> Now that the holidays are over I'm trying this one again. Can anyone help me figure out how to set "expires" and "message-ttl"?

John Garbutt proposed a few patches for RabbitMQ in kolla, including
this: https://review.opendev.org/c/openstack/kolla-ansible/+/822191

https://review.opendev.org/q/hashtag:%2522rabbitmq%2522+(status:open+OR+status:merged)+project:openstack/kolla-ansible

Note that they are currently untested.

Mark

> On Thursday, December 16, 2021, 01:43:57 PM EST, Albert Braden <ozzzo at yahoo.com> wrote:
>
>
> I tried these policies in ansible/roles/rabbitmq/templates/definitions.json.j2:
>
> "policies":[
> {"vhost": "/", "name": "ha-all", "pattern": '^(?!(amq\.)|(.*_fanout_)|(reply_)).*', "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
> {"vhost": "/", "name": "notifications-ttl", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"message-ttl":600}, "priority":0}
> {"vhost": "/", "name": "notifications-expire", "pattern": "^(notifications|versioned_notifications)\\.", "apply-to": "queues", "definition": {"expire":3600}, "priority":0}
> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}
> {% endif %}
>
> But I still see unconsumed messages lingering in notifications_extractor.info. From reading the docs I think this setting should cause messages to expire after 600 seconds, and unused queues to be deleted after 3600 seconds. What am I missing?
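Going by RabbitMQ's documented policy semantics, several things in the snippet above could plausibly explain the lingering messages. At most one policy applies to a given queue, and the matching policy with the highest priority wins. Here `ha-all`, `notifications-ttl` and `notifications-expire` all have priority 0, so which one takes effect cannot be relied on. The TTL and expiry settings therefore need to live in a single policy with a higher priority, and that policy has to repeat `ha-mode` if those queues should stay mirrored. The key is `expires`, not `expire`. Both `message-ttl` and `expires` are in milliseconds, so `600` means 0.6 seconds. The pattern requires a literal dot directly after `notifications`, so it never matches `notifications_extractor.info`. The extra policies also sit inside `{% if project_name == 'outward_rabbitmq' %}`, so they are only rendered for the outward RabbitMQ instance. Finally, the policy objects lack the commas, and the `ha-all` line lacks the double quotes, that JSON requires. The Python sketch below is a simplified model under those assumptions, not kolla-ansible code. It builds a combined policy, checks which policy would win for a given queue name, and prints valid JSON. The widened pattern and the policy name `notifications` are illustrative choices, not taken from the thread.

```python
import json
import re

# Values from the message above, converted to the units RabbitMQ expects.
MESSAGE_TTL_MS = 600 * 1000      # "message-ttl" is in milliseconds
QUEUE_EXPIRES_MS = 3600 * 1000   # "expires" (not "expire") is in milliseconds

policies = [
    {"vhost": "/", "name": "ha-all",
     "pattern": r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*",
     "apply-to": "all", "definition": {"ha-mode": "all"}, "priority": 0},
    # Single combined policy with a higher priority: only one policy applies
    # to a queue, so TTL, expiry and mirroring share one definition.
    # Dropping the trailing "\." from the pattern is an assumption, made so
    # that names like notifications_extractor.info also match.
    {"vhost": "/", "name": "notifications",
     "pattern": r"^(notifications|versioned_notifications)",
     "apply-to": "queues",
     "definition": {"ha-mode": "all",
                    "message-ttl": MESSAGE_TTL_MS,
                    "expires": QUEUE_EXPIRES_MS},
     "priority": 1},
]


def effective_policy(queue, vhost="/"):
    """Name of the policy that would apply to `queue`, or None.

    Simplified model: among matching policies the highest priority wins;
    on a tie Python's max() returns the first one listed. "apply-to" is
    ignored because every name checked here is a queue.
    """
    matching = [p for p in policies
                if p["vhost"] == vhost and re.search(p["pattern"], queue)]
    if not matching:
        return None
    return max(matching, key=lambda p: p["priority"])["name"]


print(effective_policy("notifications_extractor.info"))  # -> notifications
print(effective_policy("compute"))                        # -> ha-all
# json.dumps always emits double quotes and commas between list items.
print(json.dumps({"policies": policies}, indent=2))
```

If the main RabbitMQ should get these settings, the combined policy would have to go outside the `outward_rabbitmq` conditional.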
> On Tuesday, December 14, 2021, 04:18:09 PM EST, Albert Braden <ozzzo at yahoo.com> wrote:
>
>
> Following [1] I successfully set "amqp_durable_queues = True" and restricted HA to the appropriate queues, but I'm having trouble with some of the other settings such as "expires" and "message-ttl". Does anyone have an example of a working kolla config that includes these changes?
>
> [1] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
> On Monday, December 13, 2021, 07:51:32 AM EST, Herve Beraud <hberaud at redhat.com> wrote:
>
>
> So, your config snippet LGTM.
>
> On Fri, 10 Dec 2021 at 17:50, Albert Braden <ozzzo at yahoo.com> wrote:
>
> Sorry, that was a transcription error. I thought "True" and my fingers typed "False." The correct lines are:
>
> [oslo_messaging_rabbit]
> amqp_durable_queues = True
>
> On Friday, December 10, 2021, 02:55:55 AM EST, Herve Beraud <hberaud at redhat.com> wrote:
>
>
> If you plan to leave `amqp_durable_queues = False` (i.e. if you plan to keep this option set to False), then you don't need to add these lines, as that is already the default value [1].
>
> [1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L34
>
> On Thu, 9 Dec 2021 at 22:40, Albert Braden <ozzzo at yahoo.com> wrote:
>
> Replying from my home email because I've been asked to not email the list from my work email anymore, until I get permission from upper management.
>
> I'm not sure I follow. I was planning to add 2 lines to etc/kolla/config/global.conf:
>
> [oslo_messaging_rabbit]
> amqp_durable_queues = False
>
> Is that not sufficient? What is involved in configuring dedicated control exchanges for each service? What would that look like in the config?
>
>
> From: Herve Beraud <hberaud at redhat.com>
> Sent: Thursday, December 9, 2021 2:45 AM
> To: Bogdan Dobrelya <bdobreli at redhat.com>
> Cc: openstack-discuss at lists.openstack.org
> Subject: [EXTERNAL] Re: [ops] [kolla] RabbitMQ High Availability
>
>
> On Wed, 8 Dec 2021 at 11:48, Bogdan Dobrelya <bdobreli at redhat.com> wrote:
>
> Please see inline
>
> >> I read this with great interest because we are seeing this issue. Questions:
> >>
> >> 1. We are running kolla-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x?
> >> 2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like?
> >> 3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into?
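Regarding question 2, the recommended pattern is easier to reason about when run against some queue names. Below is a minimal Python sketch. It assumes Python's `re` behaves like RabbitMQ's regex matching for this particular pattern, where only a negative lookahead and a `^` anchor are involved. The queue names are hypothetical examples, shaped like oslo.messaging topic, fanout and reply queues.

```python
import re

# Pattern recommended by the Large_Scale_Configuration_Rabbit wiki page [2]:
# mirror everything except server-named (amq.*), fanout and reply queues.
HA_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def is_mirrored(queue_name: str) -> bool:
    """True if the ha-all pattern would select this queue name."""
    # The pattern is anchored with ^, so the lookahead is only checked
    # at the start of the name.
    return HA_PATTERN.search(queue_name) is not None


# Hypothetical queue names, one per branch of the pattern.
examples = [
    "compute",                 # RPC topic queue
    "compute.node1",           # RPC server queue
    "compute_fanout_1a2b3c",   # fanout queue
    "reply_4d5e6f",            # RPC reply queue
    "amq.gen-7g8h9i",          # server-named queue
    "notifications.info",      # notification queue
]

for name in examples:
    print(f"{name:25} mirrored={is_mirrored(name)}")
```

In definitions.json.j2 the same pattern has to be a double-quoted JSON string with the backslash doubled (`amq\\.`).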
> >
> > Note that even with the RabbitMQ HA policies adjusted like that and the HA
> > replication factor [0] reduced (e.g. to 2), a large enough number of
> > replicated durable RPC topic queues may still cause high churn. The
> > resulting I/O overhead could cripple the cloud, because a durable queue
> > requires all messages in it to be persisted to disk (on every replica in
> > the messaging cluster) before the broker acks them.
> >
> > Given that, oslo.messaging would likely need more granular control
> > over topic exchanges and the durable queues flag, so that a service can
> > declare as durable only its most critical paths. A single config
> > setting and a single control exchange per service might not be
> > enough.
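The following hypothetical Python sketch puts rough numbers on the replication-factor point. It builds a classic mirrored-queue policy that keeps exactly two replicas (`ha-mode: exactly`, `ha-params: 2`, i.e. a leader plus one mirror) instead of `ha-mode: all`. It then estimates disk writes under the simplifying assumption that every replica writes each persisted message once. The cluster size, message rate and policy name are made-up inputs, not measurements.

```python
# Made-up inputs for illustration only.
CLUSTER_NODES = 3                # with ha-mode "all", one replica per node
PERSISTED_MSGS_PER_SEC = 2000


def mirrored_policy(name, pattern, replicas):
    """Classic mirrored-queue policy keeping `replicas` copies in total
    (leader plus mirrors) instead of one copy on every node."""
    return {"vhost": "/", "name": name, "pattern": pattern,
            "apply-to": "queues",
            "definition": {"ha-mode": "exactly", "ha-params": replicas},
            "priority": 0}


def disk_writes_per_sec(msgs_per_sec, replicas):
    # Simplified model: each replica persists every message once.
    return msgs_per_sec * replicas


policy = mirrored_policy("ha-two", r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*", 2)
print(policy["definition"])
for replicas in (CLUSTER_NODES, 2):
    print(f"{replicas} replicas: "
          f"{disk_writes_per_sec(PERSISTED_MSGS_PER_SEC, replicas)} writes/s")
```

Reducing the replica count lowers the multiplier, but the total still grows with the number of durable queues and messages.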
>
> Also note that amqp_durable_queues=True therefore requires a dedicated
> control exchange to be configured for each service. Services that use
> 'openstack' as the default cannot turn the feature on. Changing it to a
> service-specific exchange might also have an upgrade impact, as described
> in topic [3].
>
>
>
> The same is true for `amqp_auto_delete=True`. It requires dedicated control exchanges; otherwise it won't work if each service defines its own policy on a shared control exchange (e.g. `openstack`) and those policies differ from each other.
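The Python snippet below is a rough sketch of "a dedicated control exchange per service". It renders per-service INI overrides that set oslo.messaging's `control_exchange` option (in `[DEFAULT]`, default `openstack`) to the service's own name, together with `amqp_durable_queues = True`. The service list is hypothetical. Deploying these through kolla-ansible per-service override files such as `/etc/kolla/config/nova.conf` is an assumption about how one would apply them. Some services may already default to their own exchange, and the upgrade impact described in [3] still applies.

```python
import configparser
import io

# Hypothetical list of services; adjust to the deployment.
SERVICES = ["nova", "neutron", "cinder"]


def render_override(service: str) -> str:
    """Render an INI override that gives `service` its own control
    exchange and enables durable queues for it."""
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {"control_exchange": service}
    cfg["oslo_messaging_rabbit"] = {"amqp_durable_queues": "True"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


for svc in SERVICES:
    print(f"# /etc/kolla/config/{svc}.conf (assumed path)")
    print(render_override(svc))
```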
>
>
>
> [3] https://review.opendev.org/q/topic:scope-config-opts
>
> >
> > There are also race conditions with durable queues enabled, such as [1].
> > One solution could be for each service to declare its own dedicated
> > control exchange with its own configuration.
> >
> > Finally, OpenStack components should perhaps add a *.next CI job that
> > tests them with durable queues, like [2].
> >
> > [0] https://www.rabbitmq.com/ha.html#replication-factor
> >
> > [1] https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt
> >
> > [2] https://review.opendev.org/c/openstack/nova/+/820523
> >
> >>
> >> Does anyone have a sample set of RMQ config files that they can share?
> >>
> >> It looks like my Outlook has ruined the link; reposting:
> >> [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
> >
> >
> > --
> > Best regards,
> > Bogdan Dobrelya,
> > Irc #bogdando
>
>
>
>
>
>
> --
>
> Hervé Beraud
>
> Senior Software Engineer at Red Hat
>
> irc: hberaud
>
> https://github.com/4383/
>
> https://twitter.com/4383hberaud
>
>


