[ops] [kolla] RabbitMQ High Availability

Arnaud arnaud.morin at gmail.com
Wed Dec 1 20:02:02 UTC 2021


Hi,
I definitely recommend upgrading to 3.8.
Also enable the durable queues.
This helped us a lot in managing our clusters.
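
For the durable queues part, here is a minimal Python sketch of what the relevant oslo.messaging settings might look like once rendered into a service config file. The host names, credentials and the helper name render_rabbit_conf are placeholders made up for illustration; only amqp_durable_queues (in [oslo_messaging_rabbit]) and transport_url (in [DEFAULT]) are real oslo.messaging options. It also lists the cluster nodes directly in transport_url, as suggested further down in the thread, and it says nothing about how kolla-ansible templates these files.

```python
import configparser
import io


def render_rabbit_conf(nodes, user, password, vhost="/"):
    """Render a config fragment with durable queues and all cluster nodes listed.

    All argument values are placeholders supplied by the caller.
    """
    # oslo.messaging accepts several comma-separated hosts in one transport_url.
    hosts = ",".join(f"{user}:{password}@{node}:5672" for node in nodes)
    # interpolation=None so a '%' in a password is written literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["DEFAULT"]["transport_url"] = f"rabbit://{hosts}/{vhost.lstrip('/')}"
    cfg["oslo_messaging_rabbit"] = {"amqp_durable_queues": "True"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_rabbit_conf(["rabbit1", "rabbit2", "rabbit3"],
                             "openstack", "CHANGE_ME"))
```

The printed fragment would need to end up in each service's own config file (nova.conf, neutron.conf and so on), by whatever mechanism the deployment tool provides.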

We also applied the policy, which was originally taken from openstack ansible [1].
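
To see which queues such a policy would cover, here is a small Python sketch that applies the pattern quoted below from the Large Scale wiki page to a few queue names. The queue names are made-up examples, and Python's re module is only standing in for RabbitMQ's own pattern matching, so treat it as a sanity check rather than proof of what the broker does.

```python
import re

# Pattern quoted in this thread from the Large Scale wiki page.
HA_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def is_mirrored(queue_name):
    """Return True if the policy pattern would apply to this queue name."""
    return HA_PATTERN.match(queue_name) is not None


if __name__ == "__main__":
    # Hypothetical queue names, for illustration only.
    for name in ["conductor", "compute.host1", "amq.gen-abc",
                 "scheduler_fanout_1234", "reply_5678"]:
        print(f"{name}: {'mirrored' if is_mirrored(name) else 'skipped'}")
```

The effect of the negative lookahead is that ordinary RPC and notification queues match, while amq.* queues, fanout queues and reply queues are left out.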

We also collect unroutable messages and send alerts based on that (but we have had no issues recently, thanks to all of the above).

Finally we ping all our clients using oslo_ping_endpoint [2] every five minutes so we know when an agent is disconnected.
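
The bookkeeping side of such a check can be sketched in Python as follows. This is only an illustration, not the tooling described above: the function name, the agent names and the two-missed-pings threshold are assumptions, and the RPC ping call itself is left out.

```python
from datetime import datetime, timedelta

# Ping period mentioned above; everything else here is hypothetical.
PING_INTERVAL = timedelta(minutes=5)


def find_silent_agents(last_reply, now, max_missed=2):
    """Return agents whose last ping reply is older than max_missed intervals.

    last_reply maps an agent name to the datetime of its last ping reply.
    """
    cutoff = now - max_missed * PING_INTERVAL
    return sorted(name for name, seen in last_reply.items() if seen < cutoff)


if __name__ == "__main__":
    now = datetime(2021, 12, 1, 20, 0)
    replies = {
        "neutron-l3-agent@net1": now - timedelta(minutes=4),
        "nova-compute@cmp7": now - timedelta(minutes=11),
    }
    print(find_silent_agents(replies, now))
```

Allowing more than one missed ping before alerting is a design choice to avoid noise from a single slow reply; the right threshold depends on the deployment.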

Dunno about kolla, sorry.

[1] https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735a68b64cb3c67dd8abeaf324803a9845b/defaults/main.yml#L172

[2] https://opendev.org/openstack/oslo.messaging/commit/82492442f3387a0e4f19623ccfda64f8b84d59c3

On 1 December 2021 19:15:41 GMT+01:00, "Braden, Albert" <abraden at verisign.com> wrote:
>I read this with great interest because we are seeing this issue. Questions:
>
>1. We are running kolla-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x?
>2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like?
>3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into?
>
>Does anyone have a sample set of RMQ config files that they can share?
>
>It looks like my Outlook has ruined the link; reposting:
>[2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
>
>-----Original Message-----
>From: Arnaud Morin <arnaud.morin at gmail.com> 
>Sent: Monday, November 29, 2021 2:04 PM
>To: Bogdan Dobrelya <bdobreli at redhat.com>
>Cc: DHilsbos at performair.com; openstack-discuss at lists.openstack.org
>Subject: [EXTERNAL] Re: [ops]RabbitMQ High Availability
>
>Hi,
>
>After a discussion on this ML (which starts at [1]), we ended up writing
>documentation with the Large Scale group.
>The doc is accessible at [2].
>
>Hope this will help.
>
>[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016362.html
>[2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
>
>On 24.11.21 - 11:31, Bogdan Dobrelya wrote:
>> On 11/24/21 12:34 AM, DHilsbos at performair.com wrote:
>> > All;
>> > 
>> > In the time I've been part of this mailing list, the subject of RabbitMQ high availability has come up several times, and each time specific recommendations for both Rabbit and OpenStack have been provided.  I remember it being an A or B kind of recommendation (i.e. configure Rabbit like A1, and OpenStack like A2, OR configure Rabbit like B1, and OpenStack like B2).
>> 
>> There are no special recommendations for rabbitmq setup for openstack,
>> except perhaps a few: for example, instead of putting it behind haproxy
>> or the like, list the rabbit cluster nodes directly in the oslo
>> messaging config settings. Also, it seems that durable queues make very
>> little sense for highly ephemeral RPC calls, just by design. I would
>> also add that the raft quorum queues feature of rabbitmq >=3.8 does
>> not fit well into the oslo messaging design for RPC calls either.
>> 
>> Another debatable and highly opinionated topic is configuring
>> ha/mirror queue policy params for queues used for RPC calls vs
>> broadcast notifications.
>> 
>> And my biased, humble personal recommendation is: use the upstream OCF RA
>> [0][1] if you manage the rabbitmq cluster with pacemaker.
>> 
>> [0] https://www.rabbitmq.com/pacemaker.html#auto-pacemaker
>> 
>> [1] https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-server-ha
>> 
>> > 
>> > Unfortunately, I can't find the previous threads on this topic.
>> > 
>> > Does anyone have this information, that they would care to share with me?
>> > 
>> > Thank you,
>> > 
>> > Dominic L. Hilsbos, MBA
>> > Vice President - Information Technology
>> > Perform Air International Inc.
>> > DHilsbos at PerformAir.com
>> > www.PerformAir.com
>> > 
>> > 
>> > 
>> 
>> 
>> -- 
>> Best regards,
>> Bogdan Dobrelya,
>> Irc #bogdando
>> 
>> 
>
>
>