<html><head></head><body>Hi,<br>I definitely recommend upgrading to 3.8.<br>Also, enable durable queues.<br>This helped us a lot in managing our clusters.<br><br>We also applied the policy, which was originally taken from openstack-ansible [1].<br><br>We also collect unroutable messages and send alerts based on them (though we have had no issues recently, thanks to all of the above).<br><br>Finally, we ping all of our clients using oslo_ping_endpoint [2] every five minutes, so we know when an agent is disconnected.<br><br>I don't know about Kolla, sorry.<br><br>[1] <a href="https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735a68b64cb3c67dd8abeaf324803a9845b/defaults/main.yml#L172">https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735a68b64cb3c67dd8abeaf324803a9845b/defaults/main.yml#L172</a><br><br>[2] <a href="https://opendev.org/openstack/oslo.messaging/commit/82492442f3387a0e4f19623ccfda64f8b84d59c3">https://opendev.org/openstack/oslo.messaging/commit/82492442f3387a0e4f19623ccfda64f8b84d59c3</a><br><br><div class="gmail_quote">On 1 December 2021 19:15:41 GMT+01:00, "Braden, Albert" &lt;abraden@verisign.com&gt; wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<pre dir="auto" class="k9mail">I read this with great interest because we are seeing this issue. Questions:<br><br>1. We are running kolla-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x?<br>2. Document [2] recommends the policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our Ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like?<br>3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and which file does it go into?<br><br>Does anyone have a sample set of RMQ config files that they can share?<br><br>It looks like my Outlook has ruined the link; reposting:<br>[2] <a href="https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit">https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit</a><br><br>-----Original Message-----<br>From: Arnaud Morin &lt;arnaud.morin@gmail.com&gt;<br>Sent: Monday, November 29, 2021 2:04 PM<br>To: Bogdan Dobrelya &lt;bdobreli@redhat.com&gt;<br>Cc: DHilsbos@performair.com; openstack-discuss@lists.openstack.org<br>Subject: [EXTERNAL] Re: [ops]RabbitMQ High Availability
<br><br>Hi,<br><br>After a discussion on this mailing list (which starts at [1]), we ended up writing<br>documentation with the Large Scale group.<br>The documentation is available at [2].<br><br>Hope this helps.<br><br>[1] <a href="http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016362.html">http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016362.html</a><br>[2] <a href="https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit">https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit</a><br><br>On 24.11.21 - 11:31, Bogdan Dobrelya wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;">On 11/24/21 12:34 AM, DHilsbos@performair.com wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 
1ex 0.8ex; border-left: 1px solid #ad7fa8; padding-left: 1ex;">All;<br><br>In the time I've been part of this mailing list, the subject of RabbitMQ high availability has come up several times, and each time specific recommendations for both Rabbit and OpenStack were provided. I remember it being an A-or-B kind of recommendation (i.e. configure Rabbit like A1 and OpenStack like A2, OR configure Rabbit like B1 and OpenStack like B2).<br></blockquote><br>There are no special recommendations for a RabbitMQ setup for OpenStack,<br>except perhaps a few: for example, instead of putting it behind HAProxy or the<br>like, list the RabbitMQ cluster nodes directly in the oslo.messaging config<br>settings. Also, it seems that durable queues make very<br>little sense for highly ephemeral RPC calls, by design. I would<br>also add that the Raft-based quorum queues feature of RabbitMQ &gt;= 3.8 does<br>not fit well into the oslo.messaging design for RPC calls either.<br><br>Another debatable and highly opinionated topic is configuring<br>HA/mirrored queue policy parameters for queues used for RPC calls vs.<br>broadcast notifications.<br><br>And my biased, humble personal recommendation: use the upstream OCF RA<br>[0][1] if you manage the RabbitMQ cluster with Pacemaker.<br><br>[0] <a 
href="https://www.rabbitmq.com/pacemaker.html#auto-pacemaker">https://www.rabbitmq.com/pacemaker.html#auto-pacemaker</a><br><br>[1] <a href="https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-server-ha">https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-server-ha</a><br><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #ad7fa8; padding-left: 1ex;"><br>Unfortunately, I can't find the previous threads on this topic.<br><br>Does anyone have this information and care to share it with me?<br><br>Thank you,<br><br>Dominic L. Hilsbos, MBA<br>Vice President - Information Technology<br>Perform Air International Inc.<br>DHilsbos@PerformAir.com<br>www.PerformAir.com<br><br><br><br></blockquote><br><br><div class="k9mail-signature">-- <br>Best regards,<br>Bogdan Dobrelya,<br>IRC #bogdando<br><br><br></div></blockquote><br><br><br></pre></blockquote></div></body></html>