RE: [ops] [kolla] RabbitMQ High Availability
I read this with great interest because we are seeing this issue. Questions:
1. We are running kolla-ansible Train, and our RMQ version is 3.7.23. Should we be upgrading our Train clusters to use 3.8.x?
2. Document [2] recommends policy '^(?!(amq\.)|(.*_fanout_)|(reply_)).*'. I don't see this in our Ansible playbooks, nor in any of the config files in the RMQ container. What would this look like in Ansible, and what should the resulting container config look like?
3. It appears that we are not setting "amqp_durable_queues = True". What does this setting look like in Ansible, and what file does it go into?
Does anyone have a sample set of RMQ config files that they can share?
It looks like my Outlook has ruined the link; reposting: [2] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
-----Original Message-----
From: Arnaud Morin <arnaud.morin@gmail.com>
Sent: Monday, November 29, 2021 2:04 PM
To: Bogdan Dobrelya <bdobreli@redhat.com>
Cc: DHilsbos@performair.com; openstack-discuss@lists.openstack.org
Subject: [EXTERNAL] Re: [ops]RabbitMQ High Availability
Hi,
After a discussion on this mailing list (which starts at [1]), we ended up writing documentation with the Large Scale group. The doc is accessible at [2].
Hope this will help.
[1] http://secure-web.cisco.com/1gFccuTyEVGnFd9aBOZ-RTPG0hbVIPGAbuLBNnoXP4onSZGF...
[2] https://secure-web.cisco.com/1OtQ3pcnPPBNwevAFxS8yOS2xFlkHo0tY4SmkFE-wpAU_YP...
On 24.11.21 - 11:31, Bogdan Dobrelya wrote:
On 11/24/21 12:34 AM, DHilsbos@performair.com wrote:
All;
In the time I've been part of this mailing list, the subject of RabbitMQ high availability has come up several times, and each time specific recommendations for both RabbitMQ and OpenStack were provided. I remember it being an A or B kind of recommendation (i.e. configure RabbitMQ like A1 and OpenStack like A2, OR configure RabbitMQ like B1 and OpenStack like B2).
There are no special recommendations for a RabbitMQ setup for OpenStack, but there are a few general ones: for example, instead of putting it behind HAProxy or the like, list the RabbitMQ cluster nodes directly in the oslo.messaging config settings. Also, durable queues seem to make very little sense for highly ephemeral RPC calls, just by design. I would add that the Raft-based quorum queues feature of RabbitMQ >= 3.8 does not fit well into the oslo.messaging design for RPC calls either.
Another debatable and highly opinionated topic is how to configure the HA/mirrored queue policy parameters for queues used for RPC calls versus broadcast notifications.
And my biased, humble personal recommendation is: use the upstream OCF RA [0][1] if you configure the RabbitMQ cluster with Pacemaker.
[0] https://secure-web.cisco.com/1N1wD9gW7NZho0LdTVNuiU2ZIB7NW-eJMfDgVzBH3D3E6UR...
[1] https://secure-web.cisco.com/1iDK1NnL9JTkQqkpBda06xTQNrWY2W0pVOTDwUoadfQbSXn...
Unfortunately, I can't find the previous threads on this topic.
Does anyone have this information and would care to share it with me?
Thank you,
Dominic L. Hilsbos, MBA Vice President - Information Technology Perform Air International Inc. DHilsbos@PerformAir.com www.PerformAir.com
-- Best regards, Bogdan Dobrelya, Irc #bogdando
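On Bogdan's suggestion above to list the RabbitMQ cluster nodes directly in the oslo.messaging settings rather than going through HAProxy: the sketch below builds a multi-host transport_url of the form rabbit://user:pass@host1:port,user:pass@host2:port/vhost. The hostnames, credentials, vhost name, and the helper name build_transport_url are assumptions made for illustration; they do not come from the thread or from any deployment tool.

```python
from urllib.parse import quote


def build_transport_url(hosts, user, password, vhost, port=5672):
    """Build an oslo.messaging-style transport_url that lists every node."""
    # Percent-encode credentials so characters such as '@' or '/' stay unambiguous.
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # Each cluster node gets its own user:pass@host:port entry, comma-separated.
    nodes = ",".join(f"{creds}@{host}:{port}" for host in hosts)
    return f"rabbit://{nodes}/{vhost}"


if __name__ == "__main__":
    # Hypothetical hostnames, credentials and vhost, for illustration only.
    print(build_transport_url(["rabbit1", "rabbit2", "rabbit3"],
                              "openstack", "secret", "openstack"))
```

The resulting string would go into the transport_url option of each service's configuration, so that the client itself can fail over between the listed nodes.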
Hi,
I definitely recommend upgrading to 3.8. Also enable durable queues. This helped us a lot in managing our clusters.
We also applied the policy, which was originally taken from openstack-ansible [1].
We also collect unroutable messages and send alerts based on that (but we have had no issues recently, thanks to all of the above).
Finally, we ping all our clients using oslo_ping_endpoint [2] every five minutes, so we know when an agent is disconnected.
Dunno about kolla, sorry.
[1] https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735...
[2] https://opendev.org/openstack/oslo.messaging/commit/82492442f3387a0e4f19623c...
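Regarding the policy pattern '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' quoted earlier in the thread, a small sketch can help show which queues it would select. It assumes Python's re module treats this pattern the same way RabbitMQ's own regex engine does, which should hold for a plain negative lookahead with alternation but is not verified against a broker here. The queue names are hypothetical examples in the style oslo.messaging creates.

```python
import re

# Pattern quoted in the thread from the Large Scale wiki page.
HA_POLICY_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def is_covered(queue_name):
    """Return True if the policy pattern would select this queue name."""
    return HA_POLICY_PATTERN.match(queue_name) is not None


if __name__ == "__main__":
    # Hypothetical queue names, for illustration only.
    names = ["conductor", "notifications.info", "amq.gen-abc",
             "scheduler_fanout_1f2e", "reply_9c0d"]
    for name in names:
        status = "covered" if is_covered(name) else "skipped"
        print(f"{name:24} {status}")
```

In other words, the negative lookahead leaves out broker-generated amq.* queues, fanout queues, and reply queues, and applies the policy to everything else.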
Hello Albert,
The amqp_durable_queues configuration option comes from oslo.messaging, which is a library used by many OpenStack projects. You can set this option in the [oslo_messaging_rabbit] section of each of these OpenStack projects (check their configuration reference for more details).
As for how to configure it with Kolla Ansible: you can either set it directly for each service (e.g. in /etc/kolla/config/nova.conf to configure it for Nova) or for all projects at once using /etc/kolla/config/global.conf. Projects that don't support this option should just ignore it.
Read this section of the documentation for more details: https://docs.openstack.org/kolla-ansible/latest/admin/advanced-configuration...
Best wishes, Pierre Riteau
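To make the answer to question 3 concrete, here is a minimal sketch that renders the override fragment Pierre describes: an [oslo_messaging_rabbit] section with amqp_durable_queues set to True. The helper name render_durable_queues_override is an assumption for illustration, and the script only prints the fragment; placing it in /etc/kolla/config/global.conf (or a per-service file such as /etc/kolla/config/nova.conf) is left to the operator.

```python
import configparser
import io


def render_durable_queues_override():
    """Render an INI fragment that enables durable queues in oslo.messaging."""
    config = configparser.ConfigParser()
    # Section and option names are the ones given in the thread.
    config["oslo_messaging_rabbit"] = {"amqp_durable_queues": "True"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Prints the fragment; nothing is written under /etc/kolla here.
    print(render_durable_queues_override(), end="")
```

The printed fragment consists of the line [oslo_messaging_rabbit] followed by amqp_durable_queues = True.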
Should we (Kolla) consider adding more of these to our defaults? Or documenting them in our RabbitMQ guide?
Mark
On Mon, 2021-12-06 at 08:40 +0000, Mark Goddard wrote:
Should we (Kolla) consider adding more of these to our defaults? Or documenting them in our RabbitMQ guide?
Definitely +1 on both, this stuff took us a while to figure out!
-- Regards, Sven Kieske, Systems Engineer, Mittwald CM Service GmbH & Co. KG, https://www.mittwald.de
participants (5)
- Arnaud
- Braden, Albert
- Mark Goddard
- Pierre Riteau
- Sven Kieske