Re: [openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down
Hi all,

It seems worth noting here that there is a fix ongoing in oslo.messaging which will resolve the issues with HA failing when one node is down. See here: https://review.opendev.org/c/openstack/oslo.messaging/+/866617
In the meantime, we have also found that setting kombu_reconnect_delay = 0.5 does resolve this issue.

As for why om_enable_rabbitmq_high_availability is currently defaulting to false: as Michal said, enabling it in stable releases will impact users. This is because it enables durable queues, and the migration from transient to durable queues is not a seamless procedure. It requires that the state of RabbitMQ is reset and that the OpenStack services which use RabbitMQ are restarted to recreate the queues.

I think that there is some merit in changing this default value. But if we did this, we should either add additional support to automate the migration from transient to durable queues, or at the very least provide some decent docs on the manual procedure.

However, as classic queue mirroring is deprecated in RabbitMQ (to be removed in RabbitMQ 4.0), we should maybe consider switching to quorum queues soon. Then it may be beneficial to leave the classic queue mirroring + durable queues setup as False by default. This is because the migration between queue types (durable or quorum) can take several hours on larger deployments. So it might be worth making sure the default values only require one migration to quorum queues in the future, rather than two (durable queues now and then quorum queues in the future).

We will need to make this switch eventually, but right now RabbitMQ 4.0 does not even have a set release date, so it's not the most urgent change.

Cheers,
Matt
Hi Michal,
Feel free to propose a change of the default in the master branch, but I don't think we can change the default in stable branches without impacting users.
Best regards, Michal
On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet@ultimum.io> wrote:
Hi,
Btw, why do we have such an option set to false?

Michal Arbet
Openstack Engineer

Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic

+420 604 228 897
michal.arbet@ultimum.io
https://ultimum.io

LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka@gmail.com> wrote:
Hello,
RabbitMQ HA has been backported into stable releases, and it's documented here: https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbi...
Best regards, Michal
On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Yes. But the cluster cannot work properly without it. :(
On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb@thehutgroup.com> wrote:
This commit explains why they largely removed HA queue durability:
https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a...

From: Satish Patel <satish.txt@gmail.com>
Sent: 09 April 2023 04:16
To: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>
Cc: OpenStack Discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down
Are you proposing a solution or just raising an issue?
I did find it strange that kolla-ansible doesn't set up HA queues by default. That is a disaster, because when one of the nodes goes down it makes the whole RabbitMQ cluster unusable. Whenever I deploy Kolla I have to add an HA policy to make the queues highly available, otherwise you will end up with problems.
On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:

Hello everyone.
I want to summarize, for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.
Scenario: 1 of 3 controllers is down
1. Logging in to Horizon and using APIs such as Nova and Cinder becomes very slow
fix by:
nano:
kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
(or whichever other services need caching)
add as below
[cache]
backend = oslo_cache.memcache_pool
enabled = True
memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
memcache_dead_retry = 0.25
memcache_socket_timeout = 900
https://review.opendev.org/c/openstack/kolla-ansible/+/849487
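As an alternative to patching the role templates, a minimal sketch only: the same [cache] options can be supplied as a Kolla Ansible config override, assuming the standard /etc/kolla/config directory is in use and that global.conf there is merged into every service's configuration (the memcached address below is a hypothetical example, not taken from this thread):

# /etc/kolla/config/global.conf (hypothetical override, adjust to your deployment)
[cache]
backend = oslo_cache.memcache_pool
enabled = True
memcache_servers = 192.0.2.10:11211
memcache_dead_retry = 0.25
memcache_socket_timeout = 900

A kolla-ansible reconfigure run would then regenerate the services' config files with these options, without keeping local edits to the role templates.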
but that is not the end of it
2. Cannot launch instances or map block devices (stuck at this step)
nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
"policies":[ {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} {% endif %} ]
nano /etc/kolla/global.conf
[oslo_messaging_rabbit]
kombu_reconnect_delay=0.5
https://bugs.launchpad.net/oslo.messaging/+bug/1993149 https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
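As a quick sanity check that the ha-all policy is actually in place after redeploying RabbitMQ, something like the following can be run on a controller (a hypothetical example, assuming the RabbitMQ container is named rabbitmq as in a default Kolla Ansible deployment):

docker exec rabbitmq rabbitmqctl list_policies -p /

The ha-all policy with ha-mode: all should then be listed for the / vhost.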
I used Xena 13.4 and Yoga 14.8.1.
The above bugs are critical, but I see that they have not been fixed. I am just an operator, and I want to share what I encountered for new people coming to OpenStack.
Nguyen Huu Khoi

--
Michał Nasiadka
mnasiadka@gmail.com
Hi Matt,

How do I set the kombu_reconnect_delay=0.5 option? Something like the following in globals.yml?

kombu_reconnect_delay: 0.5

On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <mattc@stackhpc.com> wrote:
Hi. Create a global.conf in /etc/kolla/config/.

On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt@gmail.com> wrote:
Yes, and the option also needs to be under the oslo_messaging_rabbit heading:

[oslo_messaging_rabbit]
kombu_reconnect_delay=0.5

On Wed, 12 Apr 2023 at 15:45, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
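Putting the two answers above together, a minimal sketch of the resulting override file, assuming the standard /etc/kolla/config directory (where global.conf is merged into every service's generated configuration):

# /etc/kolla/config/global.conf
[oslo_messaging_rabbit]
kombu_reconnect_delay = 0.5

A reconfigure run (for example, kolla-ansible -i <inventory> reconfigure) would then be needed for the option to land in the services' config files.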
Matt,

For a new deployment, how do I enable quorum queues? Is just adding the following enough?

om_enable_rabbitmq_high_availability: True

On Wed, Apr 12, 2023 at 10:54 AM Matt Crees <mattc@stackhpc.com> wrote:
Hi Satish,

Yes, for a new deployment you will just need to set that variable to true. However, note that this enables high availability of RabbitMQ queues using a combination of classic queue mirroring and durable queues; quorum queues are not yet supported via Kolla Ansible.

Cheers,
Matt

On Wed, 12 Apr 2023 at 16:04, Satish Patel <satish.txt@gmail.com> wrote:
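For reference, a minimal sketch of what that looks like in the main Kolla Ansible configuration, assuming the standard /etc/kolla/globals.yml path and a completely fresh deployment (for an existing cluster, see the migration steps later in the thread):

# /etc/kolla/globals.yml
om_enable_rabbitmq_high_availability: true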
Matt,

After enabling om_enable_rabbitmq_high_availability: True and kombu_reconnect_delay=0.5, all my API services started throwing the following logs, even after I rebuilt my RabbitMQ cluster. What could be wrong here?

2023-04-12 15:53:40.380 391 ERROR oslo_service.service amqp.exceptions.PreconditionFailed: Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in vhost '/': received 'true' but current is 'false'
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
2023-04-12 15:53:40.380 391 ERROR oslo_service.service During handling of the above exception, another exception occurred:
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
2023-04-12 15:53:40.380 391 ERROR oslo_service.service Traceback (most recent call last):
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_service/service.py", line 806, in run_service
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     service.start()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/neutron/service.py", line 115, in start
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     servers = getattr(plugin, self.start_listeners_method)()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_log/helpers.py", line 67, in wrapper
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return method(*args, **kwargs)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/plugin.py", line 425, in start_rpc_listeners
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return self.conn.consume_in_threads()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/neutron_lib/rpc.py", line 351, in consume_in_threads
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     server.start()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 267, in wrapper
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     states[state].run_once(lambda: fn(self, *args, **kwargs),
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 188, in run_once
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     post_fn = fn()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 267, in <lambda>
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     states[state].run_once(lambda: fn(self, *args, **kwargs),
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 413, in start
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     self.listener = self._create_listener()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/rpc/server.py", line 150, in _create_listener
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return self.transport._listen(self._target, 1, None)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/transport.py", line 142, in _listen
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return self._driver.listen(target, batch_size,
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 702, in listen
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     conn.declare_topic_consumer(exchange_name=self._get_exchange(target),
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1295, in declare_topic_consumer
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     self.declare_consumer(consumer)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1192, in declare_consumer
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return self.ensure(_declare_consumer,
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 977, in ensure
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     raise exceptions.MessageDeliveryFailure(msg)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP server on 10.30.50.3:5672 after inf tries: Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in vhost '/': received 'true' but current is 'false'
2023-04-12 15:53:40.380 391 ERROR oslo_service.service

On Wed, Apr 12, 2023 at 11:10 AM Matt Crees <mattc@stackhpc.com> wrote:
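A quick way to see which exchanges were created as transient, which is what the PRECONDITION_FAILED error above is complaining about (a hypothetical check, assuming the RabbitMQ container is named rabbitmq):

docker exec rabbitmq rabbitmqctl list_exchanges name durable

An exchange such as neutron showing durable as false was declared before the change, and it will keep conflicting with the new durable declarations until the broker state is reset, as described in the next reply.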
Hi Satish,

Apologies, I would have mentioned this before, but I thought that by a new deployment you meant starting a completely fresh deploy. As you're now reconfiguring a running deployment, there are some extra steps that need to be taken to migrate to durable queues:

1. Stop the OpenStack services which use RabbitMQ.

2. Reset the state of RabbitMQ on each RabbitMQ node with the following commands. You must run each command on all RabbitMQ nodes before moving on to the next command. This will remove all queues. (See the sketch after this message for running these across all nodes.)

rabbitmqctl stop_app
rabbitmqctl force_reset
rabbitmqctl start_app

3. Start the OpenStack services again, at which point they will recreate the appropriate queues as durable.

On Wed, 12 Apr 2023 at 16:55, Satish Patel <satish.txt@gmail.com> wrote:
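A minimal sketch of step 2, assuming three hypothetical controller hosts reachable over SSH and RabbitMQ running in a container named rabbitmq; the outer loop ensures each rabbitmqctl command completes on every node before the next command starts, as required above:

for cmd in stop_app force_reset start_app; do
    for host in controller01 controller02 controller03; do
        ssh "$host" docker exec rabbitmq rabbitmqctl "$cmd"
    done
done

Steps 1 and 3 (stopping and then starting the OpenStack services themselves) still have to happen around this.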
Matt,
After enabling om_enable_rabbitmq_high_availability: True and kombu_reconnect_delay=0.5 all my api services started throwing the following logs. Even i rebuild my RabbitMQ cluster again. What could be wrong here?
2023-04-12 15:53:40.380 391 ERROR oslo_service.service amqp.exceptions.PreconditionFailed: Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in vhost '/': received 'true' but current is 'false' 2023-04-12 15:53:40.380 391 ERROR oslo_service.service 2023-04-12 15:53:40.380 391 ERROR oslo_service.service During handling of the above exception, another exception occurred: 2023-04-12 15:53:40.380 391 ERROR oslo_service.service 2023-04-12 15:53:40.380 391 ERROR oslo_service.service Traceback (most recent call last): 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_service/service.py", line 806, in run_service 2023-04-12 15:53:40.380 391 ERROR oslo_service.service service.start() 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/neutron/service.py", line 115, in start 2023-04-12 15:53:40.380 391 ERROR oslo_service.service servers = getattr(plugin, self.start_listeners_method)() 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_log/helpers.py", line 67, in wrapper 2023-04-12 15:53:40.380 391 ERROR oslo_service.service return method(*args, **kwargs) 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/plugin.py", line 425, in start_rpc_listeners 2023-04-12 15:53:40.380 391 ERROR oslo_service.service return self.conn.consume_in_threads() 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/neutron_lib/rpc.py", line 351, in consume_in_threads 2023-04-12 15:53:40.380 391 ERROR oslo_service.service server.start() 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 267, in wrapper 2023-04-12 15:53:40.380 391 ERROR oslo_service.service states[state].run_once(lambda: fn(self, *args, **kwargs), 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 188, in run_once 2023-04-12 15:53:40.380 391 ERROR oslo_service.service post_fn = fn() 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 267, in <lambda> 2023-04-12 15:53:40.380 391 ERROR oslo_service.service states[state].run_once(lambda: fn(self, *args, **kwargs), 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py", line 413, in start 2023-04-12 15:53:40.380 391 ERROR oslo_service.service self.listener = self._create_listener() 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/rpc/server.py", line 150, in _create_listener 2023-04-12 15:53:40.380 391 ERROR oslo_service.service return self.transport._listen(self._target, 1, None) 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/transport.py", line 142, in _listen 2023-04-12 15:53:40.380 391 ERROR oslo_service.service return self._driver.listen(target, batch_size, 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 702, in listen 2023-04-12 15:53:40.380 
391 ERROR oslo_service.service conn.declare_topic_consumer(exchange_name=self._get_exchange(target), 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1295, in declare_topic_consumer 2023-04-12 15:53:40.380 391 ERROR oslo_service.service self.declare_consumer(consumer) 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1192, in declare_consumer 2023-04-12 15:53:40.380 391 ERROR oslo_service.service return self.ensure(_declare_consumer, 2023-04-12 15:53:40.380 391 ERROR oslo_service.service File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 977, in ensure 2023-04-12 15:53:40.380 391 ERROR oslo_service.service raise exceptions.MessageDeliveryFailure(msg) 2023-04-12 15:53:40.380 391 ERROR oslo_service.service oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP server on 10.30.50.3:5672 after inf tries: Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in vhost '/': received 'true' but current is 'false' 2023-04-12 15:53:40.380 391 ERROR oslo_service.service
On Wed, Apr 12, 2023 at 11:10 AM Matt Crees <mattc@stackhpc.com> wrote:
Hi Satish,
Yes, for a new deployment you will just need to set that variable to true. However, that will enable high availability of RabbitMQ queues using a combination of classic queue mirroring and durable queues. Quorum queues are not yet supported via Kolla Ansible.
Cheers, Matt
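For reference, a minimal sketch of that setting, assuming the usual Kolla Ansible globals file location (the exact path depends on your installation):

# /etc/kolla/globals.yml (sketch)
# Enables classic queue mirroring plus durable queues; it does not enable quorum queues.
om_enable_rabbitmq_high_availability: true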
On Wed, 12 Apr 2023 at 16:04, Satish Patel <satish.txt@gmail.com> wrote:
Matt,
For a new deployment, how do I enable quorum queues?
Just adding the following should be enough?
om_enable_rabbitmq_high_availability: True
On Wed, Apr 12, 2023 at 10:54 AM Matt Crees <mattc@stackhpc.com> wrote:
Yes, and the option also needs to be under the oslo_messaging_rabbit heading:
[oslo_messaging_rabbit]
kombu_reconnect_delay=0.5
On Wed, 12 Apr 2023 at 15:45, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Hi. Create global.conf in /etc/kolla/config/
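Putting those two answers together, a minimal sketch of the override file (using the path given above, which Kolla Ansible merges into the oslo.messaging configuration of the services on the next deploy/reconfigure; treat the exact behaviour as something to verify against your release's docs):

# /etc/kolla/config/global.conf (sketch)
[oslo_messaging_rabbit]
kombu_reconnect_delay = 0.5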
On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt@gmail.com> wrote:
Hi Matt,
How do I set kombu_reconnect_delay=0.5 option?
Something like the following in global.yml?
kombu_reconnect_delay: 0.5
Hi Matt,
As you're now reconfiguring a running deployment, there are some extra steps that need to be taken to migrate to durable queues.
1. You will need to stop the OpenStack services which use RabbitMQ.
2. Reset the state of RabbitMQ on each RabbitMQ node with the following commands. You must run each command on all RabbitMQ nodes before moving on to the next command. This will remove all queues.
rabbitmqctl stop_app
rabbitmqctl force_reset
rabbitmqctl start_app
3. Start the OpenStack services again, at which point they will recreate the appropriate queues as durable.
This sounds like a great new addition-to-be to the Kolla Ansible docs! Could you please propose it as a change? Kindest, Radek
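As a rough illustration of step 2 of the procedure above, run from the deployment host: a hedged sketch assuming three controllers named ctl1, ctl2 and ctl3 (hostnames are illustrative) and the default Kolla container name rabbitmq. The loop order matters: each rabbitmqctl command finishes on every node before the next command starts.

for cmd in stop_app force_reset start_app; do
  for node in ctl1 ctl2 ctl3; do
    # Run the command inside the RabbitMQ container on each controller.
    ssh $node sudo docker exec rabbitmq rabbitmqctl $cmd
  done
done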
This is great, Matt! Documentation would be greatly appreciated. I have a counter question: are durable queues good for large clouds with 1000 compute nodes, or is it better not to use durable queues? This is a private cloud and we don't care about persistent data.

On Wed, Apr 12, 2023 at 12:37 PM Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello guys. I did many tests on Xena and Yoga, so I am sure that without ha-queue and kombu_reconnect_delay=0.5 (it can be < 1) you cannot launch instances when 1 of 3 controllers is down. Can somebody verify what I say? I hope we will have a common solution for this problem, because those who use OpenStack for the first time will keep asking questions like that. Nguyen Huu Khoi

On Thu, Apr 13, 2023 at 12:59 AM Satish Patel <satish.txt@gmail.com> wrote:
Update: I use a SAN as the Cinder backend. Nguyen Huu Khoi

On Thu, Apr 13, 2023 at 9:02 AM Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Hi all, I'll reply in turn here:

Radek, I agree it definitely will be a good addition to the KA docs. I've got it on my radar, will aim to get a patch proposed this week.

Satish, I haven't personally been able to test durable queues on a system that large. According to the RabbitMQ docs (https://www.rabbitmq.com/queues.html#durability), "Throughput and latency of a queue is not affected by whether a queue is durable or not in most cases." However, I have anecdotally heard that it can affect some performance in particularly large systems.

Please note that if you are using the classic mirrored queues, you must also have them durable. Transient (i.e. non-durable) mirrored queues are not a supported feature and do cause bugs. (For example the "old incarnation" errors seen here: https://bugs.launchpad.net/kolla-ansible/+bug/1954925)

Nguyen, I can confirm that we've seen the same behaviour. This is caused by a backported change in oslo.messaging (I believe you linked the relevant bug report previously). There is a fix in progress (https://review.opendev.org/c/openstack/oslo.messaging/+/866617), and in the meantime setting kombu_reconnect_delay < 1.0 does resolve it.

Cheers, Matt
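As a quick sanity check after enabling HA, something along these lines (a hedged sketch, again assuming the Kolla container name rabbitmq) shows whether the mirroring policy is applied and whether the queues really came back as durable:

docker exec rabbitmq rabbitmqctl list_policies
docker exec rabbitmq rabbitmqctl list_queues name durable policy | head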
On Thu, 2023-04-13 at 09:07 +0100, Matt Crees wrote:
Satish, I haven't personally been able to test durable queues on a system that large. According to the RabbitMQ docs (https://www.rabbitmq.com/queues.html#durability), "Throughput and latency of a queue is not affected by whether a queue is durable or not in most cases." However, I have anecdotally heard that it can affect some performance in particularly large systems.
The performance of a durable queue is dominated by the disk I/O. Put Rabbit on a PCIe NVMe SSD and it will have little effect; use spinning rust (an HDD, even in RAID 10) and the IOPS/throughput of the storage used to make the queue durable (which just means writing all messages to disk) will be the bottleneck for scalability. Combine that with HA and it is worse, as everything has to be written to multiple servers. I believe that the mirror implementation will wait for all copies to be persisted, but I have not really looked into it. It was raised as a pain point with Rabbit by operators in the past in terms of scaling.
Thanks Sean/Matt,

It is interesting that my only option is to use classic mirroring with durable queues :( because without mirroring the cluster acts up when one of the nodes is down. How are people scaling RabbitMQ at large scale?

Sent from my iPhone
On Apr 13, 2023, at 7:50 AM, Sean Mooney <smooney@redhat.com> wrote:
On 13/04/2023 23:04, Satish Patel wrote:
It is interesting that my only option is to use classic mirroring with durable queues :( because without mirroring the cluster acts up when one of the nodes is down.
If you have planned maintenance and non-mirrored transient queues, you can first try draining the node to be removed, before removing it from the cluster. In my testing at least, this appears to be much more successful than relying on the RabbitMQ clients to do the failover and recreate queues. See [1], or for RMQ <3.8 you can cobble something together with ha-mode nodes [2].

[1] https://www.rabbitmq.com/upgrade.html#maintenance-mode
[2] https://www.rabbitmq.com/ha.html#mirroring-arguments

This obviously doesn't solve the case of when a controller fails unexpectedly.

I also think it's worth making the distinction between a highly available messaging infrastructure, and queue mirroring. In many cases, if a RMQ node hosting a non-mirrored, transient queue goes down, it should be possible for a service to just recreate the queue on another node and retry. This often seems to fail, which leads to queue mirroring getting turned on.
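For reference, a hedged sketch of the drain approach via maintenance mode [1], assuming RabbitMQ 3.8+ and the Kolla container name rabbitmq; run on the controller you plan to take down:

# Put this node into maintenance mode: it stops accepting client connections
# (and transfers queue leaders where it can), so clients fail over to the other nodes.
docker exec rabbitmq rabbitmq-upgrade drain

# ... perform the planned maintenance on this controller ...

# Return the node to normal operation afterwards.
docker exec rabbitmq rabbitmq-upgrade revive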
Hi everyone, an update on that: when a controller fails over, we don't need to use mirrored queues if we use Ceph as the backend. In my case I use a SAN as the Cinder backend and NFS as the Glance backend, so I do need mirrored queues. It is quite weird. Nguyen Huu Khoi

On Mon, Apr 17, 2023 at 5:03 PM Doug Szumski <doug@stackhpc.com> wrote:
Oh wait! What is the relation between RabbitMQ mirrored queues and Ceph/NFS or any shared storage backend?

On Mon, Apr 24, 2023 at 2:17 AM Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Hi Nguyễn, Oh, that makes sense! In your post you did the following, which is why I got confused :) Let me try it in the /etc/kolla/config/global.conf file and run a deploy.
nano /etc/kollla/global.conf
[oslo_messaging_rabbit]
kombu_reconnect_delay=0.5
On Wed, Apr 12, 2023 at 10:45 AM Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Hi. Create global.conf in /etc/kolla/config/
participants (6)
- Doug Szumski
- Matt Crees
- Nguyễn Hữu Khôi
- Radosław Piliszek
- Satish Patel
- Sean Mooney