Re: [openstack][sharing][kolla ansible]Problems when 1 of 3 controller was be down

12 Apr 2023


Hi Nguyễn,

Oh, that makes sense! In your post you did the following, which is why I got
confused :)  Let me try it in the /etc/kolla/config/global.conf file and run
deploy.
...
nano /etc/kolla/global.conf
[oslo_messaging_rabbit]
kombu_reconnect_delay=0.5
On Wed, Apr 12, 2023 at 10:45 AM Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>
wrote:
...
Hi.
Create global.conf in /etc/kolla/config/
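
A minimal sketch of what that file could contain, assuming the only override
needed is the kombu_reconnect_delay value discussed in this thread. The helper
name render_global_override is hypothetical and used only for illustration; it
builds the INI text with Python's standard configparser module.

```python
import configparser
import io


def render_global_override(delay: float = 0.5) -> str:
    """Return INI text for a global.conf override (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # Section and option names are the ones quoted in this thread.
    parser["oslo_messaging_rabbit"] = {"kombu_reconnect_delay": str(delay)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the content; save it as /etc/kolla/config/global.conf on the
    # deployment host, then run deploy as described in this thread.
    print(render_global_override(), end="")
```

The printed text is meant to be saved as /etc/kolla/config/global.conf before
running deploy. Writing the file by hand works just as well; the script only
makes the expected section and key explicit.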
On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt@gmail.com> wrote:
...
Hi Matt,
How do I set the kombu_reconnect_delay=0.5 option?
Something like the following in globals.yml?
kombu_reconnect_delay: 0.5
On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <mattc@stackhpc.com> wrote:
...
Hi all,
It seems worth noting here that there is a fix ongoing in
oslo.messaging which will resolve the issues with HA failing when one
node is down. See here:
https://review.opendev.org/c/openstack/oslo.messaging/+/866617
In the meantime, we have also found that setting kombu_reconnect_delay
= 0.5 does resolve this issue.
As for why om_enable_rabbitmq_high_availability is currently
defaulting to false, as Michal said enabling it in stable releases
will impact users. This is because it enables durable queues, and the
migration from transient to durable queues is not a seamless
procedure. It requires that the state of RabbitMQ is reset and that
the OpenStack services which use RabbitMQ are restarted to recreate
the queues.
I think that there is some merit in changing this default value. But
if we did this, we should either add additional support to automate
the migration from transient to durable queues, or at the very least
provide some decent docs on the manual procedure.
However, as classic queue mirroring is deprecated in RabbitMQ (to be
removed in RabbitMQ 4.0) we should maybe consider switching to quorum
queues soon. Then it may be beneficial to leave the classic queue
mirroring + durable queues setup as False by default. This is because
the migration between queue types (durable or quorum) can take several
hours on larger deployments. So it might be worth making sure the
default values only require one migration to quorum queues in the
future, rather than two (durable queues now and then quorum queues in
the future).
We will need to make this switch eventually, but right now RabbitMQ
4.0 does not even have a set release date, so it's not the most urgent
change.
Cheers,
Matt
...
Hi Michal,
Feel free to propose a change of the default in the master branch, but I don't
think we can change the default in stable branches without impacting users.
Best regards,
Michal
...
On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet@ultimum.io>
wrote:
Hi,
Btw, why do we have this option set to false?
Michal Arbet
Openstack Engineer
Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic
+420 604 228 897
michal.arbet@ultimum.io
https://ultimum.io
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> |
Twitter <https://twitter.com/ultimumtech> |
Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka@gmail.com> wrote:
...
Hello,
RabbitMQ HA has been backported into stable releases, and it's
documented here:
https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbi...
...
Best regards,
Michal
On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>
wrote:
...
> Yes.
> But the cluster cannot work properly without it. :(
>
> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb@thehutgroup.com> wrote:
>> This commit explains why they largely removed HA queue durability:
>>
>> https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a...
>> From: Satish Patel <satish.txt@gmail.com>
>> Sent: 09 April 2023 04:16
>> To: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>
>> Cc: OpenStack Discuss <openstack-discuss@lists.openstack.org>
>> Subject: Re: [openstack][sharing][kolla ansible]Problems when 1 of 3 controller was be down
>>
>> Are you proposing a solution or just raising an issue?
>>
>> I did find it strange that kolla-ansible doesn't support HA queues by
>> default. That is a disaster, because when one of the nodes goes down it
>> makes the whole RabbitMQ cluster unusable. Whenever I deploy kolla I have
>> to add an HA policy to make the queues HA, otherwise you will end up with
>> problems.
>>
>> On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
>> Hello everyone.
>>
>> I want to summarize this for anyone who runs into problems with OpenStack
>> when deploying a cluster with 3 controllers using Kolla Ansible.
>>
>> Scenario: 1 of 3 controllers is down
>>
>> 1. Logging in to Horizon and using APIs such as nova and cinder is very slow
>>
>> fix by:
>>
>> nano:
>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
>> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
>>
>> or the template of any other service that needs caching
>>
>> and add the following
>>
>> [cache]
>> backend = oslo_cache.memcache_pool
>> enabled = True
>> memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
>> memcache_dead_retry = 0.25
>> memcache_socket_timeout = 900
>>
>> https://review.opendev.org/c/openstack/kolla-ansible/+/849487
>>
>> But that is not the end.
>>
>> 2. Cannot launch an instance or map a block device (stuck at this step)
>>
>> nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
>>
>> "policies":[
>> {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}
>> {% endif %}
>> ]
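
To illustrate which queues the "ha-all" pattern above selects, here is a small
sketch using Python's re module. It assumes Python's regex engine treats this
pattern the same way RabbitMQ's does, which should hold for the lookahead and
alternation used here but is not verified against a broker. The queue names
and the helper policy_applies are made up for illustration.

```python
import re

# Pattern from the "ha-all" policy above, written as a Python raw string.
HA_ALL_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def policy_applies(queue_name: str) -> bool:
    """Return True if the ha-all pattern matches the given queue name."""
    return HA_ALL_PATTERN.match(queue_name) is not None


if __name__ == "__main__":
    # Hypothetical queue names, not taken from a real cluster.
    samples = [
        "conductor",
        "notifications.info",
        "amq.gen-abc123",
        "scheduler_fanout_0123",
        "reply_0123",
    ]
    for name in samples:
        state = "matched" if policy_applies(name) else "skipped"
        print(f"{name}: {state}")
```

The negative lookahead excludes names starting with "amq." or "reply_" and
names containing "_fanout_", so only the remaining queues would be covered by
the policy.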
>>
>> nano /etc/kolla/global.conf
>>
>> [oslo_messaging_rabbit]
>> kombu_reconnect_delay=0.5
>>
>>
>> https://bugs.launchpad.net/oslo.messaging/+bug/1993149
>> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
>>
>> I used Xena 13.4 and Yoga 14.8.1.
>>
>> The bugs above are critical, but I see that they have not been fixed. I am
>> just an operator, and I want to share what I encountered for people who
>> are new to OpenStack.
>>
>>
>> Nguyen Huu Khoi
--
Michał Nasiadka
mnasiadka@gmail.com