[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down

Danny Webb Danny.Webb at thehutgroup.com
Tue Apr 11 11:17:57 UTC 2023


This commit explains why they largely removed HA queue durability:

https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
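
For deployments that still want mirrored queues after that change, newer kolla-ansible releases expose a switch instead of the old hard-coded policy. A minimal sketch, assuming your release ships the om_enable_rabbitmq_high_availability variable (check your version's globals.yml before relying on it):

# /etc/kolla/globals.yml
om_enable_rabbitmq_high_availability: true

followed by a "kolla-ansible reconfigure" so the durable queues and ha-all policy are re-applied.
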
________________________________
From: Satish Patel <satish.txt at gmail.com>
Sent: 09 April 2023 04:16
To: Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down


Are you proposing a solution or just raising an issue?

I did find it strange that kolla-ansible doesn't enable HA queues by default. That is a disaster, because when one of the nodes goes down it makes the whole RabbitMQ cluster unusable. Whenever I deploy kolla I have to add an HA policy to make the queues mirrored, otherwise you will end up with problems.
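
For reference, the manual policy I add looks something like this (the container name, the '/' vhost and the queue pattern are standard kolla defaults here; adjust them to your deployment):

docker exec rabbitmq rabbitmqctl set_policy -p / ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' '{"ha-mode":"all"}'

The pattern deliberately skips the amq.* built-ins and the transient fanout/reply queues, which are not worth mirroring.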

On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
Hello everyone.

I want to summarize, for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.

Scenario: 1 of 3 controllers is down

1. Logging in to Horizon and using APIs such as nova and cinder becomes very slow

Fix by editing the following templates:

kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2

(or the template for whichever other service uses caching)

and add the following:

[cache]
backend = oslo_cache.memcache_pool
enabled = True
memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
# memcache_dead_retry is an integer number of seconds to treat a dead
# memcached server as down before retrying it; memcache_socket_timeout is a
# per-call timeout in seconds, kept short so requests fail fast:
memcache_dead_retry = 900
memcache_socket_timeout = 0.25

https://review.opendev.org/c/openstack/kolla-ansible/+/849487
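
After changing the templates, push the new configuration out with a reconfigure. For example, assuming the usual multinode inventory path (path and tags are only examples):

kolla-ansible -i /etc/kolla/multinode reconfigure --tags heat,nova,keystone,neutron,cinder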

But that is not the end of the problems:

2. Cannot launch an instance or map a block device (stuck at this step)

Fix by editing kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2 and adding an HA policy:

"policies":[
    {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
    {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}
    {% endif %}
  ]
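
A single bad escape in that template produces definitions that RabbitMQ will refuse to import, so it is worth validating the rendered file. A quick check, assuming the standard kolla container layout and in-container path:

docker exec rabbitmq python3 -m json.tool /etc/rabbitmq/definitions.json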

Then add the following to /etc/kolla/config/global.conf (it is merged into every service's config):

[oslo_messaging_rabbit]
# Reconnect to the surviving RabbitMQ nodes quickly instead of hanging
# (see the oslo.messaging bug below)
kombu_reconnect_delay=0.5


https://bugs.launchpad.net/oslo.messaging/+bug/1993149
https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
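
Once everything is redeployed, a quick sanity check that both fixes landed (container and service names below are the standard kolla ones; adjust as needed):

docker exec rabbitmq rabbitmqctl list_policies -p /
docker exec nova_api grep -A 2 'oslo_messaging_rabbit' /etc/nova/nova.conf

The first command should list the ha-all policy on the '/' vhost; the second should show kombu_reconnect_delay=0.5 in the rendered nova.conf.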

I used Xena 13.4 and Yoga 14.8.1.

The bugs above are critical, but I see that they have not been fixed. I am just an operator, and I want to share what I encountered for new people who come to OpenStack.


Nguyen Huu Khoi