On Tue, 2023-04-11 at 15:18 +0200, Michal Arbet wrote:
Hi,
Btw, why do we have such an option set to false?

It has a pretty big performance penalty when combined with durable queues, and in general it is questionable whether it should be used going forward.
There is an argument to be made that HA/mirrored queues and durable queues should be replaced with https://www.rabbitmq.com/quorum-queues.html. The other thing to consider is that this needs to be set per vhost, so if two services share a vhost it needs to be set to the same value.

In general, for notifications both HA and durable queues should be disabled, as notifications are intended to be fire and forget. For RPC calls or casts, reliable delivery is important, but how you achieve that is architecture dependent, meaning using HA queues is not always the correct default. If you need to scale to many requests per second you are better off using durable queues with storage on something like ceph/nfs and an active/backup deployment with one rabbit per openstack service. You might choose to run such a rabbit cluster in a k8s env, for example using persistent volumes. In other cases simple HA queues and a shared rabbit are fine for small scale deployments. Quorum queues may also make more sense.

This is why rabbit is called out in the production architecture guide https://docs.openstack.org/kolla-ansible/latest/admin/production-architectur... and why there is an option to opt into HA/durable queues, since that is often enough for small scale deployments. https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rab...
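For example, on a recent enough oslo.messaging you can opt a service into quorum or durable queues through its config. A minimal sketch (the rabbit_quorum_queue option name is from recent oslo.messaging releases; double-check it exists in the release you run):

[oslo_messaging_rabbit]
# opt into quorum queues instead of classic mirrored/HA queues
# (assumption: supported by your oslo.messaging release)
rabbit_quorum_queue = true
# or, for the durable-queue approach instead:
# amqp_durable_queues = true

Note that RabbitMQ cannot change the type of an existing queue in place, so queues have to be deleted and redeclared after switching.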
Michal Arbet
Openstack Engineer

Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic

+420 604 228 897
michal.arbet@ultimum.io
https://ultimum.io
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka@gmail.com> wrote:
Hello,
RabbitMQ HA has been backported into stable releases, and it’s documented here:
https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbi...
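If I read those docs right, the opt-in is a single flag in globals.yml (please double-check the variable name against the docs for your release):

# /etc/kolla/globals.yml
om_enable_rabbitmq_high_availability: true

followed by a kolla-ansible reconfigure to re-template the services.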
Best regards, Michal
On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Yes. But the cluster cannot work properly without it. :(
On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb@thehutgroup.com> wrote:
This commit explains why they largely removed HA queue durability:
https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a...

From: Satish Patel <satish.txt@gmail.com>
Sent: 09 April 2023 04:16
To: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>
Cc: OpenStack Discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [openstack][sharing][kolla ansible]Problems when 1 of 3 controller was be down
Are you proposing a solution or just raising an issue?
I did find it strange that kolla-ansible doesn't enable HA queues by default. That is a disaster, because when one of the nodes goes down it will make the whole RabbitMQ cluster unusable. Whenever I deploy kolla I have to add an HA policy to make the queues HA, otherwise you will end up in trouble.
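For reference, the policy I add looks roughly like this (a sketch; the pattern mirrors the one kolla used to ship, and the vhost may differ in your deployment):

rabbitmqctl set_policy -p / --apply-to all ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' '{"ha-mode":"all"}'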
On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi < nguyenhuukhoinw@gmail.com> wrote:
Hello everyone.
I want to summarize, for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.

Scenario: 1 of 3 controllers is down

1. Logging in to Horizon and using APIs such as nova and cinder will be very slow
Fix by editing:

kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2

(or whichever services need caching) and add the following:
[cache]
backend = oslo_cache.memcache_pool
enabled = True
memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
memcache_dead_retry = 0.25
memcache_socket_timeout = 900
https://review.opendev.org/c/openstack/kolla-ansible/+/849487
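After changing the templates, roll the change out with a reconfigure, for example (the inventory path and tag list are just an illustration; adjust to your setup):

kolla-ansible -i /etc/kolla/multinode reconfigure --tags heat,nova,keystone,neutron,cinder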
But that is not the end.

2. Cannot launch instances or map block devices (stuck at this step)
nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
"policies":[ {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} {% endif %} ]
nano /etc/kolla/global.conf

[oslo_messaging_rabbit]
kombu_reconnect_delay = 0.5

https://bugs.launchpad.net/oslo.messaging/+bug/1993149
https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
I used Xena 13.4 and Yoga 14.8.1.
The above bugs are critical, but I see that they have not been fixed. I am just an operator, and I want to share what I encountered for new people who come to OpenStack.
Nguyen Huu Khoi
-- Michał Nasiadka mnasiadka@gmail.com