On Tue, 2023-04-11 at 15:18 +0200, Michal Arbet wrote:
Hi,
Btw, why do we have such an option set to false?

It has a pretty big performance penalty when combined with durable queues, and in general it is questionable whether it should be used going forward.
There is an argument to be made that HA/mirrored queues and durable queues should be replaced with https://www.rabbitmq.com/quorum-queues.html. The other thing to consider is that this needs to be set per vhost, so if two services share a vhost it needs to be set to the same value.

In general, for notifications both HA and durable queues should be disabled, as notifications are intended to be fire and forget. For RPC calls or casts, reliable delivery is important, but how you achieve that is architecture dependent, meaning using HA queues is not always the correct default. If you need to scale to many requests per second you are better off using durable queues with storage on something like ceph/nfs and an active/backup deployment with one rabbit per openstack service. You might choose to run such a rabbit cluster in a k8s env, for example using persistent volumes. In other cases simple HA queues and a shared rabbit are fine for small scale deployments. Quorum queues may also make more sense.

This is why rabbit is called out in the production architecture guide https://docs.openstack.org/kolla-ansible/latest/admin/production-architectur... and why there is an option to opt into HA/durable queues, since that is often enough for small scale deployments. https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rab...
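For example, on a recent enough oslo.messaging you can opt a service into quorum or durable queues through its config. A minimal sketch (the rabbit_quorum_queue option name is from recent oslo.messaging releases; double-check it exists in the release you run):

[oslo_messaging_rabbit]
# opt into quorum queues instead of classic mirrored/HA queues
# (assumption: supported by your oslo.messaging release)
rabbit_quorum_queue = true
# or, for the durable-queue approach instead:
# amqp_durable_queues = true

Note that RabbitMQ cannot change the type of an existing queue in place, so queues have to be deleted and redeclared after switching.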
Michal Arbet
Openstack Engineer

Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic

+420 604 228 897
michal.arbet@ultimum.io
https://ultimum.io
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka@gmail.com> wrote:
Hello,
RabbitMQ HA has been backported into stable releases, and it’s documented here:
https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbi...
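If I read those docs right, the opt-in is a single flag in globals.yml (please double-check the variable name against the docs for your release):

# /etc/kolla/globals.yml
om_enable_rabbitmq_high_availability: true

followed by a kolla-ansible reconfigure to re-template the services.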
Best regards, Michal
On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Yes. But the cluster cannot work properly without it. :(
On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb@thehutgroup.com> wrote:
This commit explains why they largely removed HA queue durability:
https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a...

From: Satish Patel <satish.txt@gmail.com>
Sent: 09 April 2023 04:16
To: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>
Cc: OpenStack Discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [openstack][sharing][kolla ansible]Problems when 1 of 3 controller was be down
Are you proposing a solution or just raising an issue?
I did find it strange that kolla-ansible doesn't enable HA queues by default. That is a disaster, because when one of the nodes goes down it will make the whole RabbitMQ cluster unusable. Whenever I deploy kolla I have to add an HA policy to make the queues HA, otherwise you will end up in trouble.
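For reference, the policy I add looks roughly like this (a sketch; the pattern mirrors the one kolla used to ship, and the vhost may differ in your deployment):

rabbitmqctl set_policy -p / --apply-to all ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' '{"ha-mode":"all"}'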
On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi < nguyenhuukhoinw@gmail.com> wrote:
Hello everyone.
I want to summarize, for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.

Scenario: 1 of 3 controllers is down

1. Logging in to Horizon and using APIs such as nova and cinder will be very slow
Fix by editing:

kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2

(or whichever services need caching) and add the following:
[cache]
backend = oslo_cache.memcache_pool
enabled = True
memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
memcache_dead_retry = 0.25
memcache_socket_timeout = 900
https://review.opendev.org/c/openstack/kolla-ansible/+/849487
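After changing the templates, roll the change out with a reconfigure, for example (the inventory path and tag list are just an illustration; adjust to your setup):

kolla-ansible -i /etc/kolla/multinode reconfigure --tags heat,nova,keystone,neutron,cinder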
But that is not the end.

2. Cannot launch instances or map block devices (stuck at this step)
nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
"policies":[ {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %}, {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0} {% endif %} ]
nano /etc/kolla/global.conf

[oslo_messaging_rabbit]
kombu_reconnect_delay = 0.5

https://bugs.launchpad.net/oslo.messaging/+bug/1993149
https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
I used Xena 13.4 and Yoga 14.8.1.
The above bugs are critical, but I see that they have not been fixed. I am just an operator, and I want to share what I encountered for new people who come to OpenStack.
Nguyen Huu Khoi
-- Michał Nasiadka mnasiadka@gmail.com