[openstack][sharing][kolla ansible]Problems when 1 of 3 controllers is down

Sean Mooney smooney at redhat.com
Tue Apr 11 14:03:55 UTC 2023


On Tue, 2023-04-11 at 15:18 +0200, Michal Arbet wrote:
> Hi,
> 
> Btw, why do we have such an option set to false?
It has a pretty big performance penalty if combined with durable queues,
and in general it's questionable whether it should be used going forward.

There is an argument to be made that ha/mirrored queues and durable queues should be replaced
with quorum queues: https://www.rabbitmq.com/quorum-queues.html
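
As a rough sketch of what that would look like (assuming an oslo.messaging
release new enough to expose the option), a service opts into quorum queues
with a single flag in its config:

[oslo_messaging_rabbit]
# declare queues as quorum queues (x-queue-type: quorum)
# instead of classic mirrored/ha queues
rabbit_quorum_queue = true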


The other thing to consider is that this needs to be set per vhost,
so if two services share a vhost it needs to be set to the same value.
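
For illustration, with classic ha queues the policy is applied per vhost,
e.g. something like this for the default vhost (reusing the pattern from the
kolla definitions template quoted further down in this thread):

rabbitmqctl set_policy -p / --apply-to all ha-all "^(?!(amq\.)|(.*_fanout_)|(reply_)).*" '{"ha-mode":"all"}'

and it has to be repeated with the same definition for every vhost that the
services share.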

In general, for notifications both ha and durable queues should be disabled, as notifications are intended
to be fire and forget.
For RPC calls or casts, reliable delivery is important, but how you achieve that is architecture dependent,
meaning that using ha queues is not always the correct default.
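
As a rough illustration of that split (the hostnames and credentials below are
made up), a service can point notifications at their own transport/vhost so
that any ha/durable policy only has to cover the rpc side:

[DEFAULT]
# rpc traffic goes to the rabbit that carries the ha/durable queues
transport_url = rabbit://openstack:secret@rabbit-rpc:5672/

[oslo_messaging_notifications]
driver = messagingv2
# notifications go to a plain rabbit/vhost with no ha or durable policy
transport_url = rabbit://openstack:secret@rabbit-notify:5672/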
If you need to scale to many requests per second you are better off using durable queues with storage on something like
ceph/nfs and an active/backup deployment with one rabbit per OpenStack service. You might choose to run such a rabbit
cluster in a k8s env, for example using persistent volumes.
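
(The durable half of that is again just service config, e.g.:

[oslo_messaging_rabbit]
# declare exchanges/queues as durable so they survive a broker restart
amqp_durable_queues = true

with the actual message store then living on whatever the rabbit nodes mount
from ceph/nfs.)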

In other cases simple ha queues and a shared rabbit are fine for small-scale deployments;
quorum queues may also make more sense.

This is why rabbit is called out in the production architecture guide
https://docs.openstack.org/kolla-ansible/latest/admin/production-architecture-guide.html#other-core-services
and why there is an option to opt into ha/durable queues, since that is often enough for small-scale deployments:
https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#high-availability
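
i.e. roughly this in globals.yml, per the doc above (double check the exact
variable name for your release):

om_enable_rabbitmq_high_availability: true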


> Michal Arbet
> Openstack Engineer
> 
> Ultimum Technologies a.s.
> Na Poříčí 1047/26, 11000 Praha 1
> Czech Republic
> 
> +420 604 228 897
> michal.arbet at ultimum.io
> https://ultimum.io
> 
> LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter
> <https://twitter.com/ultimumtech> | Facebook
> <https://www.facebook.com/ultimumtechnologies/timeline>
> 
> 
> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka at gmail.com>
> wrote:
> 
> > Hello,
> > 
> > RabbitMQ HA has been backported into stable releases, and it’s documented
> > here:
> > 
> > https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
> > 
> > Best regards,
> > Michal
> > 
> > On Tue, 11.04.2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
> > wrote:
> > 
> > > Yes.
> > > But the cluster cannot work properly without it. :(
> > > 
> > > On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb at thehutgroup.com>
> > > wrote:
> > > 
> > > > This commit explains why they largely removed HA queue durability:
> > > > 
> > > > 
> > > > https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
> > > > ------------------------------
> > > > *From:* Satish Patel <satish.txt at gmail.com>
> > > > *Sent:* 09 April 2023 04:16
> > > > *To:* Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
> > > > *Cc:* OpenStack Discuss <openstack-discuss at lists.openstack.org>
> > > > *Subject:* Re: [openstack][sharing][kolla ansible]Problems when 1 of 3
> > > > controllers is down
> > > > 
> > > > 
> > > > Are you proposing a solution or just raising an issue?
> > > > 
> > > > I did find it strange that kolla-ansible doesn't enable HA queues by
> > > > default. That is a disaster, because when one of the nodes goes down it will
> > > > make the whole RabbitMQ cluster unusable. Whenever I deploy Kolla I have to add
> > > > an HA policy to make the queues HA, otherwise you will end up with problems.
> > > > 
> > > > On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <
> > > > nguyenhuukhoinw at gmail.com> wrote:
> > > > 
> > > > Hello everyone.
> > > > 
> > > > I want to summarize, for anyone who runs into problems with OpenStack when
> > > > deploying a cluster with 3 controllers using Kolla Ansible.
> > > > 
> > > > Scenario: 1 of 3 controllers is down
> > > > 
> > > > 1. Logging in to Horizon and using APIs such as Nova and Cinder will be very slow
> > > > 
> > > > fix by:
> > > > 
> > > > nano:
> > > > kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
> > > > kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
> > > > kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
> > > > kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
> > > > kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
> > > > 
> > > > or whichever service needs caching,
> > > > 
> > > > and add the section below:
> > > > 
> > > > [cache]
> > > > backend = oslo_cache.memcache_pool
> > > > enabled = True
> > > > # point caching at memcached behind the internal VIP
> > > > memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
> > > > # seconds a memcached server is considered dead before it is retried
> > > > memcache_dead_retry = 0.25
> > > > # timeout in seconds for calls to a memcached server
> > > > memcache_socket_timeout = 900
> > > > 
> > > > https://review.opendev.org/c/openstack/kolla-ansible/+/849487
> > > > 
> > > > But that is not the end of it.
> > > > 
> > > > 2. Cannot launch an instance or map a block device (stuck at this step)
> > > > 
> > > > nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
> > > > 
> > > > "policies":[
> > > >     {"vhost": "/", "name": "ha-all", "pattern":
> > > > "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition":
> > > > {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
> > > >     {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all",
> > > > "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"},
> > > > "priority":0}
> > > >     {% endif %}
> > > >   ]
> > > > 
> > > > nano /etc/kolla/global.conf
> > > > 
> > > > [oslo_messaging_rabbit]
> > > > # seconds to wait before reconnecting after an AMQP consumer cancel notification
> > > > kombu_reconnect_delay=0.5
> > > > 
> > > > 
> > > > https://bugs.launchpad.net/oslo.messaging/+bug/1993149
> > > > https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
> > > > 
> > > > I used Xena 13.4 and Yoga 14.8.1.
> > > > 
> > > > The above bugs are critical, but I see that they have not been fixed. I am just an
> > > > operator, and I want to share what I encountered for new people coming to
> > > > OpenStack.
> > > > 
> > > > 
> > > > Nguyen Huu Khoi
> > > > 
> > > > --
> > Michał Nasiadka
> > mnasiadka at gmail.com
> > 



