[openstack][sharing][kolla ansible]Problems when 1 of 3 controllers is down

Satish Patel satish.txt at gmail.com
Tue Apr 11 15:06:11 UTC 2023


This is what I am doing in my deployment. Not sure if this is right or not,
but it works for me and survives a full reboot of my cluster.

# cat /etc/kolla/config/rabbitmq/definitions.json

{
    "vhosts": [
        {"name": "/"}
    ],
    "users": [
        {"name": "openstack", "password": "Password123", "tags": "administrator"},
        {"name": "monitoring", "password": "Password321", "tags": "monitoring"}
    ],
    "permissions": [
        {"user": "openstack", "vhost": "/", "configure": ".*", "write": ".*", "read": ".*"},
        {"user": "monitoring", "vhost": "/", "configure": "^$", "write": "^$", "read": ".*"}
    ],
    "policies": [
        {
            "vhost": "/",
            "name": "ha-all",
            "pattern": "^(?!(amq\\.)|(.*_fanout_)|(reply_)).*",
            "apply-to": "all",
            "definition": {"ha-mode": "all"},
            "priority": 0
        }
    ]
}
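The regex in the ha-all policy above is easy to get subtly wrong, so it can be worth sanity-checking which queue names it actually mirrors. A minimal Python sketch (the queue names below are made up for illustration):

```python
import re

# The ha-all policy pattern from definitions.json above.
# Note: the JSON escape "\\." becomes "\." once the file is parsed.
HA_ALL = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")

# Durable service queues match and get mirrored:
assert HA_ALL.match("notifications.info")
assert HA_ALL.match("cinder-scheduler")

# Transient queues are excluded by the negative lookahead:
assert not HA_ALL.match("reply_1234abcd")             # RPC reply queue
assert not HA_ALL.match("amq.gen-XyZ")                # auto-generated queue
assert not HA_ALL.match("neutron-agent_fanout_9f2e")  # fanout queue
```

This matches the intent discussed further down the thread: reply and fanout queues are fire-and-forget, so mirroring them only adds overhead.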

On Tue, Apr 11, 2023 at 10:04 AM Sean Mooney <smooney at redhat.com> wrote:

> On Tue, 2023-04-11 at 15:18 +0200, Michal Arbet wrote:
> > Hi,
> >
> > Btw, why do we have such an option set to false?
> it has a pretty big performance penalty if combined with durable queues,
> and in general it's questionable whether it should be used going forward.
>
> there is an argument to be made that ha/mirrored queues and durable queues
> should be replaced
> with https://www.rabbitmq.com/quorum-queues.html
>
>
> the other thing to consider is that this needs to be set per vhost,
> so if two services share a vhost it needs to be set to the same value.
>
> in general, for notifications both ha and durable queues should be disabled,
> as notifications are intended
> to be fire and forget.
> for rpc calls or casts, reliable delivery is important, but how you
> achieve that is architecture dependent,
> meaning using ha queues is not always the correct default.
> if you need to scale to many requests per second you are better off using
> durable queues with storage on something like
> ceph/nfs and an active/backup deployment with one rabbit per openstack
> service. you might choose to run such a rabbit
> cluster in a k8s env, for example using persistent volumes.
>
> in other cases simple ha queues and a shared rabbit are fine for small-scale
> deployments.
> quorum queues may also make more sense.
>
> this is why rabbit is called out in the production architecture guide
>
> https://docs.openstack.org/kolla-ansible/latest/admin/production-architecture-guide.html#other-core-services
> and why there is an option to opt into ha/durable queues, since that is
> often enough for small-scale deployments.
>
> https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#high-availability
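For reference on the quorum-queue alternative mentioned above: newer oslo.messaging releases expose an opt-in flag that a service's config could carry. A sketch only, assuming a deployed oslo.messaging version recent enough to support the option:

```ini
[oslo_messaging_rabbit]
# Opt into RabbitMQ quorum queues instead of classic mirrored (ha-all) queues.
# Requires an oslo.messaging release that supports this option; queues must be
# recreated, so this is not a live toggle on an existing deployment.
rabbit_quorum_queue = true
```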
>
> > Michal Arbet
> > Openstack Engineer
> >
> > Ultimum Technologies a.s.
> > Na Poříčí 1047/26, 11000 Praha 1
> > Czech Republic
> >
> > +420 604 228 897
> > michal.arbet at ultimum.io
> > *https://ultimum.io <https://ultimum.io/>*
> >
> > LinkedIn <https://www.linkedin.com/company/ultimum-technologies> |
> Twitter
> > <https://twitter.com/ultimumtech> | Facebook
> > <https://www.facebook.com/ultimumtechnologies/timeline>
> >
> >
> > út 11. 4. 2023 v 14:48 odesílatel Michał Nasiadka <mnasiadka at gmail.com>
> > napsal:
> >
> > > Hello,
> > >
> > > RabbitMQ HA has been backported into stable releases, and it’s
> documented
> > > here:
> > >
> > >
> https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
> > >
> > > Best regards,
> > > Michal
> > >
> > > W dniu wt., 11.04.2023 o 13:32 Nguyễn Hữu Khôi <
> nguyenhuukhoinw at gmail.com>
> > > napisał(a):
> > >
> > > > Yes.
> > > > But the cluster cannot work properly without it. :(
> > > >
> > > > On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb at thehutgroup.com
> >
> > > > wrote:
> > > >
> > > > > This commit explains why they largely removed HA queue durability:
> > > > >
> > > > >
> > > > >
> https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
> > > > > ------------------------------
> > > > > *From:* Satish Patel <satish.txt at gmail.com>
> > > > > *Sent:* 09 April 2023 04:16
> > > > > *To:* Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
> > > > > *Cc:* OpenStack Discuss <openstack-discuss at lists.openstack.org>
> > > > > *Subject:* Re: [openstack][sharing][kolla ansible]Problems when 1 of 3
> > > > > controllers is down
> > > > >
> > > > >
> > > > > * CAUTION: This email originates from outside THG *
> > > > > ------------------------------
> > > > > Are you proposing a solution or just raising an issue?
> > > > >
> > > > > I did find it strange that kolla-ansible doesn't support HA queues by
> > > > > default. That is a disaster, because when one of the nodes goes down
> > > > > it will make the whole RabbitMQ cluster unusable. Whenever I deploy
> > > > > kolla I have to add an HA policy to make the queues HA, otherwise you
> > > > > will end up with problems.
> > > > >
> > > > > On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <
> > > > > nguyenhuukhoinw at gmail.com> wrote:
> > > > >
> > > > > Hello everyone.
> > > > >
> > > > > I want to summarize, for anyone who runs into problems with Openstack
> > > > > when deploying a cluster with 3 controllers using Kolla Ansible.
> > > > >
> > > > > Scenario: 1 of 3 controllers is down
> > > > >
> > > > > 1. Logging in to horizon and using APIs such as nova and cinder will be very slow
> > > > >
> > > > > fix by:
> > > > >
> > > > > nano:
> > > > > kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
> > > > > kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
> > > > > kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
> > > > > kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
> > > > > kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
> > > > >
> > > > > or whichever other services need caching
> > > > >
> > > > > add as below
> > > > >
> > > > > [cache]
> > > > > backend = oslo_cache.memcache_pool
> > > > > enabled = True
> > > > > memcache_servers = {{ kolla_internal_vip_address }}:{{
> memcached_port }}
> > > > > memcache_dead_retry = 0.25
> > > > > memcache_socket_timeout = 900
> > > > >
> > > > > https://review.opendev.org/c/openstack/kolla-ansible/+/849487
> > > > >
> > > > > but that is not the end of it
> > > > >
> > > > > 2. Cannot launch an instance or map a block device (stuck at this step)
> > > > >
> > > > > nano
> kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
> > > > >
> > > > > "policies":[
> > > > >     {"vhost": "/", "name": "ha-all", "pattern":
> > > > > "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all",
> "definition":
> > > > > {"ha-mode":"all"}, "priority":0}{% if project_name ==
> 'outward_rabbitmq' %},
> > > > >     {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name":
> "ha-all",
> > > > > "pattern": ".*", "apply-to": "all", "definition":
> {"ha-mode":"all"},
> > > > > "priority":0}
> > > > >     {% endif %}
> > > > >   ]
> > > > >
> > > > > nano /etc/kolla/global.conf
> > > > >
> > > > > [oslo_messaging_rabbit]
> > > > > kombu_reconnect_delay=0.5
> > > > >
> > > > >
> > > > > https://bugs.launchpad.net/oslo.messaging/+bug/1993149
> > > > >
> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
> > > > >
> > > > > I used Xena 13.4 and Yoga 14.8.1.
> > > > >
> > > > > The above bugs are critical, but I see that they have not been fixed.
> > > > > I am just an operator, and I want to share what I encountered for new
> > > > > people who come to Openstack.
> > > > >
> > > > >
> > > > > Nguyen Huu Khoi
> > > > >
> > > > > --
> > > Michał Nasiadka
> > > mnasiadka at gmail.com
> > >
>
>

