[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down

Satish Patel satish.txt at gmail.com
Wed Apr 12 15:04:40 UTC 2023


Matt,

For a new deployment, how do I enable quorum queues?

Is just adding the following enough?

om_enable_rabbitmq_high_availability: True
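
From what I can tell that option enables classic queue mirroring with durable queues rather than quorum queues. Would something like this in /etc/kolla/globals.yml be the right way (om_enable_rabbitmq_quorum_queues is my guess at the variable name on branches that have it, please correct me)?

om_enable_rabbitmq_high_availability: true
# guessed variable name; only on releases with quorum queue support:
om_enable_rabbitmq_quorum_queues: true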

On Wed, Apr 12, 2023 at 10:54 AM Matt Crees <mattc at stackhpc.com> wrote:

> Yes, and the option also needs to be under the oslo_messaging_rabbit
> heading:
>
> [oslo_messaging_rabbit]
> kombu_reconnect_delay=0.5
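>
> That file is merged into the generated service configs, so after adding
> it a "kolla-ansible -i <inventory> reconfigure" run should regenerate
> the configs and restart the affected containers (the usual workflow as
> far as I know; adjust the inventory path to yours).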
>
>
> On Wed, 12 Apr 2023 at 15:45, Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
> >
> > Hi.
> > Create global.conf in /etc/kolla/config/
> >
> > On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt at gmail.com> wrote:
> >>
> >> Hi Matt,
> >>
> >> How do I set the kombu_reconnect_delay=0.5 option?
> >>
> >> Something like the following in globals.yml?
> >>
> >> kombu_reconnect_delay: 0.5
> >>
> >> On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <mattc at stackhpc.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> It seems worth noting here that there is a fix ongoing in
> >>> oslo.messaging which will resolve the issues with HA failing when one
> >>> node is down. See here:
> >>> https://review.opendev.org/c/openstack/oslo.messaging/+/866617
> >>> In the meantime, we have also found that setting kombu_reconnect_delay
> >>> = 0.5 does resolve this issue.
> >>>
> >>> As for why om_enable_rabbitmq_high_availability is currently
> >>> defaulting to false, as Michal said enabling it in stable releases
> >>> will impact users. This is because it enables durable queues, and the
> >>> migration from transient to durable queues is not a seamless
> >>> procedure. It requires that the state of RabbitMQ is reset and that
> >>> the OpenStack services which use RabbitMQ are restarted to recreate
> >>> the queues.
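> >>>
> >>> Roughly, that manual procedure looks like this (a sketch only, not
> >>> the documented process; check the kolla-ansible docs for your release
> >>> before running anything):
> >>>
> >>> # stop the OpenStack services that use RabbitMQ, then on each
> >>> # controller reset the RabbitMQ state:
> >>> docker exec rabbitmq rabbitmqctl stop_app
> >>> docker exec rabbitmq rabbitmqctl force_reset
> >>> docker exec rabbitmq rabbitmqctl start_app
> >>> # redeploy so the services reconnect and recreate the queues as durable:
> >>> kolla-ansible -i <inventory> deploy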
> >>>
> >>> I think that there is some merit in changing this default value. But
> >>> if we did this, we should either add additional support to automate
> >>> the migration from transient to durable queues, or at the very least
> >>> provide some decent docs on the manual procedure.
> >>>
> >>> However, as classic queue mirroring is deprecated in RabbitMQ (to be
> >>> removed in RabbitMQ 4.0) we should maybe consider switching to quorum
> >>> queues soon. Then it may be beneficial to leave the classic queue
> >>> mirroring + durable queues setup as False by default. This is because
> >>> the migration between queue types (durable or quorum) can take several
> >>> hours on larger deployments. So it might be worth making sure the
> >>> default values only require one migration to quorum queues in the
> >>> future, rather than two (durable queues now and then quorum queues in
> >>> the future).
> >>>
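> >>> For reference, on the oslo.messaging side quorum queues are enabled
> >>> per service with something like the following (rabbit_quorum_queue is
> >>> my understanding of the option name; treat this as a sketch):
> >>>
> >>> [oslo_messaging_rabbit]
> >>> rabbit_quorum_queue = true
> >>>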
> >>> We will need to make this switch eventually, but right now RabbitMQ
> >>> 4.0 does not even have a set release date, so it's not the most urgent
> >>> change.
> >>>
> >>> Cheers,
> >>> Matt
> >>>
> >>> >Hi Michal,
> >>> >
> >>> >Feel free to propose a change of the default in the master branch, but I don't think we can change the default in stable branches without impacting users.
> >>> >
> >>> >Best regards,
> >>> >Michal
> >>> >
> >>> >> On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet at ultimum.io> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> Btw, why do we have this option set to false?
> >>> >> Michal Arbet
> >>> >> Openstack Engineer
> >>> >>
> >>> >> Ultimum Technologies a.s.
> >>> >> Na Poříčí 1047/26, 11000 Praha 1
> >>> >> Czech Republic
> >>> >>
> >>> >> +420 604 228 897
> >>> >> michal.arbet at ultimum.io
> >>> >> https://ultimum.io
> >>> >>
> >>> >> LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
> >>> >>
> >>> >>
> >>> >> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka at gmail.com> wrote:
> >>> >>> Hello,
> >>> >>>
> >>> >>> RabbitMQ HA has been backported into stable releases, and it's documented here:
> >>> >>>
> >>> >>> https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
> >>> >>>
> >>> >>> Best regards,
> >>> >>> Michal
> >>> >>>
> >>> >>> On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
> >>> >>>> Yes.
> >>> >>>> But the cluster cannot work properly without it. :(
> >>> >>>>
> >>> >>>> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb at thehutgroup.com> wrote:
> >>> >>>>> This commit explains why they largely removed HA queue durability:
> >>> >>>>>
> >>> >>>>> https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
> >>> >>>>>
> >>> >>>>> From: Satish Patel <satish.txt at gmail.com>
> >>> >>>>> Sent: 09 April 2023 04:16
> >>> >>>>> To: Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
> >>> >>>>> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> >>> >>>>> Subject: Re: [openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Are you proposing a solution or just raising an issue?
> >>> >>>>>
> >>> >>>>> I did find it strange that kolla-ansible doesn't support HA
> >>> >>>>> queues by default. That is a disaster, because when one of the nodes
> >>> >>>>> goes down it makes the whole RabbitMQ cluster unusable. Whenever I
> >>> >>>>> deploy Kolla I have to add an HA policy to make the queues highly
> >>> >>>>> available, otherwise you end up with problems.
> >>> >>>>>
> >>> >>>>> On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
> >>> >>>>> Hello everyone.
> >>> >>>>>
> >>> >>>>> I want to summarize, for anyone who runs into problems with
> >>> >>>>> OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.
> >>> >>>>>
> >>> >>>>> Scenario: 1 of 3 controllers is down
> >>> >>>>>
> >>> >>>>> 1. Logging in to Horizon and using APIs such as nova and cinder
> >>> >>>>> becomes very slow
> >>> >>>>>
> >>> >>>>> Fix by editing:
> >>> >>>>>
> >>> >>>>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
> >>> >>>>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
> >>> >>>>> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
> >>> >>>>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
> >>> >>>>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
> >>> >>>>>
> >>> >>>>> or whichever services need caching, and add the following:
> >>> >>>>>
> >>> >>>>> [cache]
> >>> >>>>> backend = oslo_cache.memcache_pool
> >>> >>>>> enabled = True
> >>> >>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
> >>> >>>>> memcache_dead_retry = 0.25
> >>> >>>>> memcache_socket_timeout = 900
> >>> >>>>>
> >>> >>>>> https://review.opendev.org/c/openstack/kolla-ansible/+/849487
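> >>> >>>>>
> >>> >>>>> (You do not have to patch the role templates: I believe the same
> >>> >>>>> [cache] block can go into per-service override files such as
> >>> >>>>> /etc/kolla/config/nova.conf, which kolla-ansible merges into the
> >>> >>>>> generated configs, followed by "kolla-ansible reconfigure".)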
> >>> >>>>>
> >>> >>>>> but that is not the end of it
> >>> >>>>>
> >>> >>>>> 2. Cannot launch instances; they get stuck at the block device
> >>> >>>>> mapping step
> >>> >>>>>
> >>> >>>>> nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
> >>> >>>>>
> >>> >>>>> "policies":[
> >>> >>>>> {"vhost": "/", "name": "ha-all", "pattern":
> "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition":
> {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
> >>> >>>>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all",
> "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"},
> "priority":0}
> >>> >>>>> {% endif %}
> >>> >>>>> ]
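> >>> >>>>>
> >>> >>>>> The same policy can also be applied at runtime, without editing the
> >>> >>>>> template (a sketch, assuming kolla's default container name):
> >>> >>>>>
> >>> >>>>> docker exec rabbitmq rabbitmqctl set_policy --apply-to all -p / ha-all \
> >>> >>>>>   "^(?!(amq\.)|(.*_fanout_)|(reply_)).*" '{"ha-mode":"all"}'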
> >>> >>>>>
> >>> >>>>> nano /etc/kolla/config/global.conf
> >>> >>>>>
> >>> >>>>> [oslo_messaging_rabbit]
> >>> >>>>> kombu_reconnect_delay=0.5
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1993149
> >>> >>>>>
> >>> >>>>> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
> >>> >>>>>
> >>> >>>>> I used Xena 13.4 and Yoga 14.8.1.
> >>> >>>>>
> >>> >>>>> The above bugs are critical, but I see that they have not been
> >>> >>>>> fixed. I am just an operator, and I want to share what I encountered
> >>> >>>>> for people who are new to OpenStack.
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Nguyen Huu Khoi
> >>> >>> --
> >>> >>> Michał Nasiadka
> >>> >>> mnasiadka at gmail.com
> >>>
>

