[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers was down

Satish Patel satish.txt at gmail.com
Wed Apr 12 15:55:18 UTC 2023


Matt,

After enabling om_enable_rabbitmq_high_availability: True and
kombu_reconnect_delay=0.5, all my API services started throwing the
following logs. I even rebuilt my RabbitMQ cluster again. What could be
wrong here?

2023-04-12 15:53:40.380 391 ERROR oslo_service.service
amqp.exceptions.PreconditionFailed: Exchange.declare: (406)
PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in
vhost '/': received 'true' but current is 'false'
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
2023-04-12 15:53:40.380 391 ERROR oslo_service.service During handling of
the above exception, another exception occurred:
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
2023-04-12 15:53:40.380 391 ERROR oslo_service.service Traceback (most
recent call last):
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_service/service.py",
line 806, in run_service
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     service.start()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/neutron/service.py", line
115, in start
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     servers =
getattr(plugin, self.start_listeners_method)()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_log/helpers.py",
line 67, in wrapper
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return
method(*args, **kwargs)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/plugin.py",
line 425, in start_rpc_listeners
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return
self.conn.consume_in_threads()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/neutron_lib/rpc.py", line
351, in consume_in_threads
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     server.start()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py",
line 267, in wrapper
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
states[state].run_once(lambda: fn(self, *args, **kwargs),
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py",
line 188, in run_once
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     post_fn = fn()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py",
line 267, in <lambda>
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
states[state].run_once(lambda: fn(self, *args, **kwargs),
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/server.py",
line 413, in start
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     self.listener =
self._create_listener()
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/rpc/server.py",
line 150, in _create_listener
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return
self.transport._listen(self._target, 1, None)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/transport.py",
line 142, in _listen
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return
self._driver.listen(target, batch_size,
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/amqpdriver.py",
line 702, in listen
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
conn.declare_topic_consumer(exchange_name=self._get_exchange(target),
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py",
line 1295, in declare_topic_consumer
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
self.declare_consumer(consumer)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py",
line 1192, in declare_consumer
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     return
self.ensure(_declare_consumer,
2023-04-12 15:53:40.380 391 ERROR oslo_service.service   File
"/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_messaging/_drivers/impl_rabbit.py",
line 977, in ensure
2023-04-12 15:53:40.380 391 ERROR oslo_service.service     raise
exceptions.MessageDeliveryFailure(msg)
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP
server on 10.30.50.3:5672 after inf tries: Exchange.declare: (406)
PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in
vhost '/': received 'true' but current is 'false'
2023-04-12 15:53:40.380 391 ERROR oslo_service.service
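For what it's worth, the mismatch the error describes can be checked directly on a broker node. A minimal sketch, staged as a one-shot script (the container name `rabbitmq` and vhost `/` are the kolla defaults; adjust if yours differ):

```shell
# Stage a one-shot check script; run it on a controller to see whether
# the 'neutron' exchange was left behind as non-durable.
cat > check_exchange.sh <<'EOF'
#!/bin/sh
# Prints each exchange with its durable flag; a line "neutron false"
# means old transient state survived and will clash with clients that
# now declare the exchange durable.
docker exec rabbitmq rabbitmqctl list_exchanges -p / name durable
EOF
chmod +x check_exchange.sh
```

If the exchange shows as non-durable, the broker state has to be wiped (or the exchange deleted) before services configured for durable queues can reconnect.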

On Wed, Apr 12, 2023 at 11:10 AM Matt Crees <mattc at stackhpc.com> wrote:

> Hi Satish,
>
> Yes, for a new deployment you will just need to set that variable to
> true. However, that will enable the high availability of RabbitMQ
> queues using a combination of classic queue mirroring and durable
> queues.
> Quorum queues are not yet supported via Kolla Ansible.
>
> Cheers,
> Matt
>
> On Wed, 12 Apr 2023 at 16:04, Satish Patel <satish.txt at gmail.com> wrote:
> >
> > Matt,
> >
> > > For a new deployment, how do I enable quorum queues?
> >
> > Just adding the following should be enough?
> >
> > om_enable_rabbitmq_high_availability: True
> >
> > On Wed, Apr 12, 2023 at 10:54 AM Matt Crees <mattc at stackhpc.com> wrote:
> >>
> >> Yes, and the option also needs to be under the oslo_messaging_rabbit
> heading:
> >>
> >> [oslo_messaging_rabbit]
> >> kombu_reconnect_delay=0.5
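Concretely, a sketch of staging that override (the /etc/kolla/config merge directory is the kolla-ansible default; the file is staged locally here and then copied to the deployment host):

```shell
# Stage the oslo.messaging override; copy it to /etc/kolla/config/ on the
# deployment host so kolla-ansible merges it into every service's config.
mkdir -p kolla-config
cat > kolla-config/global.conf <<'EOF'
[oslo_messaging_rabbit]
kombu_reconnect_delay = 0.5
EOF
```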
> >>
> >>
> >> On Wed, 12 Apr 2023 at 15:45, Nguyễn Hữu Khôi <
> nguyenhuukhoinw at gmail.com> wrote:
> >> >
> >> > Hi.
> >> > Create global.conf in /etc/kolla/config/
> >> >
> >> > On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt at gmail.com>
> wrote:
> >> >>
> >> >> Hi Matt,
> >> >>
> >> >> How do I set the kombu_reconnect_delay=0.5 option?
> >> >>
> >> >> Something like the following in global.yml?
> >> >>
> >> >> kombu_reconnect_delay: 0.5
> >> >>
> >> >> On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <mattc at stackhpc.com>
> wrote:
> >> >>>
> >> >>> Hi all,
> >> >>>
> >> >>> It seems worth noting here that there is a fix ongoing in
> >> >>> oslo.messaging which will resolve the issues with HA failing when
> one
> >> >>> node is down. See here:
> >> >>> https://review.opendev.org/c/openstack/oslo.messaging/+/866617
> >> >>> In the meantime, we have also found that setting
> kombu_reconnect_delay
> >> >>> = 0.5 does resolve this issue.
> >> >>>
> >> >>> As for why om_enable_rabbitmq_high_availability is currently
> >> >>> defaulting to false, as Michal said enabling it in stable releases
> >> >>> will impact users. This is because it enables durable queues, and
> the
> >> >>> migration from transient to durable queues is not a seamless
> >> >>> procedure. It requires that the state of RabbitMQ is reset and that
> >> >>> the OpenStack services which use RabbitMQ are restarted to recreate
> >> >>> the queues.
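A hedged sketch of that manual reset, staged as a per-node runbook script (the container name `rabbitmq` is the kolla default, and `force_reset` destroys all broker state on the node; verify both before use in production):

```shell
# Stage a runbook for one controller: wipe broker state so the queues
# can be redeclared durable once the services restart.
cat > reset_rabbitmq_node.sh <<'EOF'
#!/bin/sh
set -e
docker exec rabbitmq rabbitmqctl stop_app
docker exec rabbitmq rabbitmqctl force_reset   # destroys queues/exchanges
docker exec rabbitmq rabbitmqctl start_app
EOF
chmod +x reset_rabbitmq_node.sh
```

Once the old transient state is gone, the OpenStack services must be restarted so they recreate their queues as durable.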
> >> >>>
> >> >>> I think that there is some merit in changing this default value. But
> >> >>> if we did this, we should either add additional support to automate
> >> >>> the migration from transient to durable queues, or at the very least
> >> >>> provide some decent docs on the manual procedure.
> >> >>>
> >> >>> However, as classic queue mirroring is deprecated in RabbitMQ (to be
> >> >>> removed in RabbitMQ 4.0) we should maybe consider switching to
> quorum
> >> >>> queues soon. Then it may be beneficial to leave the classic queue
> >> >>> mirroring + durable queues setup as False by default. This is
> because
> >> >>> the migration between queue types (durable or quorum) can take
> several
> >> >>> hours on larger deployments. So it might be worth making sure the
> >> >>> default values only require one migration to quorum queues in the
> >> >>> future, rather than two (durable queues now and then quorum queues
> in
> >> >>> the future).
> >> >>>
> >> >>> We will need to make this switch eventually, but right now RabbitMQ
> >> >>> 4.0 does not even have a set release date, so it's not the most
> urgent
> >> >>> change.
> >> >>>
> >> >>> Cheers,
> >> >>> Matt
> >> >>>
> >> >>> >Hi Michal,
> >> >>> >
> >> >>> >Feel free to propose changing the default in the master branch, but I
> don't think we can change the default in stable branches without impacting
> users.
> >> >>> >
> >> >>> >Best regards,
> >> >>> >Michal
> >> >>> >
> >> >>> >> On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet at ultimum.io>
> wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> Btw, why do we have such an option set to false?
> >> >>> >> Michal Arbet
> >> >>> >> Openstack Engineer
> >> >>> >>
> >> >>> >> Ultimum Technologies a.s.
> >> >>> >> Na Poříčí 1047/26, 11000 Praha 1
> >> >>> >> Czech Republic
> >> >>> >>
> >> >>> >> +420 604 228 897
> >> >>> >> michal.arbet at ultimum.io
> >> >>> >> https://ultimum.io
> >> >>> >>
> >> >>> >> LinkedIn <https://www.linkedin.com/company/ultimum-technologies>
> | Twitter <https://twitter.com/ultimumtech> | Facebook <
> https://www.facebook.com/ultimumtechnologies/timeline>
> >> >>> >>
> >> >>> >>
> >> >>> >> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <
> mnasiadka at gmail.com> wrote:
> >> >>> >>> Hello,
> >> >>> >>>
> >> >>> >>> RabbitMQ HA has been backported into stable releases, and it's
> documented here:
> >> >>> >>>
> https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
> >> >>> >>>
> >> >>> >>> Best regards,
> >> >>> >>> Michal
> >> >>> >>>
> >> >>> >>> On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <
> nguyenhuukhoinw at gmail.com> wrote:
> >> >>> >>>> Yes.
> >> >>> >>>> But cluster cannot work properly without it. :(
> >> >>> >>>>
> >> >>> >>>> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <
> Danny.Webb at thehutgroup.com> wrote:
> >> >>> >>>>> This commit explains why they largely removed HA queue
> durability:
> >> >>> >>>>>
> >> >>> >>>>>
> https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
> >> >>> >>>>> From: Satish Patel <satish.txt at gmail.com>
> >> >>> >>>>> Sent: 09 April 2023 04:16
> >> >>> >>>>> To: Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
> >> >>> >>>>> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> >> >>> >>>>> Subject: Re: [openstack][sharing][kolla ansible] Problems when
> 1 of 3 controllers was down
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> Are you proposing a solution or just raising an issue?
> >> >>> >>>>>
> >> >>> >>>>> I did find it strange that kolla-ansible doesn't support HA
> queues by default. That is a disaster, because when one of the nodes goes
> down it makes the whole RabbitMQ cluster unusable. Whenever I deploy kolla
> I have to add an HA policy to make the queues HA, otherwise you will end
> up in trouble.
> >> >>> >>>>>
> >> >>> >>>>> On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <
> nguyenhuukhoinw at gmail.com> wrote:
> >> >>> >>>>> Hello everyone.
> >> >>> >>>>>
> >> >>> >>>>> I want to summarize, for anyone who hits problems with OpenStack
> when deploying a cluster with 3 controllers using Kolla Ansible.
> >> >>> >>>>>
> >> >>> >>>>> Scenario: 1 of 3 controllers is down
> >> >>> >>>>>
> >> >>> >>>>> 1. Logging in to Horizon and using APIs such as nova and cinder
> will be very slow
> >> >>> >>>>>
> >> >>> >>>>> fix by:
> >> >>> >>>>>
> >> >>> >>>>> nano:
> >> >>> >>>>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
> >> >>> >>>>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
> >> >>> >>>>>
> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
> >> >>> >>>>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
> >> >>> >>>>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
> >> >>> >>>>>
> >> >>> >>>>> (or whichever other service needs caching)
> >> >>> >>>>>
> >> >>> >>>>> and add the following:
> >> >>> >>>>>
> >> >>> >>>>> [cache]
> >> >>> >>>>> backend = oslo_cache.memcache_pool
> >> >>> >>>>> enabled = True
> >> >>> >>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{
> memcached_port }}
> >> >>> >>>>> memcache_dead_retry = 0.25
> >> >>> >>>>> memcache_socket_timeout = 900
> >> >>> >>>>>
> >> >>> >>>>> https://review.opendev.org/c/openstack/kolla-ansible/+/849487
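The same [cache] override can also be delivered without patching the role templates, via kolla-ansible's per-service config merge (paths assumed from the default /etc/kolla/config layout; the VIP and port are written literally here in place of the template variables, so substitute your own):

```shell
# Stage a per-service override; kolla-ansible merges files named after the
# service (e.g. nova.conf) into that service's rendered config.
mkdir -p kolla-config
cat > kolla-config/nova.conf <<'EOF'
[cache]
backend = oslo_cache.memcache_pool
enabled = True
# Replace with your internal VIP and memcached port.
memcache_servers = 10.0.0.10:11211
memcache_dead_retry = 0.25
memcache_socket_timeout = 900
EOF
```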
> >> >>> >>>>>
> >> >>> >>>>> But that is not the end of it.
> >> >>> >>>>>
> >> >>> >>>>> 2. Cannot launch an instance or map a block device (stuck at
> this step)
> >> >>> >>>>>
> >> >>> >>>>> nano
> kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
> >> >>> >>>>>
> >> >>> >>>>> "policies":[
> >> >>> >>>>> {"vhost": "/", "name": "ha-all", "pattern":
> "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition":
> {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
> >> >>> >>>>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name":
> "ha-all", "pattern": ".*", "apply-to": "all", "definition":
> {"ha-mode":"all"}, "priority":0}
> >> >>> >>>>> {% endif %}
> >> >>> >>>>> ]
> >> >>> >>>>>
> >> >>> >>>>> nano /etc/kolla/config/global.conf
> >> >>> >>>>>
> >> >>> >>>>> [oslo_messaging_rabbit]
> >> >>> >>>>> kombu_reconnect_delay=0.5
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1993149
> >> >>> >>>>>
> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
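The definitions.json policy above can also be applied to a running cluster with rabbitmqctl, which avoids redeploying just to add it. A sketch, staged as a script (the container name `rabbitmq` is an assumption; the pattern is copied verbatim from the template):

```shell
# Stage a script that sets the same HA policy at runtime on vhost '/'.
cat > set_ha_policy.sh <<'EOF'
#!/bin/sh
docker exec rabbitmq rabbitmqctl set_policy \
  -p / --apply-to all --priority 0 \
  ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' '{"ha-mode":"all"}'
EOF
chmod +x set_ha_policy.sh
```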
> >> >>> >>>>>
> >> >>> >>>>> I used Xena 13.4 and Yoga 14.8.1.
> >> >>> >>>>>
> >> >>> >>>>> The above bugs are critical, but I see that they have not been
> fixed. I am just an operator, and I want to share what I encountered for
> new people who are coming to OpenStack.
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> Nguyen Huu Khoi
> >> >>> >>> --
> >> >>> >>> Michał Nasiadka
> >> >>> >>> mnasiadka at gmail.com
> >> >>>
>

