[kolla-ansible]Reset Configuration

Erik McCormick emccormick at cirrusseven.com
Sat Nov 12 14:10:13 UTC 2022


On Sat, Nov 12, 2022 at 3:08 AM Franck VEDEL <
franck.vedel at univ-grenoble-alpes.fr> wrote:

> Hello!
>
> Output of the command
>
> Cluster status of node rabbit@iut1r-srv-ops01-i01 ...
> Basics
> Cluster name: rabbit@iut1r-srv-ops01-i01.u-ga.fr
>
> Disk Nodes
> rabbit@iut1r-srv-ops01-i01
> rabbit@iut1r-srv-ops02-i01
>
> Running Nodes
> rabbit@iut1r-srv-ops01-i01
> rabbit@iut1r-srv-ops02-i01
>
> Versions
> rabbit@iut1r-srv-ops01-i01: RabbitMQ 3.9.20 on Erlang 24.3.4.2
> rabbit@iut1r-srv-ops02-i01: RabbitMQ 3.9.20 on Erlang 24.3.4.2
>
> Maintenance status
> Node: rabbit@iut1r-srv-ops01-i01, status: not under maintenance
> Node: rabbit@iut1r-srv-ops02-i01, status: not under maintenance
>
> Alarms
> (none)
>
> Network Partitions
> (none)
>
> Listeners
> Node: rabbit@iut1r-srv-ops01-i01, interface: [::], port: 15672, protocol:
> http, purpose: HTTP API
> Node: rabbit@iut1r-srv-ops01-i01, interface: [::], port: 15692, protocol:
> http/prometheus, purpose: Prometheus exporter API over HTTP
> Node: rabbit@iut1r-srv-ops01-i01, interface: 10.0.5.109, port: 25672,
> protocol: clustering, purpose: inter-node and CLI tool communication
> Node: rabbit@iut1r-srv-ops01-i01, interface: 10.0.5.109, port: 5672,
> protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
> Node: rabbit@iut1r-srv-ops02-i01, interface: [::], port: 15672, protocol:
> http, purpose: HTTP API
> Node: rabbit@iut1r-srv-ops02-i01, interface: [::], port: 15692, protocol:
> http/prometheus, purpose: Prometheus exporter API over HTTP
> Node: rabbit@iut1r-srv-ops02-i01, interface: 10.0.5.110, port: 25672,
> protocol: clustering, purpose: inter-node and CLI tool communication
> Node: rabbit@iut1r-srv-ops02-i01, interface: 10.0.5.110, port: 5672,
> protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
>
> Feature flags
> Flag: drop_unroutable_metric, state: enabled
> Flag: empty_basic_get_metric, state: enabled
> Flag: implicit_default_bindings, state: enabled
> Flag: maintenance_mode_status, state: enabled
> Flag: quorum_queue, state: enabled
> Flag: stream_queue, state: enabled
> Flag: user_limits, state: enabled
> Flag: virtual_host_metadata, state: enabled
>
> So… nothing looks strange to me.
>
> All containers are healthy now (after deleting and rebuilding
> rabbitmq).
>
>
> In addition to DHCP, communication on the network does not work.
> If I create an instance, it gets no IP address from DHCP.
> If I give it a static IP, it can't reach the router.
> If I create another instance with another static IP, the two can't
> communicate with each other.
> And they can't ping the router (or routers; I set up 2, 1 on each of my 2
> external networks)
>
> There are some errors in rabbitmq…..log:
> 2022-11-12 08:53:37.155542+01:00 [error] <0.16179.2> missed heartbeats
> from client, timeout: 60s
> 2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> closing AMQP
> connection <0.17357.2> (10.0.5.109:37532 -> 10.0.5.109:5672 -
> mod_wsgi:43:e50d8e69-7c76-4198-877c-c807e0a180d8):
> 2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> missed heartbeats
> from client, timeout: 60s
>
> There are some errors also in neutron-l3-agent.log
> 2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task     message =
> self.waiters.get(msg_id, timeout=timeout)
> 2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task   File
> "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py",
> line 445, in get
> 2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task     'to
> message ID %s' % msg_id)
> 2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task
> oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
> to message ID 297cacfadd764562bf09a1c5daf61958
>
> Also in neutron-dhcp-agent.log
> 2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent     message =
> self.waiters.get(msg_id, timeout=timeout)
> 2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent   File
> "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py",
> line 445, in get
> 2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent     'to message
> ID %s' % msg_id)
> 2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent
> oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
> to message ID 6f1d9d0c51ac4d89b9c889ca273f40a0
>
> A lot of errors in neutron-metadata.log
> 2022-11-11 22:01:44.152 43 ERROR oslo.messaging._drivers.impl_rabbit [-]
> [d7902e2c-eba9-40e4-b872-40e7ba7a39ec] AMQP server on 10.0.5.109:5672 is
> unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1
> seconds.: amqp.exceptions.RecoverableConnectionError:
> <RecoverableConnectionError: unknown error>
> 2022-11-11 22:01:44.226 7 ERROR oslo.messaging._drivers.impl_rabbit [-]
> [028872e4-fcd1-4de5-b20c-8c5541e3c77f] AMQP server on 10.0.5.109:5672 is
> unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1
> seconds.: amqp.exceptions.RecoverableConnectionError:
> <RecoverableConnectionError: unknown error>
>
>
> timeout …. waiting…. unreachable…. connectionerror….
>
> Something is wrong, but I think it’s very difficult to find the problem.
> Too difficult for me.
> « nc -v » works.
>

There are several things that can cause issues with Rabbit, or with
services sending messages. Rabbit itself is not always to blame. Things
I've seen cause issues before include:

1) Time not being in sync on all systems (covered that earlier)
2) DNS (it's always DNS, right?)
3) Networking issues like mismatched MTU
4) Nova being configured for a Ceph backend, but timing out trying to talk
to the cluster (messages would expire while Nova waited on it)
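
The first three causes can be checked from a controller with standard tools.
Below is a minimal sketch, assuming a Linux host with chronyc or timedatectl,
getent, and the iputils ping. The function names are illustrative only; the
hosts and addresses in the usage note come from the cluster_status output
quoted above.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of kolla-ansible.

# ICMP payload that exactly fills a given MTU: the MTU minus the 20-byte
# IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# 1) Time sync: show what the local clock service reports.
check_time() {
    if command -v chronyc >/dev/null 2>&1; then
        chronyc tracking
    else
        timedatectl status
    fi
}

# 2) DNS: every controller name should resolve on every node.
check_dns() {
    for host in "$@"; do
        getent hosts "$host" || echo "cannot resolve: $host"
    done
}

# 3) MTU: send a full-size packet with "don't fragment" set (iputils ping).
#    $1 = peer address, $2 = expected MTU (e.g. 1500)
check_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, on ops01 you could run check_time, then
check_dns iut1r-srv-ops01-i01 iut1r-srv-ops02-i01, then
check_mtu 10.0.5.110 1500. If the full-size ping fails while a small ping
succeeds, a mismatched MTU somewhere on the path is a likely suspect.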


> I do not know what to do.
> I can lose all data (networks, instances, volumes, etc). I can start again
> on a new config
> Do I do it with kolla-ansible -i multinode destroy?
>

Yeah, just do kolla-ansible -i multinode destroy after backing up your
kolla configs.
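
A rough sketch of the backup step, assuming your configs live in the default
/etc/kolla directory (adjust if yours differ). backup_kolla_config is a
made-up helper name, not a kolla-ansible command. Note that destroy refuses
to run unless you also pass --yes-i-really-really-mean-it.

```sh
#!/bin/sh
# Sketch only: backup_kolla_config is a hypothetical helper; /etc/kolla is
# the default kolla-ansible config directory and may differ on your hosts.

# Archive a config directory into a timestamped tarball and print its path.
# $1 = directory to back up (default /etc/kolla)
# $2 = directory to write the archive to (default $HOME)
backup_kolla_config() {
    src=${1:-/etc/kolla}
    dest_dir=${2:-$HOME}
    archive="$dest_dir/kolla-config-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Typical sequence on the deployment host (not run automatically here):
#   backup_kolla_config /etc/kolla "$HOME"
#   cp multinode "$HOME/multinode.bak"   # if the inventory is outside /etc/kolla
#   kolla-ansible -i multinode destroy --yes-i-really-really-mean-it
```

Check that the archive lists globals.yml and passwords.yml (tar -tzf on the
printed path) before running the destroy.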

> Before switching to Yoga, I had a cluster under Xena. I kept my
> configuration and a venv (python) with kolla-ansible for Xena.
> Should I go back to this version? How can I do that without breaking things?
>

I can't see any good reason to roll back to Xena. Yoga should be fine.

If you do go back, it should be as simple as swapping your venv and using
your Xena globals.yml, passwords.yml, inventory, and any other custom configs
you had for that version.
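
As a sketch of what that swap could look like: the two paths below are
placeholders I made up, so point them at wherever you kept the Xena venv and
the Xena copies of globals.yml, passwords.yml and the inventory. The helper
only prints the commands so you can review them before running anything. It
assumes the inventory file is named multinode, as in your destroy command.

```sh
#!/bin/sh
# Sketch only: both paths are hypothetical placeholders; substitute the
# locations of your saved Xena virtualenv and configuration directory.
XENA_VENV=${XENA_VENV:-$HOME/venv-xena}
XENA_CONFIG=${XENA_CONFIG:-$HOME/kolla-xena}

# Print the commands for a fresh Xena deploy so they can be reviewed first.
# --configdir tells kolla-ansible where to find globals.yml (and, by
# default, passwords.yml).
print_xena_plan() {
    echo ". $XENA_VENV/bin/activate"
    for action in bootstrap-servers prechecks deploy; do
        echo "kolla-ansible --configdir $XENA_CONFIG -i $XENA_CONFIG/multinode $action"
    done
}
```

Run print_xena_plan after the destroy has finished, and only execute the
printed lines once the paths look right.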


> Thanks a lot.
>
> Franck VEDEL
>
>
>
> On 11 Nov 2022, at 23:33, Laurent Dumont <laurentfdumont at gmail.com>
> wrote:
>
> docker exec -it rabbitmq rabbitmqctl cluster_status
>
>
>