Hello!

Here is the output of the command:

Cluster status of node rabbit@iut1r-srv-ops01-i01 ...
Basics
Cluster name: rabbit@iut1r-srv-ops01-i01.u-ga.fr

Disk Nodes
rabbit@iut1r-srv-ops01-i01
rabbit@iut1r-srv-ops02-i01

Running Nodes
rabbit@iut1r-srv-ops01-i01
rabbit@iut1r-srv-ops02-i01

Versions
rabbit@iut1r-srv-ops01-i01: RabbitMQ 3.9.20 on Erlang 24.3.4.2
rabbit@iut1r-srv-ops02-i01: RabbitMQ 3.9.20 on Erlang 24.3.4.2

Maintenance status
Node: rabbit@iut1r-srv-ops01-i01, status: not under maintenance
Node: rabbit@iut1r-srv-ops02-i01, status: not under maintenance

Alarms
(none)

Network Partitions
(none)

Listeners
Node: rabbit@iut1r-srv-ops01-i01, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@iut1r-srv-ops01-i01, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@iut1r-srv-ops01-i01, interface: 10.0.5.109, port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@iut1r-srv-ops01-i01, interface: 10.0.5.109, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@iut1r-srv-ops02-i01, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@iut1r-srv-ops02-i01, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@iut1r-srv-ops02-i01, interface: 10.0.5.110, port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@iut1r-srv-ops02-i01, interface: 10.0.5.110, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

So nothing looks strange to me.

All containers are healthy now (after deleting and rebuilding rabbitmq).


In addition to DHCP, network communication in general does not work.
If I create an instance, it does not get an IP address from DHCP.
If I give it a static IP, it can't reach the router.
If I create another instance with another static IP, the two instances can't communicate with each other.
And neither can ping the router (or routers: I have 2, one on each of my 2 external networks).

There are some errors in rabbitmq…..log:
2022-11-12 08:53:37.155542+01:00 [error] <0.16179.2> missed heartbeats from client, timeout: 60s
2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> closing AMQP connection <0.17357.2> (10.0.5.109:37532 -> 10.0.5.109:5672 - mod_wsgi:43:e50d8e69-7c76-4198-877c-c807e0a180d8):
2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> missed heartbeats from client, timeout: 60s
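To see how widespread the heartbeat failures are, the RabbitMQ log can be summarised per connection process. This is only a minimal sketch: the function name `count_missed_heartbeats` and the regular expression are assumptions based on the three log lines quoted above, not a RabbitMQ tool.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
# "<date> <time> [<level>] <erlang pid> <message>"
LINE_RE = re.compile(r"^(\S+ \S+) \[(\w+)\] (<[\d.]+>) (.*)$")


def count_missed_heartbeats(lines):
    """Count 'missed heartbeats from client' errors per Erlang pid (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        level, pid, message = match.group(2), match.group(3), match.group(4)
        if level == "error" and "missed heartbeats from client" in message:
            counts[pid] += 1
    return counts


# The three lines quoted from the RabbitMQ log above.
sample = [
    "2022-11-12 08:53:37.155542+01:00 [error] <0.16179.2> missed heartbeats from client, timeout: 60s",
    "2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> closing AMQP connection <0.17357.2> (10.0.5.109:37532 -> 10.0.5.109:5672 - mod_wsgi:43:e50d8e69-7c76-4198-877c-c807e0a180d8):",
    "2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> missed heartbeats from client, timeout: 60s",
]

for pid, count in count_missed_heartbeats(sample).items():
    print(pid, count)
```

Running the same function over the whole log file (for example with `open(path)` as the `lines` argument) would show whether only a few clients or all of them are dropping heartbeats.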

There are also some errors in neutron-l3-agent.log:
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task     message = self.waiters.get(msg_id, timeout=timeout)
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 445, in get
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task     'to message ID %s' % msg_id)
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 297cacfadd764562bf09a1c5daf61958

Also in neutron-dhcp-agent.log:
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 445, in get
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent     'to message ID %s' % msg_id)
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6f1d9d0c51ac4d89b9c889ca273f40a0

There are a lot of errors in neutron-metadata.log:
2022-11-11 22:01:44.152 43 ERROR oslo.messaging._drivers.impl_rabbit [-] [d7902e2c-eba9-40e4-b872-40e7ba7a39ec] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
2022-11-11 22:01:44.226 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [028872e4-fcd1-4de5-b20c-8c5541e3c77f] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
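To check whether every "unreachable" error points at the same RabbitMQ node, the endpoints can be extracted from the agent logs. This is a rough sketch: the helper name `unreachable_endpoints` is hypothetical, and the pattern is derived only from the two lines quoted above.

```python
import re
from collections import Counter

# Pattern assumed from the oslo.messaging lines quoted above.
UNREACHABLE_RE = re.compile(r"AMQP server on (\S+):(\d+) is unreachable")


def unreachable_endpoints(lines):
    """Count 'AMQP server ... is unreachable' errors per (host, port) (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = UNREACHABLE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# The two lines quoted from neutron-metadata.log above.
sample = [
    "2022-11-11 22:01:44.152 43 ERROR oslo.messaging._drivers.impl_rabbit [-] [d7902e2c-eba9-40e4-b872-40e7ba7a39ec] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>",
    "2022-11-11 22:01:44.226 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [028872e4-fcd1-4de5-b20c-8c5541e3c77f] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>",
]

for (host, port), count in unreachable_endpoints(sample).items():
    print(f"{host}:{port} reported unreachable {count} time(s)")
```

If both 10.0.5.109 and 10.0.5.110 show up in the full log, the problem is probably not limited to a single RabbitMQ node.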


Timeout... waiting... unreachable... connection error...

Something is wrong, but I think it is very difficult to find the problem. Too difficult for me.
"nc -v" works.
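For reference, the `nc -v` test amounts to a plain TCP connect, which can be repeated from any host or container that has Python. This is a minimal sketch with a hypothetical helper name, `tcp_port_open`. Note that a successful connect only shows that the port accepts connections; it does not show that AMQP heartbeats or RPC replies get through.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper: roughly what `nc -v host port` checks, nothing more.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

A call such as `tcp_port_open("10.0.5.109", 5672)` from each controller and network node would show whether the AMQP listener listed in the cluster status is reachable from that node.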

I do not know what to do.
I can afford to lose all data (networks, instances, volumes, etc.) and start again with a new configuration.
Should I do that with kolla-ansible -i multinode destroy?

Before switching to Yoga, I had a cluster running Xena. I kept my configuration and a Python venv with kolla-ansible for Xena.
Should I go back to that version? If so, how can I do it without breaking anything?


Thanks a lot.

Franck VEDEL



On 11 Nov 2022, at 23:33, Laurent Dumont <laurentfdumont@gmail.com> wrote:

docker exec -it rabbitmq rabbitmqctl cluster_status