Hello!

Output of the command:

Cluster status of node rabbit@iut1r-srv-ops01-i01 ...
Basics

Cluster name: rabbit@iut1r-srv-ops01-i01.u-ga.fr

Disk Nodes

rabbit@iut1r-srv-ops01-i01
rabbit@iut1r-srv-ops02-i01

Running Nodes

rabbit@iut1r-srv-ops01-i01
rabbit@iut1r-srv-ops02-i01

Versions

rabbit@iut1r-srv-ops01-i01: RabbitMQ 3.9.20 on Erlang 24.3.4.2
rabbit@iut1r-srv-ops02-i01: RabbitMQ 3.9.20 on Erlang 24.3.4.2

Maintenance status

Node: rabbit@iut1r-srv-ops01-i01, status: not under maintenance
Node: rabbit@iut1r-srv-ops02-i01, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@iut1r-srv-ops01-i01, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@iut1r-srv-ops01-i01, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@iut1r-srv-ops01-i01, interface: 10.0.5.109, port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@iut1r-srv-ops01-i01, interface: 10.0.5.109, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@iut1r-srv-ops02-i01, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@iut1r-srv-ops02-i01, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@iut1r-srv-ops02-i01, interface: 10.0.5.110, port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@iut1r-srv-ops02-i01, interface: 10.0.5.110, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

So I see nothing strange there.
All containers are healthy now (after deleting and rebuilding rabbitmq).

It is not only DHCP: communication on the network does not work at all. If I create an instance, it gets no IP address from DHCP. If I give it a static IP, it can't reach the router. If I create another instance with another static IP, the two instances can't communicate with each other. And they can't ping the router (or routers: I set up 2, one on each of my 2 external networks).

There are some errors in rabbitmq…..log:

2022-11-12 08:53:37.155542+01:00 [error] <0.16179.2> missed heartbeats from client, timeout: 60s
2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> closing AMQP connection <0.17357.2> (10.0.5.109:37532 -> 10.0.5.109:5672 - mod_wsgi:43:e50d8e69-7c76-4198-877c-c807e0a180d8):
2022-11-12 08:54:54.026480+01:00 [error] <0.17357.2> missed heartbeats from client, timeout: 60s

There are also some errors in neutron-l3-agent.log:

2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task message = self.waiters.get(msg_id, timeout=timeout)
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 445, in get
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task 'to message ID %s' % msg_id)
2022-11-11 22:04:42.512 37 ERROR oslo_service.periodic_task oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 297cacfadd764562bf09a1c5daf61958

Also in neutron-dhcp-agent.log:

2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 445, in get
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
2022-11-11 22:04:44.854 7 ERROR neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6f1d9d0c51ac4d89b9c889ca273f40a0

And a lot of errors in neutron-metadata.log:

2022-11-11 22:01:44.152 43 ERROR oslo.messaging._drivers.impl_rabbit [-] [d7902e2c-eba9-40e4-b872-40e7ba7a39ec] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
2022-11-11 22:01:44.226 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [028872e4-fcd1-4de5-b20c-8c5541e3c77f] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>

Timeouts, waiting, unreachable, connection errors: something is wrong, but I think the problem is very difficult to find. Too difficult for me. "nc -v" works.

I do not know what to do. I can afford to lose all data (networks, instances, volumes, etc.), so I can start again with a new configuration. Should I do that with kolla-ansible -i multinode destroy?

Before switching to Yoga, I had a cluster running Xena. I kept my configuration and a Python venv with kolla-ansible for Xena. Should I go back to that version? How can I do that without breaking things?

Thanks a lot.

Franck VEDEL
On 11 Nov 2022, at 23:33, Laurent Dumont <laurentfdumont@gmail.com> wrote:
docker exec -it rabbitmq rabbitmqctl cluster_status