Thanks again.
Now rabbitmq is the problem.

TASK [rabbitmq : Waiting for rabbitmq to start] ********************************

fatal: [ordi1]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "rabbitmq", "rabbitmqctl", "wait", "--timeout", "60", "/var/lib/rabbitmq/mnesia/rabbitmq.pid"], "delta": "0:05:10.503808", "end": "2025-07-18 18:33:28.522700", "msg": "non-zero return code", "rc": 69, "start": "2025-07-18 18:28:18.018892", "stderr": "Error:\nrabbit_is_not_running", "stderr_lines": ["Error:", "rabbit_is_not_running"], "stdout": "Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear\npid is 1668\nWaiting for erlang distribution on node 'rabbit@ordi1' while OS process '1668' is running\nWaiting for applications 'rabbit_and_plugins' to start on node 'rabbit@ordi1'", "stdout_lines": ["Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear", "pid is 1668", "Waiting for erlang distribution on node 'rabbit@ordi1' while OS process '1668' is running", "Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@ordi1'"]}
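A one-line Ansible failure like the one above is easier to read once the JSON part after "=> " is unpacked. The following is only a minimal sketch: the function name `summarize_failure` and the shortened `SAMPLE` line are made up for illustration, and it assumes the text after "=> " is valid JSON.

```python
import json

# Shortened, hypothetical sample in the same shape as the failure above.
SAMPLE = (
    'fatal: [ordi1]: FAILED! => {"rc": 69, '
    '"stderr_lines": ["Error:", "rabbit_is_not_running"], '
    '"stdout_lines": ["pid is 1668"]}'
)


def summarize_failure(line: str) -> dict:
    """Extract the return code and output lines from a 'FAILED! => {...}' line."""
    # Keep only what follows the first "=> "; that part is assumed to be JSON.
    _, _, payload = line.partition("=> ")
    result = json.loads(payload)
    return {
        "rc": result.get("rc"),
        "stderr": result.get("stderr_lines", []),
        "stdout": result.get("stdout_lines", []),
    }


if __name__ == "__main__":
    info = summarize_failure(SAMPLE)
    print("rc:", info["rc"])
    for text in info["stderr"]:
        print("stderr:", text)
```

Pasting the full failure line in place of `SAMPLE` should work the same way, as long as the quotes in it have not been altered by the mail client.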




(venv2) user1@ordi1:~/OPENSTACK/INSTALL$ docker ps 

CONTAINER ID   IMAGE                                                                       COMMAND                  CREATED          STATUS                     PORTS     NAMES

fd06eabb8aab   quay.io/openstack.kolla/rabbitmq:2025.1-ubuntu-noble                        "dumb-init --single-…"   35 minutes ago   Up 2 minutes (unhealthy)             rabbitmq



(venv2) user1@ordi1:~/OPENSTACK/INSTALL$ docker logs rabbitmq

….

BOOT FAILED
===========
Timeout contacting cluster nodes: [rabbit@ordi4].

BACKGROUND
==========

This cluster node was shut down while other nodes were still running.
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.

DIAGNOSTICS
===========

attempted to contact: [rabbit@ordi4]

rabbit@ordi4:
  * unable to connect to epmd (port 4369) on ordi4: address (cannot connect to host/port)
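Since the log complains about epmd on port 4369, one quick thing to rule out is plain TCP reachability from ordi1 to ordi4. The following is only a rough sketch using the Python standard library; the helper name `can_connect` is made up for illustration, and a "False" result does not say whether the cause is a firewall, a stopped service, or name resolution.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, running `can_connect("ordi4", 4369)` on ordi1 tests the port named in the log. Port 25672 is the usual RabbitMQ inter-node port and could be tested the same way, but that number is an assumption here, since it does not appear in the output above.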




File multinode:

[control]

# These hostname must be resolvable from your deployment host

ordi1 become=true ansible_become_password=XXXXXX

ordi4 become=true ansible_become_password=XXXXXX



I don't really know what to do; ordi1 and ordi4 can communicate with each other.

This is the first time I have had problems with rabbitmq.


Franck 

On 18 Jul 2025, at 17:42, Sean Mooney <smooney@redhat.com> wrote:


On 18/07/2025 16:20, Franck VEDEL (UGA) wrote:
Thanks a lot for your help.


I'm afraid of making a mistake...

Are the different operations the following:

kolla-ansible -i multinode stop
pip3 install --upgrade 'ansible-core>=2.17,<2.18.99' kolla-ansible==20.1.0
kolla-ansible install-deps
kolla-ansible -i multinode pull
kolla-ansible -i multinode deploy

yes those are the steps i was suggesting.

however if you're not confident and don't have a test cluster to try this on first, i would wait
for the kolla-ansible team or some operators that run a kolla cloud to chime in.
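For anyone who wants to script the sequence discussed above, here is a cautious sketch that only prints the commands unless told otherwise. The names `upgrade_commands` and `run_all` are made up for illustration. The subcommand-first argument order (`kolla-ansible pull -i multinode`) is an assumption based on the `reconfigure` form used elsewhere in this thread, and `stop` is left out because it is described as optional.

```python
import subprocess


def upgrade_commands(inventory: str = "multinode") -> list[list[str]]:
    """Build the command list from the thread (versions copied from it as-is)."""
    return [
        ["pip3", "install", "--upgrade",
         "ansible-core>=2.17,<2.18.99", "kolla-ansible==20.1.0"],
        ["kolla-ansible", "install-deps"],
        # Argument order below is an assumption, see the note above.
        ["kolla-ansible", "pull", "-i", inventory],
        ["kolla-ansible", "deploy", "-i", inventory],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)
```

Calling `run_all(upgrade_commands())` just prints the four steps; passing `dry_run=False` actually runs them, stopping at the first non-zero exit code.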



Will I get the templates and images back?

pull will cause new docker images to be pulled to all the nodes and they will be tagged with latest, etc.

if you look at your /etc/kolla/globals.yml

the tag that it uses will likely be something like 2025.1-ubuntu-noble

depending on what you set


# Valid options are ['centos', 'debian', 'rocky', 'ubuntu']
#kolla_base_distro: "rocky"

# Do not override this unless you know what you are doing.
#openstack_release: "master"

see https://github.com/openstack/kolla-ansible/blob/stable/2025.1/etc/kolla/globals.yml#L45-L49

your locally tagged images will be updated to point to the latest version

https://quay.io/repository/openstack.kolla/nova-compute?tab=tags
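To illustrate how a tag such as 2025.1-ubuntu-noble might be put together from the settings mentioned above, here is a rough sketch. Several things in it are assumptions: the tag layout (release, distro, distro version), the variable name `kolla_base_distro_version`, and the values in `ASSUMED_DEFAULTS`, which stand in for whatever kolla-ansible uses when the lines in globals.yml are left commented out. The parser only handles simple uncommented `key: value` lines, not full YAML.

```python
import re

# Hypothetical fallbacks, chosen only to match the tag quoted in this thread.
ASSUMED_DEFAULTS = {
    "openstack_release": "2025.1",
    "kolla_base_distro": "ubuntu",
    "kolla_base_distro_version": "noble",
}

# Matches an uncommented top-level `key: value` or `key: "value"` line.
_SETTING = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*):\s*"?([^"#]*?)"?\s*(?:#.*)?$')


def read_simple_settings(text: str) -> dict[str, str]:
    """Collect uncommented `key: value` lines; commented lines are skipped."""
    settings = {}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if match and match.group(2):
            settings[match.group(1)] = match.group(2)
    return settings


def image_tag(settings: dict[str, str], defaults: dict[str, str]) -> str:
    """Compose the assumed tag layout; explicit settings override defaults."""
    merged = {**defaults, **settings}
    return "{openstack_release}-{kolla_base_distro}-{kolla_base_distro_version}".format(**merged)
```

With both lines still commented out, as in the excerpt above, `read_simple_settings` finds nothing and `image_tag` falls back entirely to the assumed defaults.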

when you do deploy it will go through the same steps in the same order as your initial deployment

and will update all the config and deploy the new containers using the images you pulled.

this is intended to be non destructive in kolla.

any vms etc. or glance images or cinder volumes should still exist after.

you do not technically need to do stop first by the way.

my main recommendation was to just make sure you are running the latest epoxy version of kolla-ansible and the latest epoxy version of openstack containers

before debugging the placement issue too much further.


you could instead just do

kolla-ansible reconfigure -i multinode -t placement

to see if that will fix the placement container issue and then see if that allows nova to start.


Franck

On 18 Jul 2025, at 16:50, Sean Mooney <smooney@redhat.com> wrote:

so just a suggestion


i would basically try turning it off and on again.

so i think the problem is that you limited the reconfigure to nova, but the errors seem to indicate that placement is not running or is not a new enough version.

so in epoxy i believe kolla has a stop command

so i would be tempted to do a stop, pull the latest version of the kolla images with the pull action and then do another deploy rather than reconfigure.

reconfigure, at least in the past, is actually just meant to update the config files and restart the containers in place.

deploy does that but it can also update to a newer version of the images if you pull them first.


i have not worked on kolla in a very long time and i rarely update my home deployment so the kolla team may have a better suggestion but

that would be what i would try if it was my home cluster.

note that when doing deploy i would allow it to run for all services.
this is effectively just doing a minor update to the latest stable release of epoxy.

i would also make sure your kolla-ansible is using the latest epoxy version too
in case you are hitting a known issue that has been fixed. there is now a 20.1.0 available.

if you're using it from git you can obviously just do a git pull but don't forget to make sure you update the deps

pip3 install --upgrade 'ansible-core>=2.17,<2.18.99' kolla-ansible==20.1.0
kolla-ansible install-deps

i think the root of your problem is the fact that placement is not starting properly and is returning 500s

the python application found error i think is because of setuptools 80 and the way the wsgi script used
to be generated before we added a pyproject.toml

so i'm hoping using the latest images will have that resolved.

my home cluster is still on caracal and i need to upgrade it soon, so i can't confirm that it will solve the issue, but
it is likely that it will.