Re: [Kolla-ansible][epoxy]nova problems at restart

18 Jul 2025

It works!!!!
Thanks a lot, Sean.
You helped me so much with this problem. Thank you so much.
I will test all the services I want again (e.g. vpnaas, designate), but I am optimistic!!

Franck
...
On 18 Jul 2025, at 19:08, Franck VEDEL (UGA) <franck.vedel@univ-grenoble-alpes.fr> wrote:
Thanks again.
Now rabbitmq is the problem.
TASK [rabbitmq : Waiting for rabbitmq to start] ********************************
fatal: [ordi1]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "rabbitmq", "rabbitmqctl", "wait", "--timeout", "60", "/var/lib/rabbitmq/mnesia/rabbitmq.pid"], "delta": "0:05:10.503808", "end": "2025-07-18 18:33:28.522700", "msg": "non-zero return code", "rc": 69, "start": "2025-07-18 18:28:18.018892", "stderr": "Error:\nrabbit_is_not_running", "stderr_lines": ["Error:", "rabbit_is_not_running"], "stdout": "Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear\npid is 1668\nWaiting for erlang distribution on node 'rabbit@ordi1' while OS process '1668' is running\nWaiting for applications 'rabbit_and_plugins' to start on node 'rabbit@ordi1'", "stdout_lines": ["Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear", "pid is 1668", "Waiting for erlang distribution on node 'rabbit@ordi1' while OS process '1668' is running", "Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@ordi1'"]}
Thanks again.
(venv2) user1@ordi1:~/OPENSTACK/INSTALL$ docker ps 
CONTAINER ID   IMAGE                                                                       COMMAND                  CREATED          STATUS                     PORTS     NAMES
fd06eabb8aab   quay.io/openstack.kolla/rabbitmq:2025.1-ubuntu-noble                        "dumb-init --single-…"   35 minutes ago   Up 2 minutes (unhealthy)             rabbitmq
(venv2) user1@ordi1:~/OPENSTACK/INSTALL$ docker logs rabbitmq
….
BOOT FAILED
===========
Timeout contacting cluster nodes: [rabbit@ordi4].
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
DIAGNOSTICS
===========
attempted to contact: [rabbit@ordi4]
rabbit@ordi4:
  * unable to connect to epmd (port 4369) on ordi4: address (cannot connect to host/port)
File multinode:
[control]
# These hostname must be resolvable from your deployment host
ordi1 become=true ansible_become_password=XXXXXX
ordi4 become=true ansible_become_password=XXXXXX
I don't really know what to do. computer1 and computer4 can communicate with each other.
This is the first time I have had problems with rabbitmq.
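A note on the epmd diagnostic above: two hosts being able to reach each other in general does not show that TCP port 4369 on ordi4 accepts connections from ordi1. Below is a minimal Python sketch of such a check, assuming it is run on ordi1. The function name port_is_reachable is made up for illustration and is not part of kolla-ansible or RabbitMQ; the host name and port are the ones from the log.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, port_is_reachable("ordi4", 4369) returning False would be consistent with the log above. The check cannot tell a firewall apart from epmd simply not running on ordi4, so it only narrows the problem down.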
Franck
...
On 18 Jul 2025, at 17:42, Sean Mooney <smooney@redhat.com> wrote:
On 18/07/2025 16:20, Franck VEDEL (UGA) wrote:
...
Thanks a lot for your help.
I'm afraid of making a mistake...
Are the different operations the following:
kolla-ansible -i multinode stop
pip3 install --upgrade 'ansible-core>=2.17,<2.18.99' kolla-ansible==20.1.0
kolla-ansible install-deps
kolla-ansible -i multinode pull
kolla-ansible -i multinode deploy
Yes, those are the steps I was suggesting.
However, if you're not confident and don't have a test cluster to try this on beforehand, I would wait
for the kolla-ansible team or some operators that run a kolla cloud to chime in.
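For anyone who wants to rehearse the sequence before touching a cluster, here is a minimal Python sketch that only prints the commands by default. The command strings are copied as written in this thread; the exact argument order and any extra confirmation flags may differ between kolla-ansible releases, so treat them as assumptions to check against your installed version. The names STEPS and run_steps are hypothetical.

```python
import shlex
import subprocess

# Commands exactly as listed in the thread, in the order they were given.
STEPS = [
    "kolla-ansible -i multinode stop",
    "pip3 install --upgrade 'ansible-core>=2.17,<2.18.99' kolla-ansible==20.1.0",
    "kolla-ansible install-deps",
    "kolla-ansible -i multinode pull",
    "kolla-ansible -i multinode deploy",
]


def run_steps(steps, dry_run=True):
    """Print each step, or execute it when dry_run is False.

    Returns the parsed argument lists. With dry_run=False the first
    failing command raises CalledProcessError and later steps are skipped.
    """
    planned = []
    for step in steps:
        argv = shlex.split(step)
        planned.append(argv)
        if dry_run:
            print("would run:", shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
    return planned
```

Calling run_steps(STEPS) prints the plan without changing anything; run_steps(STEPS, dry_run=False) would actually execute it and stop at the first error.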
...
Will I get the templates and images back?
pull will cause new docker images to be pulled to all the nodes, and they will be tagged with latest, etc.
If you look at your /etc/kolla/globals.yml,
the tag that it uses will likely be something like 2025.1-ubuntu-noble,
depending on what you set
# Valid options are ['centos', 'debian', 'rocky', 'ubuntu']
#kolla_base_distro: "rocky"
# Do not override this unless you know what you are doing.
#openstack_release: "master"
to; see https://github.com/openstack/kolla-ansible/blob/stable/2025.1/etc/kolla/glob...
Your locally tagged images will be updated to point to the latest version:
https://quay.io/repository/openstack.kolla/nova-compute?tab=tags
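To illustrate how such a tag seems to be put together: judging from the example tag above and the image name in the docker ps output earlier in the thread, it looks like the release, the base distro and the distro version joined by dashes. Below is a small Python sketch under that assumption. The function kolla_image_ref and its default values are hypothetical; the authoritative values are whatever your own kolla configuration resolves to.

```python
def kolla_image_ref(
    service: str,
    release: str = "2025.1",
    base_distro: str = "ubuntu",
    distro_version: str = "noble",
    registry: str = "quay.io",
    namespace: str = "openstack.kolla",
) -> str:
    """Build an image reference in the shape seen in this thread.

    Assumed pattern: <registry>/<namespace>/<service>:<release>-<distro>-<version>
    """
    tag = f"{release}-{base_distro}-{distro_version}"
    return f"{registry}/{namespace}/{service}:{tag}"
```

With the defaults, kolla_image_ref("rabbitmq") gives the same string as the IMAGE column in the docker ps output quoted earlier, which is a quick way to compare what is running against what you expect to have pulled.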
When you do deploy, it will go through the same steps in the same order as your initial deployment
and will update all the config and deploy the new containers using the images you pulled.
This is intended to be non-destructive in kolla.
Any VMs, etc., or glance images or cinder volumes should still exist afterwards.
You do not technically need to do stop first, by the way.
My main recommendation was just to make sure you are running the latest epoxy version of kolla-ansible and the latest epoxy version of the openstack containers
before debugging the placement issue much further.
You could instead just do
kolla-ansible reconfigure -i multinode -t placement
to see if that will fix the placement container issue, and then see if that allows nova to start.
...
Franck
...
On 18 Jul 2025, at 16:50, Sean Mooney <smooney@redhat.com> wrote:
So, just a suggestion:
I would basically try turning it off and on again.
I think the problem is that you limited the reconfigure to nova, but the errors seem to indicate that placement is not running or is not a new enough version.
In epoxy, I believe kolla has a stop command,
so I would be tempted to do a stop, pull the latest version of the kolla images with the pull action, and then do another deploy rather than a reconfigure.
reconfigure, at least in the past, is actually just meant to update the config files and restart the containers in place.
deploy does that, but it can also update to a newer version of the images if you pull them first.
I have not worked on kolla in a very long time and I rarely update my home deployment, so the kolla team may have a better suggestion, but
that is what I would try if it were my home cluster.
Note that when doing the deploy, I would allow it to run for all services.
This is effectively just doing a minor update to the latest stable release of epoxy.
I would also make sure your kolla-ansible is using the latest epoxy version too,
in case you are hitting a known issue that has been fixed. There is now a 20.1.0 available.
If you're using it from git, you can obviously just do a git pull, but don't forget to make sure you update the deps:
pip3 install --upgrade 'ansible-core>=2.17,<2.18.99' kolla-ansible==20.1.0
kolla-ansible install-deps
I think the root of your problem is the fact that placement is not starting properly and is returning 500s.
The python application found error, I think, is because of setuptools 80 and the way the wsgi script used
to be generated before we added a pyproject.toml,
so I'm hoping that using the latest images will have that resolved.
My home cluster is still on caracal and I need to upgrade it soon, so I can't confirm that this will solve the issue, but
it is likely that it will.