On Fri, Nov 11, 2022 at 3:05 AM Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> wrote:

Thanks for your help, really.
My cluster: 2 controller nodes, OVS, L3-HA.
All nodes had to be rebooted.
Everything works on external networks, for example (so DHCP on external networks is fine).
There are no dead containers, all seems ok.

I am trying to create a new instance on an L3 network. There is no ERROR in neutron*.log.
The only error is in nova-api.log:

Example:
2022-11-11 08:45:54.452 42 ERROR oslo.messaging._drivers.impl_rabbit [-] [8b6fd776-f096-4c8a-927e-88225a3adb43] AMQP server on 10.0.5.109:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>

But on the first node (10.0.5.109 on the internal network), "netstat -atnp | wc -l" reports 505 connections.
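
The raw line count above covers every TCP socket on the node, so it says little about RabbitMQ itself. A minimal sketch, assuming net-tools style `netstat -atn` output (state in the sixth column), that counts only ESTABLISHED connections involving the AMQP port from the error message; the function name `count_amqp_established` is hypothetical:

```shell
# Hypothetical helper: reads `netstat -atn` output on stdin and counts
# ESTABLISHED TCP connections whose local or remote address ends in :PORT.
# Assumes the usual column layout: Proto Recv-Q Send-Q Local Foreign State.
count_amqp_established() {
    port="${1:-5672}"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" &&
        ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n++ }
        END { print n + 0 }
    '
}

# Usage on a controller:
#   netstat -atn | count_amqp_established 5672
```

A result of zero on the node that hosts RabbitMQ would be consistent with the "unreachable" error, although it does not by itself say why.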

Sounds to me like Rabbit is broken. This could also be an issue with NTP which I asked about earlier. Did you confirm your systems are all correctly synced to the same time source?

You can check the status of rabbit on each control node with:

docker exec -it rabbitmq rabbitmqctl cluster_status
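
To compare the two controllers without reading both outputs by eye, something like the following could work. It is only a sketch: the host names `ctrl1` and `ctrl2`, the file paths, and the function name `same_cluster_view` are assumptions, and it relies on the first line of the output being the "Cluster status of node ..." banner that names the local node. The `-it` flags are dropped because a TTY is not available when output is redirected.

```shell
# Hypothetical helper: succeeds if two saved `rabbitmqctl cluster_status`
# outputs match once the per-node banner line is ignored.
same_cluster_view() {
    a=$(grep -v '^Cluster status of node' "$1")
    b=$(grep -v '^Cluster status of node' "$2")
    [ "$a" = "$b" ]
}

# Usage (ctrl1 and ctrl2 are placeholder host names):
#   ssh ctrl1 docker exec rabbitmq rabbitmqctl cluster_status > /tmp/ctrl1.txt
#   ssh ctrl2 docker exec rabbitmq rabbitmqctl cluster_status > /tmp/ctrl2.txt
#   same_cluster_view /tmp/ctrl1.txt /tmp/ctrl2.txt && echo same || echo differ
```

Depending on the RabbitMQ version, the node lists may be printed in a different order on each node, so a "differ" result is a prompt to read both files rather than proof of a partition.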

Output should show the same on both of your controllers. If not, restart your rabbit containers. If they won't come back properly, you could destroy and redeploy just those two containers.

On both controllers do:

docker rm rabbitmq

docker volume rm rabbitmq

Then kolla-ansible --tags rabbitmq deploy
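
The steps above can be sketched as a small script with a dry-run switch, so the sequence can be reviewed before anything is removed. Two things here are not in the commands above and are my assumptions: the `docker stop` step (docker refuses to remove a running container without it) and the `-i` inventory path, which defaults to a placeholder. The names `run`, `reset_rabbitmq_container` and `redeploy_rabbitmq` are hypothetical.

```shell
# Sketch only. DRY_RUN=1 (the default) prints each command instead of
# running it; set DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on each controller: remove the RabbitMQ container and its volume.
reset_rabbitmq_container() {
    run docker stop rabbitmq        # added step: rm fails on a running container
    run docker rm rabbitmq
    run docker volume rm rabbitmq
}

# Run once from the deployment host. INVENTORY is an assumed placeholder path.
redeploy_rabbitmq() {
    run kolla-ansible -i "${INVENTORY:-/etc/kolla/multinode}" --tags rabbitmq deploy
}
```

Calling `reset_rabbitmq_container` with the default settings only prints the three docker commands, which makes it easy to check them against the list above first.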

So... if I back up /etc/kolla, my glance images, my configuration files...
if I do "kolla-ansible destroy", is the next step "kolla-ansible bootstrap…", then prechecks, then deploy,
or deploy directly?

What’s the difference from cleanup-containers?

You don't need to bootstrap again. That just installs prerequisites, which the destroy does not remove. Just go right to doing kolla-ansible deploy again. Do remember this will give you a brand new OpenStack with nothing preserved from before.

cleanup-containers alone may leave behind some docker tweaks that Neutron needs. It probably doesn't matter if you're going to just redeploy the same configuration though so go ahead and use that instead.
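
For the full rebuild, the destroy-then-deploy order could be guarded so that it is only offered once a backup exists. This is a sketch under assumptions: the function name `print_rebuild_commands`, the backup path and the inventory argument are all hypothetical, and the function only prints the commands rather than running them. The confirmation flag shown is the one kolla-ansible requires for destroy.

```shell
# Hypothetical guard: print the rebuild commands only if a non-empty backup
# archive exists. Nothing is executed here; review the output, then run the
# printed commands by hand.
print_rebuild_commands() {
    backup="$1"
    inventory="$2"
    if [ ! -s "$backup" ]; then
        echo "refusing: no backup archive at $backup" >&2
        return 1
    fi
    # destroy needs an explicit confirmation flag
    echo "kolla-ansible -i $inventory destroy --yes-i-really-really-mean-it"
    # no bootstrap step: go straight to deploy
    echo "kolla-ansible -i $inventory deploy"
}

# Usage (both paths are placeholders):
#   print_rebuild_commands /root/kolla-backup.tar.gz /etc/kolla/multinode
```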

-Erik

I use this OpenStack cluster for my students, and I have a month to get it working again. I could reinstall everything (and change the operating system) but I don't have time for that.
So I can lose all the user data. If I have my glance images, my flavors, the configuration to hook up LDAP, and the certificates, I think it will be ok.

Franck VEDEL

Are you asking how to completely zero out your entire cluster and rebuild it? That seems a bit drastic.

kolla-ansible destroy will nuke everything. Take a backup of /etc/kolla (or wherever your inventory / globals.yml / passwords.yml is) first. Older versions removed some things there when running destroy and I can't recall when / if that changed.
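
A minimal sketch of that backup step, assuming the configuration lives in a single directory. The function name `backup_config_dir` and the destination directory in the usage line are hypothetical; if the inventory is kept somewhere other than /etc/kolla, it would need its own call.

```shell
# Hypothetical helper: archive SRC into DEST_DIR under a timestamped name
# and print the path of the archive that was written.
backup_config_dir() {
    src="$1"
    dest_dir="$2"
    archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Usage (destination is a placeholder; it must already exist):
#   backup_config_dir /etc/kolla /root/backups
```

Listing the archive afterwards with `tar -tzf` is a cheap way to confirm that globals.yml and passwords.yml are actually in it before running destroy.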

How many controllers do you have?

Are you using OVS, OVN, or something else?

Are you using L3-HA? DVR?

Did all nodes have to be rebooted? If not, then which ones?

Have you confirmed there are no dead containers on any controllers? ( docker ps -a )

Have you looked in logs for ERROR messages? In particular: neutron-server.log, neutron-dhcp-agent.log, nova-api.log, and nova-compute.log ?
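
One way to scan those logs quickly is to count ERROR lines per file. This is a sketch, assuming the logs sit under /var/log/kolla on each host (adjust if yours are elsewhere) and use the usual "date time pid LEVEL" layout; the function name `count_log_errors` is hypothetical.

```shell
# Hypothetical helper: for every *.log file under LOG_ROOT, count lines
# containing " ERROR " and print "count path" for files with at least one
# match, highest count first.
count_log_errors() {
    log_root="${1:-/var/log/kolla}"
    find "$log_root" -type f -name '*.log' | while IFS= read -r f; do
        # grep -c exits non-zero on zero matches, hence the "|| true"
        n=$(grep -c ' ERROR ' "$f" || true)
        if [ "$n" -gt 0 ]; then
            echo "$n $f"
        fi
    done | sort -rn
}

# Usage on a controller or compute node:
#   count_log_errors /var/log/kolla
```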

Strange things happen when time is out of sync. Verify all the nodes synced properly to an NTP server. Big symptom of this is 'openstack hypervisor list' will show hosts going up and down every few seconds.
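
A rough way to spot clock drift across nodes is to collect each node's epoch time and look at the spread. This is only a sketch: the host names are placeholders, the function name `max_clock_skew` is hypothetical, and because the ssh calls run one after another the result includes a second or so of collection delay, so only a large value is meaningful.

```shell
# Hypothetical helper: reads "host epoch_seconds" lines on stdin and prints
# the largest difference between any two clocks, in seconds.
max_clock_skew() {
    awk '{ t = $2 + 0 }
         NR == 1 { min = t; max = t }
         t < min { min = t }
         t > max { max = t }
         END { if (NR == 0) print 0; else print max - min }'
}

# Usage (ctrl1, ctrl2, cmp1 are placeholder host names):
#   for h in ctrl1 ctrl2 cmp1; do
#       echo "$h $(ssh "$h" date +%s)"
#   done | max_clock_skew
```

On systemd hosts, `timedatectl` on each node also reports whether the system clock is synchronized, which is a more direct check than comparing timestamps.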