Open Stack

Thu Nov 10 18:09:43 UTC 2022

Hi,

This sounds very similar to something I experienced a couple of times
this year. In an HA cloud with two control nodes (the third joined just
recently), when one node was (accidentally) shut down I saw basically
the same effects you're describing: I could create new networks, and
new instances started successfully and got their IPs via DHCP, while
existing VMs didn't work properly (at least the DHCP part for
self-service networks). I'm still not sure what exactly the root cause
is, as I can't reproduce it in my test lab, and retrying it in a
production cluster is not a good idea. ;-)
I got things working again, but it's still unclear what exactly fixed it.
You may find hints in the neutron logs that something's not right; I
don't recall the exact message, but it was something like "dhcp agent
doesn't work because the server is overloaded". By the way, how many
DHCP agents per network (dhcp_agents_per_network) do you have in
neutron.conf?
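For reference, that option is `dhcp_agents_per_network` in the `[DEFAULT]` section of neutron.conf, and Neutron's default is 1. A minimal Python sketch to read it, assuming the file parses as plain INI (the function name is only illustrative, and a simple `grep dhcp_agents_per_network` on the file works just as well):

```python
import configparser


def dhcp_agents_per_network(conf_text: str) -> int:
    """Return the dhcp_agents_per_network value from neutron.conf contents.

    The option lives in [DEFAULT]; if it is not set, Neutron uses its
    built-in default of 1, which is mirrored here as the fallback.
    """
    # interpolation=None: neutron.conf may contain literal '%' (e.g. log formats).
    # strict=False: tolerate duplicated options instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint("DEFAULT", "dhcp_agents_per_network", fallback=1)


if __name__ == "__main__":
    # Assumed path; containerized deployments may keep the file elsewhere.
    with open("/etc/neutron/neutron.conf") as fh:
        print(dhcp_agents_per_network(fh.read()))
```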
Briefly, here's what I did (at that time with 2 control nodes):
- put the Pacemaker cluster into maintenance mode so I could stop and
start services manually
- stopped all services except rabbitmq and galera
- made sure all services (like neutron) were actually "dead", so no  
left over processes
- started apache and haproxy on one node only so all requests would land there
- started one service after another manually and watched the logs
- this time the DHCP agent started successfully and logged
- started the services on the remaining control node and everything was stable
- the cluster then recovered
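The "actually dead" check from the steps above can be done with `pgrep -af neutron`. As a minimal Python sketch of the same idea, assuming a Linux host with `/proc` (the function name `leftover_processes` and the `neutron` pattern are illustrative, not part of any OpenStack tooling):

```python
import os
from pathlib import Path


def leftover_processes(pattern: str, proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return sorted (pid, cmdline) pairs for processes whose command line contains `pattern`.

    Reads /proc/<pid>/cmdline directly (Linux only). Processes that exit
    or are unreadable during the scan are skipped.
    """
    matches = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process vanished or is not readable
        # Arguments in cmdline are NUL-separated; join them with spaces.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            matches.append((int(entry.name), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    own_pid = os.getpid()
    # Example pattern; repeat for every service stopped in the previous step.
    for pid, cmd in leftover_processes("neutron"):
        if pid != own_pid:
            print(f"still running: {pid} {cmd}")
```

An empty result for each stopped service is what you want before starting things up again one by one.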

I don't know if that helps in any way, but I thought I'd share. By the
way, we don't use kolla, so I can't really comment on that part.

Regards,
Eugen

Zitat von Franck VEDEL <franck.vedel at univ-grenoble-alpes.fr>:

> Hello,
> after a restart of my cluster (and some problems...), I have one  
> last problem with the VMs already present (before the restart).
> They all work fine: console access OK, network topology OK…
>
> But they can no longer communicate on the network; they do not
> obtain IP addresses via DHCP. Yet everything seems to be working.
> If I detach the interface and create a new one, it still doesn't
> work. I cannot reach the routers. I cannot communicate with an
> instance on the same network.
> On the other hand, if I create a new instance, no problem, it works  
> and can join the other instances and its router.
> Is there a way to fix this? Where is the problem? In the database?
> Thank you in advance for your help.
>
> Franck VEDEL

[kolla-ansible]Network Problem after server reboot