[kolla-ansible] Network Problem after server reboot
Eugen Block
eblock at nde.ag
Thu Nov 10 18:09:43 UTC 2022
Hi,
this sounds very similar to something I experienced a couple of times
this year. In an HA cloud with two control nodes (the third joined just
recently) when one node was shut down (accidentally) I saw basically
the same effects you're describing. I could create new networks and
instances were started successfully and also got their IPs via DHCP,
while existing VMs didn't work properly (at least the DHCP part for
self-service networks). I'm still not sure what exactly the root cause
is as I can't reproduce it in my test lab, and retrying it in a
production cluster is not a good idea. ;-)
I got things to work, but it's still unclear what exactly it was. It's
possible that you could see hints in the neutron logs that something's
not right. I don't recall the exact message, but it was something like
"dhcp agent doesn't work because the server is overloaded". By the
way, how many DHCP agents per network do you have configured in
neutron.conf?
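To answer that question quickly, here is a minimal sketch that reads the value from a neutron.conf file. It assumes the setting in question is the dhcp_agents_per_network option in the [DEFAULT] section (Neutron falls back to 1 when it is not set). The function name and the file path in the usage note are illustrative only.

```python
import configparser


def dhcp_agents_per_network(conf_text: str) -> int:
    """Return dhcp_agents_per_network from the text of a neutron.conf.

    Assumption: the option lives in [DEFAULT]; if it is absent we return 1,
    which is Neutron's documented default.
    """
    # interpolation=None: values may contain '%' characters.
    # strict=False: tolerate options that are repeated in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # configparser exposes the [DEFAULT] section through defaults().
    return int(parser.defaults().get("dhcp_agents_per_network", 1))
```

Usage could look like dhcp_agents_per_network(open("/etc/neutron/neutron.conf").read()); the path is the usual one on a packaged install and will differ in a kolla deployment.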
Briefly, here's what I did (at that time with 2 control nodes):
- put the pacemaker cluster into maintenance mode so I could stop and
start services manually
- stopped all services except rabbitmq and galera
- made sure all services (like neutron) were actually "dead", so no
leftover processes
- started apache and haproxy on one node only so all requests would land there
- started one service after another manually and watched the logs
- now the dhcp agent started successfully and logged
- started the services on the remaining control node and everything was stable
- the cluster then recovered
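The "no leftover processes" check from the list above can be sketched as follows. This is only an illustration, assuming a Linux host with /proc mounted; the function name and the "neutron-" pattern are hypothetical choices, not something taken from the setup described here.

```python
import os


def leftover_processes(patterns, proc_root="/proc"):
    """Return (pid, cmdline) pairs for processes whose command line
    contains any of the given substrings."""
    found = []
    for entry in os.listdir(proc_root):
        # Only numeric directory names under /proc are process IDs.
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if any(p in cmdline for p in patterns):
            found.append((int(entry), cmdline))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical pattern: matches e.g. neutron-server, neutron-dhcp-agent.
    for pid, cmd in leftover_processes(["neutron-"]):
        print(pid, cmd)
```

An empty result after stopping the services suggests nothing is left running under those names. Run it as root so that the command lines of all processes are readable.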
I don't know if that helps in any way, but I thought I'd share. By the
way, we don't use kolla so I can't really comment on that part.
Regards,
Eugen
Quoting Franck VEDEL <franck.vedel at univ-grenoble-alpes.fr>:
> Hello,
> after a restart of my cluster (and some problems...), I have one
> last problem with the VMs that already existed before the restart.
> They all run fine: console access OK, network topology OK.
>
> But they can no longer communicate on the network, and they do not
> obtain IP addresses via DHCP. Yet everything seems to be working.
> If I detach the interface and create a new one, it still doesn't
> work. I cannot reach the routers. I cannot communicate with an
> instance on the same network.
> On the other hand, if I create a new instance, no problem: it works
> and can reach the other instances and its router.
> Is there a way to fix this? Where is the problem? In the database?
> Thank you in advance for your help.
>
> Franck VEDEL