Open Stack

Thu Jul 9 15:35:57 UTC 2020

Hi all,

Currently we are exploring how to reboot a compute node without any interruption to the networking stack.
We run OpenStack Train with the ML2 Linux bridge mechanism driver and dnsmasq for DHCP and internal DNS.
DHCP runs as a high-availability setup with 3 replicas.
During our tests we identified the following challenges:

1.)

If we reboot the machine without doing anything on the network layer, all ports are rescheduled.
The networks are also removed from the (dead) agent and reassigned to another agent.
However, every reboot leaves behind some ports with the device-id "reserved_dhcp_port".
These ports can safely be deleted (we haven't figured out yet where in the Neutron code the issue is).
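A minimal sketch of selecting those leftover ports for cleanup, assuming the ports have already been fetched (for example via the Neutron ports API) into plain dicts using the Neutron field names `id`, `device_owner` and `device_id`. The extra check that the owner is `network:dhcp` is our own precaution, not something required above; the selected IDs could then be passed to `openstack port delete`.

```python
# Sketch: pick out leftover DHCP ports from an already-fetched port listing.
# Ports are plain dicts with Neutron-style keys (an assumption about the input).

LEFTOVER_DEVICE_ID = "reserved_dhcp_port"
DHCP_OWNER = "network:dhcp"  # precaution: only touch DHCP-owned ports


def find_leftover_dhcp_ports(ports):
    """Return the IDs of DHCP ports still carrying the reserved device_id."""
    return [
        port["id"]
        for port in ports
        if port.get("device_owner") == DHCP_OWNER
        and port.get("device_id") == LEFTOVER_DEVICE_ID
    ]


if __name__ == "__main__":
    sample = [
        {"id": "p1", "device_owner": "network:dhcp", "device_id": "dhcp-host1-net1"},
        {"id": "p2", "device_owner": "network:dhcp", "device_id": "reserved_dhcp_port"},
        {"id": "p3", "device_owner": "compute:nova", "device_id": "vm-42"},
    ]
    print(find_leftover_dhcp_ports(sample))  # ['p2']
```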

2.)

If we disable the network agent as described here: https://docs.openstack.org/neutron/train/admin/config-dhcp-ha.html
and then remove the disabled agent from all networks, the behaviour is even worse, since the Neutron scheduler doesn't reschedule the networks to a different agent.
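Since the scheduler does not pick the removed networks up again, one workaround is to move each network explicitly. The sketch below only builds `openstack` CLI command lines for draining one agent; the agent and network IDs and the `replacement_for` mapping (which agent takes over which network) are hypothetical inputs an operator would supply. It removes a network before adding it elsewhere, on the assumption (from our reading of Neutron, worth verifying on Train) that removal marks the old DHCP port as "reserved_dhcp_port" so the newly added agent may reuse it together with its IP address. Adding first would keep an extra replica during the switch, but would probably leave that reserved port behind.

```python
# Sketch: build (not run) the CLI commands to drain one DHCP agent.
# All IDs and the replacement mapping are hypothetical operator inputs.


def drain_plan(agent_id, hosted_networks, replacement_for):
    """Return a list of argv lists that move every network off agent_id.

    hosted_networks: network IDs currently served by the agent to drain.
    replacement_for: dict mapping network ID -> agent ID that takes over.
    """
    # Stop the scheduler from placing new networks on this agent.
    commands = [["openstack", "network", "agent", "set", "--disable", agent_id]]
    for net in hosted_networks:
        # Detach first, so the old DHCP port can be marked as reserved (assumption).
        commands.append(["openstack", "network", "agent", "remove", "network",
                         "--dhcp", agent_id, net])
        # Then attach the network to the chosen replacement agent explicitly.
        commands.append(["openstack", "network", "agent", "add", "network",
                         "--dhcp", replacement_for[net], net])
    return commands


if __name__ == "__main__":
    plan = drain_plan("agent-a", ["net1"], {"net1": "agent-d"})
    for argv in plan:
        print(" ".join(argv))
```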

So what is the correct way to ensure that rebooting a node causes no (or only minimal) interruption to the networking service?
The current issue is that removing one agent might remove the port whose address is the first nameserver entry in the clients' (VMs') resolv.conf, which means that every DNS request is delayed by the default timeout.
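To put a number on that delay, here is a simplified model of the stub resolver's first pass, assuming glibc-style behaviour where nameservers are tried in resolv.conf order and each unresponsive one costs a full timeout (5 seconds by default according to resolv.conf(5)). It ignores the `attempts` and `rotate` options and retransmission details, so treat it as an estimate only.

```python
# Simplified model: nameservers are queried in resolv.conf order, and each
# one that does not answer costs one full timeout before the next is tried.

DEFAULT_TIMEOUT_S = 5  # resolv.conf(5) default for "options timeout:n"


def first_pass_delay(nameserver_alive, timeout_s=DEFAULT_TIMEOUT_S):
    """Seconds spent before the first responsive nameserver is reached.

    nameserver_alive: booleans in resolv.conf order (True = answers).
    Returns None if no nameserver answers during the first pass.
    """
    delay = 0
    for alive in nameserver_alive:
        if alive:
            return delay
        delay += timeout_s
    return None


if __name__ == "__main__":
    # Three DHCP/DNS replicas, the first one gone after the agent was removed.
    print(first_pass_delay([False, True, True]))  # 5
```

In this model every lookup pays the full timeout for as long as the dead address stays first in the list, which matches the per-request delay described above.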

And is there any option to "migrate" a network from one agent to another?

Thanks in advance,

Johannes Scheuermann

Neutron Agent Migration