[openstack-dev] [kolla][nova] Safe guest shutdowns with kolla?

Clint Byrum clint at fewbar.com
Thu Jul 12 20:02:59 UTC 2018


Greetings! We've been deploying with Kolla on CentOS 7 now for a while, and
we've recently noticed a rather troubling behavior when we shutdown
hypervisors.

Somewhere between systemd and libvirt's systemd-machined integration,
we see that guests get killed aggressively by SIGTERM'ing all of the
qemu-kvm processes. This seems to happen because they are scoped into
machine.slice, but systemd-machined is killed which drops those scopes
and thus results in killing off the machines.

In the past, we've used the libvirt-guests service when our libvirt was
running outside of containers. This worked splendidly, as we could
have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding
interrupting any running processes. But this service isn't available on
the host OS, as it won't be able to talk to libvirt inside the container.

The solution I've come up with for now is this:

[Unit]
Description=Manage libvirt guests in kolla safely
After=docker.service systemd-machined.service
Requires=docker.service

[Install]
WantedBy=sysinit.target

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStopSec=400
ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
ExecStart=/usr/bin/docker start nova_compute
ExecStop=/usr/bin/docker stop nova_compute
ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown

This doesn't seem to work, though I'm still trying to work out
the ordering and such. It should ensure that before we stop the
systemd-machined and destroy all of its scopes (thus, killing all the
vms), we run the libvirt-guests.sh script to try and shut them down. The
TimeoutStopSec=400 is because the script itself waits 300 seconds for any
VM that refuses to shutdown cleanly, so this gives it a chance to wait
for at least one of those. This is an imperfect solution but it allows us
to move forward after having made a reasonable attempt at clean shutdowns.

Anyway, just wondering if anybody else using kolla-ansible or kolla
containers in general have run into this problem, and whether or not
there are better/known solutions.

Thanks!



More information about the OpenStack-dev mailing list