[openstack-dev] [kolla][nova][tripleo] Safe guest shutdowns with kolla?
bdobreli at redhat.com
Fri Jul 13 07:54:17 UTC 2018
It would be nice to have this verified, and improved where needed, for
containerized libvirt on compute nodes deployed with TripleO as well.
On 7/12/18 11:02 PM, Clint Byrum wrote:
> Greetings! We've been deploying with Kolla on CentOS 7 for a while now, and
> we've recently noticed a rather troubling behavior when we shut down
> hypervisors.
> Somewhere between systemd and libvirt's systemd-machined integration,
> we see that guests get killed aggressively by SIGTERM'ing all of the
> qemu-kvm processes. This seems to happen because they are scoped into
> machine.slice, but systemd-machined is killed which drops those scopes
> and thus results in killing off the machines.
So far we have observed something similar, but between systemd and the
containers managed by the Docker daemon (dockerd).
> In the past, we've used the libvirt-guests service when our libvirt was
> running outside of containers. This worked splendidly, as we could
> have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding
> interrupting any running processes. But this service isn't available on
> the host OS, as it won't be able to talk to libvirt inside the container.
> The solution I've come up with for now is this:
> Description=Manage libvirt guests in kolla safely
> After=docker.service systemd-machined.service
> ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
> ExecStart=/usr/bin/docker start nova_compute
> ExecStop=/usr/bin/docker stop nova_compute
> ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown
> This seems to work, though I'm still trying to work out
> the ordering and such. It should ensure that before we stop the
> systemd-machined and destroy all of its scopes (thus, killing all the
> vms), we run the libvirt-guests.sh script to try and shut them down. The
> TimeoutStopSec=400 is because the script itself waits 300 seconds for any
> VM that refuses to shut down cleanly, so this gives it a chance to wait
> for at least one of those. This is an imperfect solution but it allows us
> to move forward after having made a reasonable attempt at clean shutdowns.
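For anyone who wants to try this, a complete unit along those lines might look
like the sketch below. Only the Description=, After=, ExecStart=, ExecStop= and
TimeoutStopSec= values come from the message above. The section layout,
Requires=, Type=oneshot, RemainAfterExit=yes and the WantedBy= target are my
assumptions about what such a unit needs, not a tested configuration.

```ini
[Unit]
Description=Manage libvirt guests in kolla safely
# Assumption: hard dependency on Docker, since every command below uses it.
Requires=docker.service
# Units are stopped in the reverse of their start order, so this unit is
# stopped before docker.service and systemd-machined.service at shutdown.
After=docker.service systemd-machined.service

[Service]
# Assumption: oneshot is needed to allow more than one ExecStart= line.
Type=oneshot
# Assumption: keep the unit "active" after the start commands exit, so that
# the ExecStop= commands run when the host shuts down.
RemainAfterExit=yes
# libvirt-guests.sh waits up to 300 seconds for a guest; leave some margin.
TimeoutStopSec=400
ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
ExecStart=/usr/bin/docker start nova_compute
ExecStop=/usr/bin/docker stop nova_compute
ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown

[Install]
# Assumption: any target reached during normal boot would do here.
WantedBy=multi-user.target
```

The ordering is the part that matters: because the unit is started after
systemd-machined, it is stopped before it, which is what gives the guests a
chance to shut down before their scopes are destroyed.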
> Anyway, just wondering if anybody else using kolla-ansible or kolla
> containers in general has run into this problem, and whether or not
> there are better/known solutions.
As I noted above, I think the issue may be valid for TripleO as well.