[openstack-dev] [kolla][nova][tripleo] Safe guest shutdowns with kolla?
Bogdan Dobrelya
bdobreli at redhat.com
Fri Jul 13 07:54:17 UTC 2018
[Added tripleo]
It would be nice to have this situation verified/improved for
containerized libvirt for compute nodes deployed with TripleO as well.
On 7/12/18 11:02 PM, Clint Byrum wrote:
> Greetings! We've been deploying with Kolla on CentOS 7 now for a while, and
> we've recently noticed a rather troubling behavior when we shut down
> hypervisors.
>
> Somewhere between systemd and libvirt's systemd-machined integration,
> we see that guests get killed aggressively by SIGTERM'ing all of the
> qemu-kvm processes. This seems to happen because they are scoped into
> machine.slice, but systemd-machined is killed which drops those scopes
> and thus results in killing off the machines.
So far we have observed something similar [0], but between systemd and
containers managed by the Docker daemon (dockerd).
[0] https://bugs.launchpad.net/tripleo/+bug/1778913
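To verify whether a given qemu-kvm process really is scoped into
machine.slice, one option is to read its /proc/<pid>/cgroup file on the
hypervisor. The following is only a minimal sketch: the helper names
(machine_scope, scope_of_pid) and the sample cgroup text are hypothetical and
are not taken from kolla, TripleO or libvirt.

```python
import re

# Matches a machine scope under machine.slice, for example
# /machine.slice/machine-qemu\x2d1\x2dinstance\x2d00000001.scope
SCOPE_RE = re.compile(r"/machine\.slice/(machine-qemu[^/]*\.scope)")


def machine_scope(cgroup_text):
    """Return the machine.slice scope named in /proc/<pid>/cgroup text, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = SCOPE_RE.search(parts[2])
        if match:
            return match.group(1)
    return None


def scope_of_pid(pid):
    """Read /proc/<pid>/cgroup on the hypervisor and look for a machine scope."""
    with open("/proc/%d/cgroup" % pid) as handle:
        return machine_scope(handle.read())


# Illustrative cgroup contents (an assumption, not captured from a real host).
SAMPLE = (
    "4:cpu,cpuacct:/machine.slice/machine-qemu\\x2d1\\x2dinstance\\x2d00000001.scope\n"
    "1:name=systemd:/machine.slice/machine-qemu\\x2d1\\x2dinstance\\x2d00000001.scope\n"
)
```

Calling scope_of_pid() for each qemu-kvm PID on a compute node would show
which guests sit in a scope under machine.slice and are therefore exposed to
the behavior described above.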
>
> In the past, we've used the libvirt-guests service when our libvirt was
> running outside of containers. This worked splendidly, as we could
> have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding
> interrupting any running processes. But this service isn't available on
> the host OS, as it won't be able to talk to libvirt inside the container.
>
> The solution I've come up with for now is this:
>
> [Unit]
> Description=Manage libvirt guests in kolla safely
> After=docker.service systemd-machined.service
> Requires=docker.service
>
> [Install]
> WantedBy=sysinit.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> TimeoutStopSec=400
> ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
> ExecStart=/usr/bin/docker start nova_compute
> ExecStop=/usr/bin/docker stop nova_compute
> ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown
>
> This doesn't seem to work, though I'm still trying to work out
> the ordering and such. It should ensure that before we stop the
> systemd-machined and destroy all of its scopes (thus, killing all the
> vms), we run the libvirt-guests.sh script to try and shut them down. The
> TimeoutStopSec=400 is because the script itself waits 300 seconds for any
> VM that refuses to shut down cleanly, so this gives it a chance to wait
> for at least one of those. This is an imperfect solution but it allows us
> to move forward after having made a reasonable attempt at clean shutdowns.
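The timeout reasoning above can be checked with a little arithmetic. This is
a rough sketch only: it assumes stuck guests are waited on one after another
and that the whole 400 seconds is available to the libvirt-guests.sh step.
The function names are hypothetical.

```python
def worst_case_wait(stuck_guests, per_guest_wait=300):
    """Seconds spent waiting if each stuck guest is waited on in turn (assumption)."""
    return stuck_guests * per_guest_wait


def fits_stop_timeout(stuck_guests, stop_timeout=400, per_guest_wait=300):
    """True if the worst-case wait still fits within TimeoutStopSec."""
    return worst_case_wait(stuck_guests, per_guest_wait) <= stop_timeout
```

Under those assumptions, one guest that refuses to shut down fits within the
400 seconds, while two do not, which matches the "at least one of those"
caveat.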
>
> Anyway, just wondering if anybody else using kolla-ansible or kolla
> containers in general has run into this problem, and whether or not
> there are better/known solutions.
As I noted above, I think the issue may be valid for TripleO as well.
>
> Thanks!
>
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando