[openstack-dev] [kolla][nova][tripleo] Safe guest shutdowns with kolla?
aschultz at redhat.com
Fri Jul 13 13:50:01 UTC 2018
On Fri, Jul 13, 2018 at 1:54 AM, Bogdan Dobrelya <bdobreli at redhat.com> wrote:
> [Added tripleo]
> It would be nice to have this situation verified/improved for containerized
> libvirt for compute nodes deployed with TripleO as well.
> On 7/12/18 11:02 PM, Clint Byrum wrote:
>> Greetings! We've been deploying with Kolla on CentOS 7 for a while now, and
>> we've recently noticed a rather troubling behavior when we shut down.
>> Somewhere between systemd and libvirt's systemd-machined integration,
>> we see that guests get killed aggressively by SIGTERM'ing all of the
>> qemu-kvm processes. This seems to happen because they are scoped into
>> machine.slice, but systemd-machined is killed which drops those scopes
>> and thus results in killing off the machines.
> So far we have observed something similar happening, but with systemd and
> containers managed by docker-daemon (dockerd).
>  https://bugs.launchpad.net/tripleo/+bug/1778913
>> In the past, we've used the libvirt-guests service when our libvirt was
>> running outside of containers. This worked splendidly, as we could
>> have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding
>> interrupting any running processes. But this service isn't usable on
>> the host OS, as it won't be able to talk to libvirt inside the container.
>> The solution I've come up with for now is this:
>> [Unit]
>> Description=Manage libvirt guests in kolla safely
>> After=docker.service systemd-machined.service
>>
>> [Service]
>> TimeoutStopSec=400
>> ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh
>> ExecStart=/usr/bin/docker start nova_compute
>> ExecStop=/usr/bin/docker stop nova_compute
>> ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh
>> This seems to work, though I'm still trying to work out
>> the ordering and such. It should ensure that before we stop the
>> systemd-machined and destroy all of its scopes (thus, killing all the
>> VMs), we run the libvirt-guests.sh script to try and shut them down. The
>> TimeoutStopSec=400 is because the script itself waits 300 seconds for any
>> VM that refuses to shut down cleanly, so this gives it a chance to wait
>> for at least one of those. This is an imperfect solution but it allows us
>> to move forward after having made a reasonable attempt at clean shutdowns.
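
For anyone wanting to try the approach described above, a minimal sketch of
what a complete unit file could look like follows. Everything not quoted in
the message is an assumption: the file name, Type=oneshot with
RemainAfterExit=yes (systemd only permits more than one ExecStart= line for
oneshot services, and RemainAfterExit keeps the unit active so that ExecStop=
runs at shutdown), the Requires= and [Install] entries, and the "start" and
"shutdown" arguments, which follow the usual init-style interface of
libvirt-guests.sh. Treat it as a starting point, not a verified configuration.

```ini
# /etc/systemd/system/kolla-libvirt-guests.service (hypothetical file name)
[Unit]
Description=Manage libvirt guests in kolla safely
# Started after, and therefore stopped before, docker and systemd-machined.
After=docker.service systemd-machined.service
# Assumption: pull in docker, since every command below depends on it.
Requires=docker.service

[Service]
# Assumption: oneshot is needed for more than one ExecStart= line;
# RemainAfterExit keeps the unit "active" so ExecStop= runs at shutdown.
Type=oneshot
RemainAfterExit=yes
# The script waits 300 seconds for a guest that refuses to shut down,
# so leave some headroom before systemd gives up on the stop job.
TimeoutStopSec=400
# Assumption: "start" and "shutdown" are the script's sub-commands.
ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
ExecStart=/usr/bin/docker start nova_compute
ExecStop=/usr/bin/docker stop nova_compute
ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown

[Install]
# Assumption: any target reached during a normal boot would do here.
WantedBy=multi-user.target
```

Because systemd stops units in the reverse of their start order, the After=
line is what should make the ExecStop= commands run while docker and
systemd-machined are still up. After installing the file, "systemctl
daemon-reload" followed by "systemctl enable --now" on the unit would
activate it.
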
>> Anyway, just wondering if anybody else using kolla-ansible or kolla
>> containers in general has run into this problem, and whether or not
>> there are better/known solutions.
> As I noted above, I think the issue may be valid for TripleO as well.
I think https://review.openstack.org/#/c/580351/ is trying to address this.
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando