[Openstack-operators] mitaka/xenial libvirt issues
Joe Topjian
joe at topjian.net
Thu Nov 23 15:32:53 UTC 2017
Hi all,
We're seeing some strange libvirt issues in an Ubuntu 16.04 environment.
It's running Mitaka, but I don't think this is a problem with OpenStack
itself.
We're in the process of upgrading this environment from Ubuntu 14.04 with
the Mitaka cloud archive to 16.04. Instances are being live migrated
(instance storage is on an NFS share) to a new, freshly installed 16.04
compute node, so libvirt also changes from 1.2.2 to 1.3.1 along the way.
The problem we're seeing only happens on the 16.04/1.3.1 nodes.
We're getting occasional reports of instances that cannot be snapshotted.
On investigation, the snapshot process quits early with a libvirt/qemu
lock timeout error. We then find that the instance's XML file has
disappeared from /etc/libvirt/qemu, and we have to restart libvirt and
hard-reboot the instance to get things back to a normal state. Trying to
live-migrate the instance to another node causes the same thing to happen.
However, at some seemingly random point, either the snapshot or the
migration will work without error. I haven't been able to reproduce this
issue myself, and I haven't been able to find the root cause by inspecting
the instances reported to me.
One thing that stands out is how long libvirt takes to start. If I run
"/etc/init.d/libvirt-bin start", it takes at least 5 minutes before a
simple "virsh list" works; until then, the command just hangs. If I
increase libvirt's logging level, I can see that during this period
libvirt is working on iptables and ebtables (it looks like it's shelling
out to external commands).
But if I run "libvirtd -l" directly from the command line, all of this
completes within 5 seconds (including all of the shelling out).
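A minimal sketch for timing this more precisely, assuming Python 3 is available on the compute node: it polls "virsh list" until the command exits successfully and reports the elapsed time. Each attempt is capped, so a hung "virsh list" does not stall the loop. The helper name wait_until_ready and the default timeouts are hypothetical choices for illustration, not anything provided by libvirt.

```python
import subprocess
import sys
import time


def wait_until_ready(cmd, timeout=600.0, interval=1.0, per_try_timeout=10.0):
    """Run `cmd` repeatedly until it exits 0; return the seconds elapsed.

    Each attempt is bounded by `per_try_timeout` because a hung command
    (e.g. `virsh list` while libvirtd is still starting) would otherwise
    block the loop. Raises TimeoutError if `cmd` never succeeds within
    `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=per_try_timeout,
            )
            if result.returncode == 0:
                return time.monotonic() - start
        except subprocess.TimeoutExpired:
            pass  # the attempt hung; treat it as "not ready yet"
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"{cmd!r} did not succeed within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    # Default to `virsh list`; any other command can be passed as arguments.
    cmd = sys.argv[1:] or ["virsh", "list"]
    elapsed = wait_until_ready(cmd)
    print(f"{' '.join(cmd)} succeeded after {elapsed:.1f}s")
```

For example, run it right after "/etc/init.d/libvirt-bin start", and again after starting "libvirtd -l" by hand, to put concrete numbers on the two cases.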
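My initial thought is that systemd is applying some type of throttling to
the system slice that doesn't apply to the user slice (starting libvirt via
the init script places it in the system slice, while running it from my
shell places it in the user slice). I've tried comparing the slices'
attributes but, probably due to my limited understanding of systemd, can't
find anything to prove this.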
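For the slice comparison, one rough approach is to dump the properties of system.slice and user.slice and print only the keys whose values differ. This is a sketch, assuming Python 3.7+ (for capture_output) and that "systemctl show <unit>" prints one KEY=VALUE pair per line; the function names are hypothetical, and which of the differing properties (if any) actually explains the slowdown is not established here.

```python
import subprocess


def parse_properties(text):
    """Parse `systemctl show` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key] = value
    return props


def diff_properties(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    A key missing from one side shows up with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def show(unit):
    """Return the properties of a systemd unit as reported by systemctl."""
    out = subprocess.run(
        ["systemctl", "show", unit],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_properties(out)


if __name__ == "__main__":
    diffs = diff_properties(show("system.slice"), show("user.slice"))
    for key, (system_val, user_val) in diffs.items():
        print(f"{key}: system={system_val!r} user={user_val!r}")
```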
Is anyone else running into this problem? Does anyone know what might be
the cause?
Thanks,
Joe