[Openstack-operators] mitaka/xenial libvirt issues

Chris Sarginson csargiso at gmail.com
Thu Nov 23 18:03:54 UTC 2017


I think we may have pinned libvirt-bin as well (1.3.1), but I can't
guarantee that, sorry. I would suggest it's worth pinning both
initially.
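
A pin along these lines might work as a starting point. This is a sketch,
not a tested config: the file name and the package lists are assumptions
(check what is actually installed with "dpkg -l | grep -E 'qemu|libvirt'"),
and only the QEMU version string and the libvirt 1.3.1 series come from
this thread. A Pin-Priority of 1000 or more allows apt to downgrade to the
pinned version.

```
Explanation: hypothetical file: /etc/apt/preferences.d/qemu-libvirt-pin
Explanation: package names below are assumed, adjust to the installed set
Package: qemu-system-x86 qemu-system-common qemu-utils qemu-block-extra
Pin: version 1:2.5+dfsg-5ubuntu10.5
Pin-Priority: 1001

Explanation: libvirt pinned to the 1.3.1 series (exact revision not known)
Package: libvirt-bin libvirt0
Pin: version 1.3.1*
Pin-Priority: 1001
```

The pin only selects among versions apt can see, so the debs for that
version still need to be available from a repository or installed by hand.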

Chris

On Thu, 23 Nov 2017 at 17:42 Joe Topjian <joe at topjian.net> wrote:

> Hi Chris,
>
> Thanks - we will definitely look into this. To confirm: did you also
> downgrade libvirt, or was it all qemu?
>
> Thanks,
> Joe
>
> On Thu, Nov 23, 2017 at 9:16 AM, Chris Sarginson <csargiso at gmail.com>
> wrote:
>
>> I suspect we hit the same issue a while back. We seemed to resolve it
>> by pinning QEMU and related packages at the following version (you
>> might need to hunt down the debs manually):
>>
>> 1:2.5+dfsg-5ubuntu10.5
>>
>> I'm certain there's a Launchpad bug for Ubuntu qemu regarding this, but
>> I don't have it to hand.
>>
>> Hope this helps,
>> Chris
>>
>> On Thu, 23 Nov 2017 at 15:33 Joe Topjian <joe at topjian.net> wrote:
>>
>>> Hi all,
>>>
>>> We're seeing some strange libvirt issues in an Ubuntu 16.04 environment.
>>> It's running Mitaka, but I don't think this is a problem with OpenStack
>>> itself.
>>>
>>> We're in the process of upgrading this environment from Ubuntu 14.04
>>> with the Mitaka cloud archive to 16.04. Instances are being live migrated
>>> (NFS share) to a new 16.04 compute node (fresh install), so there's a
>>> change between libvirt versions (1.2.2 to 1.3.1). The problem we're seeing
>>> is only happening on the 16.04/1.3.1 nodes.
>>>
>>> We're getting occasional reports of instances that cannot be
>>> snapshotted. Upon investigation, the snapshot process quits early with a
>>> libvirt/qemu lock timeout error. We then see that the instance's XML file
>>> has disappeared from /etc/libvirt/qemu, and we must restart libvirt and
>>> hard-reboot the instance to get things back to a normal state. Trying to
>>> live-migrate the instance to another node causes the same thing to happen.
>>>
>>> However, at some random time, either the snapshot or the migration will
>>> work without error. I haven't been able to reproduce this issue on my own
>>> and haven't been able to figure out the root cause by inspecting instances
>>> reported to me.
>>>
>>> One thing that has stood out is the length of time it takes for libvirt
>>> to start. If I run "/etc/init.d/libvirt-bin start", it takes at least 5
>>> minutes before a simple "virsh list" will work; until then, the command
>>> hangs. If I increase libvirt's logging level, I can see that during
>>> this period libvirt is working on iptables and ebtables (it looks
>>> like it's shelling out commands).
>>>
>>> But if I run "libvirtd -l" straight on the command line, all of this
>>> completes within 5 seconds (including all of the shelling out).
>>>
>>> My initial thought is that systemd is doing some type of throttling
>>> between the system and user slice, but I've tried comparing slice
>>> attributes and, probably due to my lack of understanding of systemd, can't
>>> find anything to prove this.
>>>
>>> Is anyone else running into this problem? Does anyone know what might be
>>> the cause?
>>>
>>> Thanks,
>>> Joe
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>>
>