<html><head><title></title></head><body><!-- rte-version 0.2 9947551637294008b77bce25eb683dac --><div class="rte-style-maintainer" style="white-space: pre-wrap; font-size: small; font-family: 'Courier New', Courier, 'BB.FixedWidth';"data-color="global-default" bbg-color="default" data-bb-font-size="medium" bbg-font-size="medium" bbg-font-family="fixed-width">I recently witnessed a strange issue with libvirt when upgrading one of our clusters from Kilo to Liberty. I'm not really looking for a specific diagnosis here, given the large number of confounding factors and the relative ease of remediating it, but I'm interested to hear whether anyone else has witnessed this particular problem.<div><br></div><div>For background, we had a number of Kilo-based clusters, all running Ubuntu 14.04.4 with OpenStack installed from the Ubuntu Cloud Archive. The upgrade process to Liberty involved upgrading the OpenStack components and their dependencies (including libvirt), then upgrading all remaining packages via dist-upgrade (and staging a kernel upgrade from 3.13 to 4.4, to take effect on the next reboot). Seven clusters had already been upgraded successfully using this strategy.</div><div><br></div><div>One cluster, however, decided to get a bit weird. After the upgrade, nova-compute refused to come up properly on 4 hypervisors and showed as enabled/down in nova service-list. Upon further investigation, nova-compute was starting up but hanging while loading nwfilters. When I ran "virsh nwfilter-list", the command stalled indefinitely. Killing nova-compute and restarting the libvirt-bin service allowed the command to work again, but it did not list any of the nova-instance-instance-* nwfilters. Once nova-compute was started again, it began loading the instance-specific filters and libvirt would wedge again. 
I spent a while tinkering with the affected systems but could not find any way of correcting the issue other than rebooting the hypervisor, after which everything was fine.</div><div><br></div><div>Has anyone ever seen anything like this? libvirt was upgraded from 1.2.12 to 1.2.16. Hundreds of hypervisors had already received this exact same upgrade without showing this problem, and I have no idea how I could reproduce it. I'm interested to hear if anyone else has ever run into this and if they figured out what the root cause was, though I've already braced myself for tumbleweeds.</div></div></body></html>