[Openstack-operators] libvirt freezing when loading Nova instance nwfilters

Joe Topjian joe at topjian.net
Wed Feb 22 16:59:49 UTC 2017


We ran into the "virsh nwfilter-list hanging indefinitely" thing back in
early January. I spent hours on it and almost went insane trying to figure
it out. We weren't upgrading nodes, though; it just sort of happened.

I have no idea if the following was the correct way of handling this, but
it ultimately got nova-compute back up and running:

I ran:

$ ss -ax

on the hypervisor and saw that some monitor sockets had a non-zero Recv-Q.
On the processes related to those sockets, I ran:

$ strace -p <pid>

and saw no activity. By comparison, strace on the processes behind sockets
with a zero Recv-Q did show activity. At that point, I figured my only
options were a full hypervisor reboot or killing the instances with no
activity. Since those instances would be killed by a full reboot anyway, I
ran "virsh destroy" on them.
Once they were destroyed, nova-compute was able to start cleanly.
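For anyone repeating the first step, the `ss -ax` output can be narrowed to
just the sockets with a non-zero Recv-Q. This is a rough sketch, not what was
actually run above: `nonzero_recvq` is a hypothetical helper name, and it
assumes Recv-Q is the third column (after Netid and State), which is how ss
lays out unix sockets on the versions I know of. Check the header line on
your own ss before trusting it.

```shell
# Hypothetical helper: print only the lines of `ss -ax` output whose
# Recv-Q column is non-zero. ASSUMES Recv-Q is the 3rd whitespace-separated
# field (Netid, State, Recv-Q, Send-Q, ...); verify against your ss header.
nonzero_recvq() {
    # NR > 1 skips the header row; "$3 + 0" forces a numeric comparison.
    awk 'NR > 1 && $3 + 0 > 0'
}

# Usage on a hypervisor (as root). Adding -p makes ss show the owning
# process (name, pid, fd), which gives the <pid> to hand to strace -p:
#   ss -axp | nonzero_recvq
```

Anything it prints is only a candidate; the strace check is still what
tells you whether the process behind the socket is actually stuck.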

We had this happen on 3 hypervisors. Each one had between 1 and 3 of these
types of instances, so not a lot at all. Once they were destroyed,
nova-compute began working again on all 3.

We later had a user report that he noticed some problems with his instance
(not one of the ones destroyed) and thought it might have to do with the
leap second. No idea if that's true, but the timing kind of works out.

Hope that helps,
Joe


On Wed, Feb 22, 2017 at 8:33 AM, Edmund Rhudy (BLOOMBERG/ 120 PARK) <
erhudy at bloomberg.net> wrote:

> I recently witnessed a strange issue with libvirt when upgrading one of
> our clusters from Kilo to Liberty. I'm not really looking for a specific
> diagnosis here because of the large number of confounding factors and the
> relative ease of remediating it, but I'm interested to hear if anyone else
> has witnessed this particular problem.
>
> Background is we had a number of Kilo-based clusters, all running Ubuntu
> 14.04.4 with OpenStack installed from the Ubuntu cloud archive. The upgrade
> process to Liberty involved upgrading the OpenStack components and their
> dependencies (including libvirt), then afterward upgrading all remaining
> packages via dist-upgrade (and staging a kernel upgrade from 3.13 to 4.4,
> to take effect on the next reboot). 7 clusters had all been upgraded
> successfully using this strategy.
>
> One cluster, however, decided to get a bit weird. After the upgrade, 4
> hypervisors showed that nova-compute was refusing to come up properly and
> was showing as enabled/down in nova service-list. Upon further
> investigation, nova-compute was starting up but was getting jammed on
> loading nwfilters. When I ran "virsh nwfilter-list", the command stalled
> indefinitely. Killing nova-compute and restarting libvirt-bin service
> allowed the command to work again, but it did not list any of the
> nova-instance-instance-* nwfilters. Once nova-compute was started, it tried
> to start loading the instance-specific filters and libvirt would wedge. I
> spent a while tinkering with the affected systems but could not find any
> way of correcting the issue other than rebooting the hypervisor, after
> which everything was fine.
>
> Has anyone ever seen anything like this? libvirt was upgraded from 1.2.12
> to 1.2.16. Hundreds of hypervisors had already received this exact same
> upgrade without showing this problem, and I have no idea how I could
> reproduce it. I'm interested to hear if anyone else has ever run into this
> and if they figured out what the root cause was, though I've already braced
> myself for tumbleweeds.