[openstack-dev] Your next semi weekly gate status report

Ian Wienand iwienand at redhat.com
Wed Mar 29 07:31:30 UTC 2017


On 03/28/2017 08:57 AM, Clark Boylan wrote:
> 1. Libvirt crashes: http://status.openstack.org/elastic-recheck/#1643911
> and http://status.openstack.org/elastic-recheck/#1646779

> Libvirt is randomly crashing during the job which causes things to fail
> (for obvious reasons). To address this will likely require someone with
> experience debugging libvirt since it's most likely a bug isolated to
> libvirt. We're looking for someone familiar with libvirt internals to
> drive the effort to fix this issue,

Ok, from the bug [1] we're seeing malloc() corruption.

While I agree that a coredump is not that likely to help, I would also
like to come to that conclusion after inspecting a coredump :) I've
found things in the heap before that give clues as to what real
problems are.

To this end, I've proposed [2] to keep coredumps.  It's a little
hackish but I think it gets the job done. [3] enables this and saves
any dumps to the logs in devstack-gate (d-g).
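
For anyone wanting to try this locally, here is a minimal sketch of the
usual first step: lifting the core size limit for a process. It is not
what [2] or [3] actually do; the function name is made up for
illustration and it only uses Python's standard resource module.

```python
import resource


def raise_core_limit():
    """Raise the soft core-size limit to the hard limit.

    Hypothetical helper for illustration only. Returns the resulting
    (soft, hard) pair so the caller can see what was achieved.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Passing it as preexec_fn=raise_core_limit to subprocess.Popen would
apply the limit to the child.  Where the dump ends up still depends on
the kernel's core_pattern setting, and a daemon started by an init
system takes its limits from there rather than from a wrapper like this.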

As suggested, running under valgrind would be great but probably
impractical until we narrow it down a little.  Another thing I've had
some success with is electric fence [4] which puts boundaries around
allocations so out-of-bounds access hits at the time of access.  I've
proposed [5] to try this out, but it's not looking particularly
promising unfortunately.  I'm open to suggestions; for example,
something like tcmalloc might give us a different failure and could be
another clue.  If we get something vaguely reliable here, our best bet
might be to run a parallel non-voting job on all changes to see what
we can pick up.
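
As a rough illustration of how a debugging allocator gets swapped in
without rebuilding anything, the sketch below builds an environment
with LD_PRELOAD set.  The helper name is hypothetical, and the library
name libefence.so.0 is an assumption that varies by distro; this is
not the mechanism used in [5].

```python
import os


def env_with_preload(library, base_env=None):
    """Return a copy of base_env with `library` first in LD_PRELOAD.

    Hypothetical helper; `library` is whatever shared object the
    distro ships for the allocator being tried.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a space-separated list of libraries.
    env["LD_PRELOAD"] = f"{library} {existing}".strip()
    return env
```

The result can be handed to subprocess.Popen(..., env=...) to start
the process under test.  The same approach would work for trying
another allocator library in place of Electric Fence.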

-i

[1] https://bugs.launchpad.net/nova/+bug/1643911
[2] https://review.openstack.org/451128
[3] https://review.openstack.org/451219
[4] http://elinux.org/Electric_Fence
[5] https://review.openstack.org/451136


