[openstack-dev] Your next semi weekly gate status report
Ian Wienand
iwienand at redhat.com
Wed Mar 29 07:31:30 UTC 2017
On 03/28/2017 08:57 AM, Clark Boylan wrote:
> 1. Libvirt crashes: http://status.openstack.org/elastic-recheck/#1643911
> and http://status.openstack.org/elastic-recheck/#1646779
> Libvirt is randomly crashing during the job which causes things to fail
> (for obvious reasons). To address this will likely require someone with
> experience debugging libvirt since it's most likely a bug isolated to
> libvirt. We're looking for someone familiar with libvirt internals to
> drive the effort to fix this issue,
Ok, from the bug [1] we're seeing malloc() corruption.
While I agree that a coredump is not that likely to help, I would also
like to come to that conclusion after inspecting a coredump :) I've
found things in the heap before that give clues as to what real
problems are.
To this end, I've proposed [2] to keep coredumps. It's a little
hackish but I think it gets the job done. [3] enables this and saves any
dumps to the logs in d-g.
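For anyone wanting to poke at this locally, the basic idea of lifting the
core-size limit can be sketched as below. This is only an illustration and
not what [2] does; the helper name enable_core_dumps is made up, and where
the dump actually lands still depends on the kernel's core_pattern setting.

```python
import resource


def enable_core_dumps():
    """Raise the soft RLIMIT_CORE to the hard limit for this process.

    Hypothetical helper for illustration. Children started afterwards
    inherit the limit. Returns the (soft, hard) pair now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit: soft=%s hard=%s" % (soft, hard))
```

Note this only affects the calling process and its children; a daemon
started by the init system gets its limit from that service's own
configuration instead.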
As suggested, running under valgrind would be great but probably
impractical until we narrow it down a little. Another thing I've had
some success with is Electric Fence [4], which puts boundaries around
allocations so an out-of-bounds access faults at the time of access. I've
proposed [5] to try this out, but it's not looking particularly
promising unfortunately. I'm open to suggestions, for example maybe
something like tcmalloc might give us a different failure and could be
another clue. If we get something vaguely reliable here, our best bet
might be to run a parallel non-voting job on all changes to see what
we can pick up.
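As a rough sketch of how one might try Electric Fence against a single
process by hand (this is not what [5] does), the usual approach is to
preload the library. The helper name efence_env and the library name
libefence.so.0 are assumptions here; the soname and path vary by
distribution. EF_PROTECT_BELOW is one of Electric Fence's documented
environment switches.

```python
import os


def efence_env(lib="libefence.so.0", protect_below=False, base_env=None):
    """Build an environment that preloads Electric Fence.

    Hypothetical helper; `lib` is an assumed library name and should be
    adjusted to whatever the distribution installs.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Put efence first so its malloc/free are the ones that get resolved.
    env["LD_PRELOAD"] = lib if not existing else lib + ":" + existing
    # "1" guards the page below each allocation (catches underruns);
    # "0" guards the page above it (catches overruns).
    env["EF_PROTECT_BELOW"] = "1" if protect_below else "0"
    return env


# Example use (command is a placeholder):
#   subprocess.call(["some-command"], env=efence_env())
```

Running once with protect_below=False and once with True covers both
overruns and underruns, at the cost of a lot of extra memory per
allocation.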
-i
[1] https://bugs.launchpad.net/nova/+bug/1643911
[2] https://review.openstack.org/451128
[3] https://review.openstack.org/451219
[4] http://elinux.org/Electric_Fence
[5] https://review.openstack.org/451136