[openstack-dev] [nova] top gate bug is libvirt snapshot

Sean Dague sean at dague.net
Wed Jul 9 12:34:06 UTC 2014


On 07/09/2014 03:58 AM, Daniel P. Berrange wrote:
> On Tue, Jul 08, 2014 at 02:50:40PM -0700, Joe Gordon wrote:
>>>> But for right now, we should stop the bleeding, so that nova/libvirt
>>>> isn't blocking everyone else from merging code.
>>>
>>> Agreed, we should merge the hack and treat the bug as a release blocker
>>> to be resolved prior to Juno GA.
>>>
>>
>>
>> How can we prevent libvirt issues like this from landing in trunk in the
>> first place? If we don't figure out a way to prevent this from landing in the
>> first place, I fear we will keep repeating this same pattern of failure.

Right, this is where math is against us. If a race shows up 1% of the
time, you need about 69 runs to have a 50% chance of seeing it. I still haven't
calibrated the bugs to an absolute scale, but based on what I
remember, I think this livesnapshot bug was probably a 3-4% bug (per Tempest
run). So you'd need about 50 Tempest runs to have an 80% chance of seeing it show up again.

(Absolute calibration of the bugs is on my todo list for Elastic
Recheck; maybe it's time to put that ahead of fixing the bugs.)
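A quick way to check these estimates: assuming each Tempest run fails independently with probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n, so the number of runs needed to reach a target confidence c is ceil(ln(1 - c) / ln(1 - p)). The helper names below are hypothetical and this is only the arithmetic, not anything taken from Elastic Recheck.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Chance of seeing at least one failure in n independent runs,
    each failing with probability p."""
    return 1 - (1 - p) ** n


def runs_needed(p: float, confidence: float) -> int:
    """Smallest n with detection_probability(p, n) >= confidence.

    Solves (1 - p)**n <= 1 - confidence for n; assumes runs are independent.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(runs_needed(0.01, 0.5))  # 1% race, 50% chance -> 69
print(runs_needed(0.03, 0.8))  # 3% race, 80% chance -> 53
print(runs_needed(0.04, 0.8))  # 4% race, 80% chance -> 40
print(round(detection_probability(0.03, 50), 2))  # -> 0.78
print(round(detection_probability(0.04, 50), 2))  # -> 0.87
```

So for a 3-4% bug, 50 runs gives roughly a 78-87% chance of hitting it at least once, which is in line with the ~80% figure.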

> Realistically I don't think there was much/any chance of avoiding this
> problem. Despite many days of work trying to reproduce it by multiple
> people, no one has managed even 1 single failure outside of the gate.
> Even inside the gate it is hard to reproduce. I still have absolutely
> no clue what is failing after days of investigation & debugging with
> all the tricks I can think of, because as I say, it works perfectly
> every time I try it, except in the gate where it is impossible to
> debug it.

Out of curiosity, is your reproducer using eventlet? My expectation is
that eventlet's concurrency actually exacerbates this, because when the
snapshot starts we're now doing IO, and that is exactly the time
that other compute work will be triggered.
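To illustrate the interleaving concern, here is a minimal sketch that uses Python's asyncio as a stand-in for eventlet; both are cooperative schedulers that only switch tasks at I/O or explicit yield points. The snapshot() and other_compute_work() coroutines are hypothetical placeholders, not nova code, and this does not reproduce the actual bug. It only shows that queued work gets to run while the snapshot is blocked on I/O.

```python
import asyncio

# Order in which the (hypothetical) tasks actually execute.
events: list[str] = []


async def snapshot() -> None:
    events.append("snapshot: start")
    # Simulated disk I/O: this is a switch point, so the scheduler
    # runs any other ready tasks before the snapshot resumes.
    await asyncio.sleep(0.01)
    events.append("snapshot: done")


async def other_compute_work(name: str) -> None:
    events.append(f"{name}: runs")


async def main() -> None:
    snap = asyncio.create_task(snapshot())
    # Yield once so the snapshot task starts and reaches its I/O wait.
    await asyncio.sleep(0)
    others = [asyncio.create_task(other_compute_work(f"job{i}")) for i in range(2)]
    await asyncio.gather(snap, *others)


asyncio.run(main())
print(events)
# -> ['snapshot: start', 'job0: runs', 'job1: runs', 'snapshot: done']
```

With eventlet the switch points come from monkey-patched blocking I/O rather than an explicit await, so they are not visible in the calling code.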

	-Sean

-- 
Sean Dague
http://dague.net
