[openstack-dev] [gate] The gate: a failure analysis

Ihar Hrachyshka ihrachys at redhat.com
Mon Jul 28 14:27:23 UTC 2014


On 28/07/14 16:22, Doug Hellmann wrote:
> 
> On Jul 28, 2014, at 2:52 AM, Angus Lees <gus at inodes.org> wrote:
> 
>> On Mon, 21 Jul 2014 04:39:49 PM David Kranz wrote:
>>> On 07/21/2014 04:13 PM, Jay Pipes wrote:
>>>> On 07/21/2014 02:03 PM, Clint Byrum wrote:
>>>>> Thanks Matthew for the analysis.
>>>>> 
>>>>> I think you missed something though.
>>>>> 
>>>>> Right now the frustration is that unrelated intermittent
>>>>> bugs stop your presumably good change from getting in.
>>>>> 
>>>>> Without gating, the result would be that even more bugs,
>>>>> many of them not intermittent at all, would get in. Right
>>>>> now, the one random developer who has to hunt down the
>>>>> rechecks and do them is inconvenienced. But without a gate,
>>>>> _every single_ developer will be inconvenienced until the
>>>>> fix is merged.
>>>>> 
>>>>> The false negative rate is _way_ too high. Nobody would
>>>>> disagree there. However, adding more false negatives and
>>>>> allowing more people to ignore the ones we already have,
>>>>> seems like it would have the opposite effect: Now instead
>>>>> of annoying the people who hit the random intermittent
>>>>> bugs, we'll be annoying _everybody_ as they hit the
>>>>> non-intermittent ones.
>>>> 
>>>> +10
>>> 
>>> Right, but perhaps there is a middle ground. We must not let in
>>> changes that can't pass through the gate, but we can separate the
>>> problem of constant rechecks consuming too many resources from
>>> the problem of constant rechecks causing developer pain. If
>>> failures were deterministic we would skip the failing tests
>>> until they were fixed. Unfortunately many of the common 
>>> failures can blow up any test, or even the whole process.
>>> Following on what Sam said, what if we automatically reran jobs
>>> that failed in a known way, and disallowed "recheck/reverify no
>>> bug"? Developers would then have to track down what bug caused
>>> a failure or file a new one. But they would have to do so far
>>> less often, and as more common failures were catalogued the need
>>> would become rarer still.
>>> 
>>> Some might (reasonably) argue that this would be a bad thing
>>> because it would reduce the incentive for people to fix bugs if
>>> there were less pain being inflicted. But given how hard it is
>>> to track down these race bugs, and that we as a community have
>>> no way to force time to be spent on them, and that it does not
>>> appear that these bugs are causing real systems to fall down
>>> (only our gating process), perhaps something different should
>>> be considered?
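
(Interjecting on the auto-rerun idea above: a toy sketch of "re-queue
only failures that match a known signature, otherwise force triage"
could look like the following. The signature catalogue, labels and
function names are made up for illustration; this is not
elastic-recheck's actual API.)

    import re

    # catalogue of known intermittent-failure signatures; real tooling
    # would map each entry to a tracked bug rather than a bare label
    KNOWN_FAILURES = {
        r"Lock wait timeout exceeded; try restarting transaction":
            "db-lock-wait-timeout",
    }

    def classify(console_log):
        """Return a known-failure label if the log matches a signature."""
        for pattern, label in KNOWN_FAILURES.items():
            if re.search(pattern, console_log):
                return label
        return None

    def handle_failed_job(console_log, requeue, require_triage):
        label = classify(console_log)
        if label is not None:
            # known intermittent failure: re-queue the job automatically
            # instead of making a developer issue "recheck no bug"
            requeue(label)
        else:
            # unknown failure: the developer must find or file a bug
            # before a recheck is accepted
            require_triage()

This would make "recheck/reverify no bug" unnecessary for catalogued
failures while still forcing a human to look at anything new.
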
>> 
>> So to pick an example dear to my heart, I've been working on
>> removing these gate failures: 
>> http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkxvY2sgd2FpdCB0aW1lb3V0IGV4Y2VlZGVkOyB0cnkgcmVzdGFydGluZyB0cmFuc2FjdGlvblwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDA2NTI3OTA3NzkzfQ==
>> ... caused by a bad interaction between eventlet and our default
>> choice of mysql driver.  It would also affect any real world
>> deployment using mysql.
>> 
>> The problem has been identified and the fix proposed for almost a
>> month now, but actually fixing the gate jobs is still nowhere in
>> sight.  The fix is (pretty much) as easy as a pip install and a
>> slightly modified database connection string. I look forward to a
>> discussion of the meta-issues surrounding this, but it is not
>> because no-one tracked down or fixed the bug :(
> 
> I believe the main blocking issue right now is that Oracle doesn’t
> upload that library to PyPI, and so our build-chain won’t be able
> to download it as it is currently configured. I think the last I
> saw, someone was going to talk to Oracle about uploading the source.
> Have we heard back?

Yes, the person in charge of the module told me he's working on
publishing it on PyPI. I guess it's just a matter of a bit more push
from our side, and we'll be able to clean this up in a timely manner.
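
For what it's worth, and assuming we end up on Oracle's MySQL
Connector/Python once it is installable from PyPI (the package name and
config values below are illustrative, not settled), the change Angus
describes should amount to roughly this on the deployment side:

    # hypothetical example -- package/driver names are assumptions
    $ pip install mysql-connector-python

    # [database] section of nova.conf / neutron.conf / etc.
    # before: C-based MySQLdb driver; its network I/O happens inside the
    # C extension, so eventlet cannot switch greenthreads while a query
    # is waiting
    connection = mysql://user:password@127.0.0.1/nova
    # after: pure-Python driver that cooperates with eventlet
    # monkey-patching
    connection = mysql+mysqlconnector://user:password@127.0.0.1/nova

With a pure-Python driver a greenthread waiting on the database yields
to the others instead of blocking the whole process, which is the kind
of interaction that shows up as the "Lock wait timeout exceeded"
failures in the logstash query above.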

/Ihar


