[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?

Anita Kuno anteaya at anteaya.info
Thu Jul 3 12:53:17 UTC 2014


On 07/03/2014 06:22 AM, Sullivan, Jon Paul wrote:
>> -----Original Message-----
>> From: Anita Kuno [mailto:anteaya at anteaya.info]
>> Sent: 01 July 2014 14:42
>> To: openstack-dev at lists.openstack.org
>> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success"
>> exactly?
>>
>> On 06/30/2014 09:13 PM, Jay Pipes wrote:
>>> On 06/30/2014 07:08 PM, Anita Kuno wrote:
>>>> On 06/30/2014 04:22 PM, Jay Pipes wrote:
>>>>> Hi Stackers,
>>>>>
>>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought
>>>>> up some legitimate questions around how a newly-proposed
>>>>> Stackalytics report page for Neutron External CI systems [3]
>>>>> represented the results of an external CI system as "successful" or
>>>>> not.
>>>>>
>>>>> First, I want to say that Ilya and all those involved in the
>>>>> Stackalytics program simply want to provide the most accurate
>>>>> information to developers in a format that is easily consumed. While
>>>>> there need to be some changes in how data is shown (and the wording
>>>>> of things like "Tests Succeeded"), I hope that the community knows
>>>>> there isn't any ill intent on the part of Mirantis or anyone who
>>>>> works on Stackalytics. OK, so let's keep the conversation civil --
>>>>> we're all working towards the same goals of transparency and
>>>>> accuracy. :)
>>>>>
>>>>> Alright, now, Anita and Kurt Taylor were asking a very pointed
>>>>> question:
>>>>>
>>>>> "But what does CI tested really mean? just running tests? or tested
>>>>> to pass some level of requirements?"
>>>>>
>>>>> In this nascent world of external CI systems, we have a set of
>>>>> issues that we need to resolve:
>>>>>
>>>>> 1) All of the CI systems are different.
>>>>>
>>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
>>>>> scripts. Others run custom Python code that spawns VMs and publishes
>>>>> logs to some public domain.
>>>>>
>>>>> As a community, we need to decide whether it is worth putting in the
>>>>> effort to create a single, unified, installable and runnable CI
>>>>> system, so that we can legitimately say "all of the external systems
>>>>> are identical, with the exception of the driver code for vendor X
>>>>> being substituted in the Neutron codebase."
>>>>>
>>>>> If the goal of the external CI systems is to produce reliable,
>>>>> consistent results, I feel the answer to the above is "yes", but I'm
>>>>> interested to hear what others think. Frankly, in the world of
>>>>> benchmarks, it would be unthinkable to say "go ahead and everyone
>>>>> run your own benchmark suite", because you would get wildly
>>>>> different results. A similar problem has emerged here.
>>>>>
>>>>> 2) There is no mediation or verification that the external CI system
>>>>> is actually testing anything at all
>>>>>
>>>>> As a community, we need to decide whether the current system of
>>>>> self-policing should continue. If it should, then language on
>>>>> reports like [3] should be very clear that any numbers derived from
>>>>> such systems should be taken with a grain of salt. Use of the word
>>>>> "Success" should be avoided, as it has connotations (in English, at
>>>>> least) that the result has been verified, which is simply not the
>>>>> case as long as no verification or mediation occurs for any external
>>>>> CI system.
>>>>>
>>>>> 3) There is no clear indication of what tests are being run, and
>>>>> therefore there is no clear indication of what "success" is
>>>>>
>>>>> I think we can all agree that a test has three possible outcomes:
>>>>> pass, fail, and skip. The result of a test suite run is therefore
>>>>> nothing more than the aggregation of which tests passed, which
>>>>> failed, and which were skipped.
>>>>>
>>>>> As a community, we must document, for each project, the expected
>>>>> set of tests that must be run for each patch merged into
>>>>> the project's source tree. This documentation should be discoverable
>>>>> so that reports like [3] can be crystal-clear on what the data shown
>>>>> actually means. The report is simply displaying the data it receives
>>>>> from Gerrit. The community needs to be proactive in saying "this is
>>>>> what is expected to be tested." This alone would allow the report to
>>>>> give information such as "External CI system ABC performed the
>>>>> expected tests. X tests passed. Y tests failed. Z tests were
>>>>> skipped." Likewise, it would also make
>>>>> it possible for the report to give information such as "External CI
>>>>> system DEF did not perform the expected tests.", which is excellent
>>>>> information in and of itself.
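
A minimal sketch of the report wording described above, assuming each
external CI system publishes one outcome per test. The names
EXPECTED_TESTS and summarize, and the sample test names, are hypothetical
and are not part of Stackalytics or Gerrit.

```python
from collections import Counter

# Hypothetical: the set of tests a project documents as expected per patch.
EXPECTED_TESTS = {"test_create_network", "test_delete_network", "test_create_port"}


def summarize(system, results, expected=EXPECTED_TESTS):
    """Summarize one CI run.

    results maps a test name to one of "pass", "fail", or "skip".
    """
    # Any expected test with no reported outcome means the run is incomplete.
    missing = expected - set(results)
    if missing:
        return f"External CI system {system} did not perform the expected tests."
    # Aggregate outcomes over the expected tests only.
    counts = Counter(results[name] for name in expected)
    return (
        f"External CI system {system} performed the expected tests. "
        f"{counts['pass']} tests passed. {counts['fail']} tests failed. "
        f"{counts['skip']} tests were skipped."
    )
```

The point of the sketch is that "did not perform the expected tests" is
reported separately from pass/fail counts, so a system that runs nothing
cannot look successful.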
>>>>>
>>>>> ===
>>>>>
>>>>> In thinking about the likely answers to the above questions, I
>>>>> believe it would be prudent to change the Stackalytics report in
>>>>> question [3] in the following ways:
>>>>>
>>>>> a. Change the "Success %" column header to "% Reported +1 Votes"
>>>>> b. Change the phrase "Green cell - tests ran successfully, red cell
>>>>> - tests failed" to "Green cell - System voted +1, red cell - System
>>>>> voted -1"
>>>>>
>>>>> and then, when we have more and better data (for example, # tests
>>>>> passed, failed, skipped, etc), we can provide more detailed
>>>>> information than just "reported +1" or not.
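
To make the proposed column concrete, here is a small sketch of how a
"% Reported +1 Votes" figure could be computed, assuming the only input is
the list of votes a system left on Gerrit changes. The function name
percent_plus_one is hypothetical.

```python
def percent_plus_one(votes):
    """Return the percentage of +1 votes among the votes a system reported.

    votes is a list of integers (+1 or -1) as left on Gerrit changes.
    Returns None when the system reported no votes at all.
    """
    if not votes:
        return None
    plus_ones = sum(1 for v in votes if v == 1)
    return 100.0 * plus_ones / len(votes)
```

Note that this number only describes what the system voted. It says
nothing about which tests ran or whether they passed, which is why the
column name matters.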
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Best,
>>>>> -jay
>>>>>
>>>>> [1]
>>>>> http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
>>>>> [2]
>>>>> http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
>>>>>
>>>>>
>>>>> [3] http://stackalytics.com/report/ci/neutron/7
>>>>>
>>>>> _______________________________________________
>>>>> OpenStack-dev mailing list
>>>>> OpenStack-dev at lists.openstack.org
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> Hi Jay:
>>>>
>>>> Thanks for starting this thread. You raise some interesting
>>>> questions.
>>>>
>>>> The question I had identified as needing definition is "what
>>>> algorithm do we use to assess fitness of a third party ci system".
>>>>
>>>> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log
>>>>
>>>> timestamp 2014-06-30T19:23:40
>>>>
>>>> This is the question that is top of mind for me.
>>>
>>> Right, my email above is written to say "unless there is a) uniformity
>>> of the external CI system, b) agreement on mediation or verification
>>> of said systems, and c) agreement on what tests shall be expected to
>>> pass and be skipped for each project, then no such algorithm is really
>>> possible."
>>>
>>> Now, if the community is willing to agree to a), b), and c), then
>>> certainly there is the ability to determine the fitness of a CI system
>>> -- at least with regard to its output (test results and the voting on
>>> the Gerrit system).
>>>
>>> Barring agreement on any or all of those three things, I recommended
>>> changing the language on the report due to the inability to have any
>>> consistently-applied algorithm to determine fitness.
>>>
>>> Best,
>>> -jay
>>>
> 
> +1 to all of your points above, Jay.  Well-written, thank you.
> 
>> I've been mulling this over and looking at how I assess feedback I get
>> from different human reviewers, since I don't know the basis of how they
>> arrive at their decisions unless they tell me and/or I have experience
>> with their criteria for how they review my patches.
>>
>> I get different value from different human reviewers based upon my
>> experience of them reviewing my patches, my experience of them reviewing
>> other people's patches, my experience reviewing their code and my
>> discussions with them in channel, on the mailing list and in person, as
>> well as my experience reading or becoming aware of other decisions they
>> make.
>>
>> It would be really valuable for me personally to have a page in gerrit
>> for each third party ci account, where I could sign in and leave
>> comments or vote +/-1 or 0 as a way of giving feedback to the
>> maintainers of that system. Also others could do the same and I could
>> read their feedback. For instance, yesterday someone linked me to logs
>> that forced me to download them to read. I hadn't been made aware this
>> account had been doing this, but this developer was aware. Currently we
>> have no system for a developer, in the course of their normal workflow,
>> to leave a comment and/or vote on a third party ci system to give those
>> maintainers feedback about how they are doing at providing consumable
>> artifacts from their system.
>>
>> It also would remove the perception that I'm just a big meany, since
>> developers could comment for themselves, directly on the account, how
>> they feel about having to download tarballs, or sign into other systems
>> to trigger a recheck. The community of developers would say how fit a
>> system is or isn't since they are the individuals having to dig through
>> logs and evaluate "did this build fail because the code needs
>> adjustment" or not, and can reflect their findings in a comment and vote
>> on the system.
>>
>> The other thing I really value about gerrit is that votes can change,
>> systems can improve, given motivation and accurate feedback for making
>> changes.
>>
>> I have no idea how hard this would be to create, but I think having
>> direct feedback from developers on systems would help both the
>> developers and the maintainers of ci systems.
>>
>> There are a number of people working really hard to do a good job in
>> this area. This sort of structure would also provide support and
>> encouragement to those people providing leadership in this space, people
>> asking good questions, helping other system maintainers, starting
>> discussions, offering patches to infra (and reviewing infra patches) in
>> accordance with the goals of the third party meeting[0] and other hard-
>> to-measure valuable decisions that provide value for the community.
>> I'd really like a way we all can demonstrate the extent to which we
>> value these contributions.
>>
>> So far, those are my thoughts.
>>
>> Thanks,
>> Anita.
> 
> +1 - this sounds like a really good idea.
> 
> How is feedback on the OpenStack check/gate retrieved and moderated?  Can
> that provide a model for doing what you suggest here?
Hi Jon Paul: (Is it Jon Paul or Jon?)

The OpenStack check/gate pipelines are assessed using a system we call
elastic recheck: http://status.openstack.org/elastic-recheck/

We use logstash to index log output and Elasticsearch to compose queries
that count the occurrences of a given error message (for example).
Sample query:
http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries/1097592.yaml
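
As a rough illustration of the idea only: the sketch below counts how
often an error message shows up per build, using plain Python over
in-memory log lines. It is not the logstash/Elasticsearch query syntax
that elastic-recheck uses, and the name count_incidence and the build
identifiers are hypothetical.

```python
from collections import Counter


def count_incidence(log_lines_by_build, message):
    """Count how many log lines in each build contain `message`.

    log_lines_by_build maps a build identifier to an iterable of log lines.
    Builds with no matching lines are omitted from the result.
    """
    hits = Counter()
    for build, lines in log_lines_by_build.items():
        n = sum(1 for line in lines if message in line)
        if n:
            hits[build] = n
    return hits
```

A real query runs against the indexed logs rather than raw lines, but the
question it answers is the same: how many builds hit this error, and how
often.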

The elastic-recheck repo is here:
http://git.openstack.org/cgit/openstack-infra/elastic-recheck/

The GUI is available at logstash.openstack.org.

All the queries are written manually as yaml files and named with a
corresponding bug number:
http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries

Here is some documentation about elastic-recheck and how to write
queries: http://docs.openstack.org/infra/elastic-recheck/readme.html

Joe Gordon actually has created some great graphs (where are those
hosted again, Joe?) to be able to evaluate failure rates in the
pipelines (check and gate) based on test groups (tempest, unit tests).

Clark Boylan and Sean Dague did and do the majority of the heavy lifting
setting up and maintaining elastic-recheck (with lots of help from
others, thank you!) so perhaps they could offer their opinion on whether
this is a reasonable choice for evaluating third party ci systems.

Thanks Jon Paul, this is a good question,
Anita.
> 
>>
>>
>> [0]
>> https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Party_meetings
>>