[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?

Anita Kuno anteaya at anteaya.info
Mon Jun 30 23:08:07 UTC 2014


On 06/30/2014 04:22 PM, Jay Pipes wrote:
> Hi Stackers,
> 
> Some recent ML threads [1] and a hot IRC meeting today [2] brought up
> some legitimate questions around how a newly-proposed Stackalytics
> report page for Neutron External CI systems [3] represented the results
> of an external CI system as "successful" or not.
> 
> First, I want to say that Ilya and all those involved in the
> Stackalytics program simply want to provide the most accurate
> information to developers in a format that is easily consumed. While
> there need to be some changes in how data is shown (and the wording of
> things like "Tests Succeeded"), I hope that the community knows there
> isn't any ill intent on the part of Mirantis or anyone who works on
> Stackalytics. OK, so let's keep the conversation civil -- we're all
> working towards the same goals of transparency and accuracy. :)
> 
> Alright, now, Anita and Kurt Taylor were asking a very pointed question:
> 
> "But what does CI tested really mean? just running tests? or tested to
> pass some level of requirements?"
> 
> In this nascent world of external CI systems, we have a set of issues
> that we need to resolve:
> 
> 1) All of the CI systems are different.
> 
> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
> scripts. Others run custom Python code that spawns VMs and publishes
> logs to a publicly accessible site.
> 
> As a community, we need to decide whether it is worth putting in the
> effort to create a single, unified, installable and runnable CI system,
> so that we can legitimately say "all of the external systems are
> identical, with the exception of the driver code for vendor X being
> substituted in the Neutron codebase."
> 
> If the goal of the external CI systems is to produce reliable,
> consistent results, I feel the answer to the above is "yes", but I'm
> interested to hear what others think. Frankly, in the world of
> benchmarks, it would be unthinkable to say "go ahead and everyone run
> your own benchmark suite", because you would get wildly different
> results. A similar problem has emerged here.
> 
> 2) There is no mediation or verification that the external CI system is
> actually testing anything at all
> 
> As a community, we need to decide whether the current system of
> self-policing should continue. If it should, then language on reports
> like [3] should be very clear that any numbers derived from such systems
> should be taken with a grain of salt. Use of the word "Success" should
> be avoided, as it has connotations (in English, at least) that the
> result has been verified, which is simply not the case as long as no
> verification or mediation occurs for any external CI system.
> 
> 3) There is no clear indication of what tests are being run, and
> therefore there is no clear indication of what "success" is
> 
> I think we can all agree that a test has three possible outcomes: pass,
> fail, and skip. The result of a test suite run is therefore nothing
> more than the aggregation of which tests passed, which failed, and which
> were skipped.
> 
> As a community, we must document, for each project, the expected set of
> tests that must be run for each patch merged into the project's source
> tree. This documentation should be discoverable so that reports
> like [3] can be crystal-clear on what the data shown actually means. The
> report is simply displaying the data it receives from Gerrit. The
> community needs to be proactive in saying "this is what is expected to
> be tested." This alone would allow the report to give information such
> as "External CI system ABC performed the expected tests. X tests passed.
> Y tests failed. Z tests were skipped." Likewise, it would also make it
> possible for the report to give information such as "External CI system
> DEF did not perform the expected tests.", which is excellent information
> in and of itself.
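> 
> As an illustration only (the test names and helper below are
> hypothetical, not an existing API), the per-run summary behind such a
> report row could be as simple as:
> 
>     # Hypothetical sketch: summarize one CI run against the documented
>     # expected test list for a project.
>     EXPECTED_TESTS = {"test_a", "test_b", "test_c"}  # per-project, documented
> 
>     def summarize(results):
>         """results: dict mapping test name -> 'pass' | 'fail' | 'skip'."""
>         counts = {"passed": 0, "failed": 0, "skipped": 0}
>         for outcome in results.values():
>             if outcome == "pass":
>                 counts["passed"] += 1
>             elif outcome == "fail":
>                 counts["failed"] += 1
>             else:
>                 counts["skipped"] += 1
>         # Did the system run everything the project says it must run?
>         counts["ran_expected_tests"] = EXPECTED_TESTS.issubset(results)
>         return counts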
> 
> ===
> 
> In thinking about the likely answers to the above questions, I believe
> it would be prudent to change the Stackalytics report in question [3] in
> the following ways:
> 
> a. Change the "Success %" column header to "% Reported +1 Votes"
> b. Change the phrase "Green cell - tests ran successfully, red cell -
> tests failed" to "Green cell - System voted +1, red cell - System voted -1"
> 
> and then, when we have more and better data (for example, # tests
> passed, failed, skipped, etc), we can provide more detailed information
> than just "reported +1" or not.
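> 
> For what it's worth, "% Reported +1 Votes" is then nothing more than a
> vote count. A minimal sketch, assuming the report receives the CI
> system's Gerrit vote values as a list of integers (the names here are
> illustrative, not an existing API):
> 
>     def reported_plus_one_pct(votes):
>         """votes: Gerrit vote values from the CI system, e.g. [1, -1, 1]."""
>         if not votes:
>             return 0.0
>         return 100.0 * sum(1 for v in votes if v > 0) / len(votes)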
> 
> Thoughts?
> 
> Best,
> -jay
> 
> [1]
> http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> [2]
> http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> 
> [3] http://stackalytics.com/report/ci/neutron/7
> 
Hi Jay:

Thanks for starting this thread. You raise some interesting questions.

The question I had identified as needing definition is "what algorithm
do we use to assess the fitness of a third-party CI system?"

http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log
timestamp 2014-06-30T19:23:40

This is the question that is top of mind for me.

Thanks Jay,
Anita.


