[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?

Franck Yelles franck110 at gmail.com
Mon Jun 30 21:05:54 UTC 2014

Hi Jay,

Couple of points.

I agree that we need to define what "success" is.
I believe that the metrics that should be used are "Voted +1" and
But apart from certain valid cases, I would say that "Voted -1" is really
mostly a metric of the bad health of a CI.
Most of the -1 votes are due to environment issues, configuration problems, etc.
In my case, the -1 votes are cast manually, since I want to avoid giving
extra work to the developer.

What are some possible solutions?

On the Jenkins side, I think we could develop a script that parses the
result HTML file.
Jenkins would then vote (+1, 0, -1) on behalf of the third-party CI.
- It would prevent abusive +1 votes.
- If the result HTML is empty, it would indicate that the CI health is bad.
- If all the results are failing, it would also indicate that the CI health is
bad.
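
A minimal sketch of such a script, in Python, might look like the following.
The report layout (one <td> cell per test, with a class of "pass", "fail", or
"skip"), the names ResultCounter and decide_vote, and the exact mapping from
counts to votes are all assumptions for illustration. A real CI's result file
will differ, and this is not an existing Jenkins feature.

```python
from html.parser import HTMLParser


class ResultCounter(HTMLParser):
    """Count <td class="pass|fail|skip"> cells (hypothetical report layout)."""

    OUTCOMES = ("pass", "fail", "skip")

    def __init__(self):
        super().__init__()
        self.counts = dict.fromkeys(self.OUTCOMES, 0)

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        # A valueless class attribute is reported as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        for outcome in self.OUTCOMES:
            if outcome in classes:
                self.counts[outcome] += 1


def decide_vote(html_text):
    """Return (vote, reason), where vote is +1, 0 or -1.

    Assumed policy: abstain (0) whenever the report suggests the CI itself
    is unhealthy, i.e. nothing ran or everything that ran failed.
    """
    parser = ResultCounter()
    parser.feed(html_text)
    counts = parser.counts
    executed = counts["pass"] + counts["fail"]
    if executed == 0:
        return 0, "no executed tests found; CI health is suspect"
    if counts["pass"] == 0:
        return 0, "every executed test failed; CI health is suspect"
    if counts["fail"] > 0:
        return -1, "%d of %d executed tests failed" % (counts["fail"], executed)
    return 1, "all %d executed tests passed" % executed
```

With this policy, an empty or all-failing report never produces a -1 against
the patch. It produces a 0, which flags the CI for its operator to look at.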




On Mon, Jun 30, 2014 at 1:22 PM, Jay Pipes <jaypipes at gmail.com> wrote:

> Hi Stackers,
> Some recent ML threads [1] and a hot IRC meeting today [2] brought up some
> legitimate questions around how a newly-proposed Stackalytics report page
> for Neutron External CI systems [3] represented the results of an external
> CI system as "successful" or not.
> First, I want to say that Ilya and all those involved in the Stackalytics
> program simply want to provide the most accurate information to developers
> in a format that is easily consumed. While there need to be some changes in
> how data is shown (and the wording of things like "Tests Succeeded"), I
> hope that the community knows there isn't any ill intent on the part of
> Mirantis or anyone who works on Stackalytics. OK, so let's keep the
> conversation civil -- we're all working towards the same goals of
> transparency and accuracy. :)
> Alright, now, Anita and Kurt Taylor were asking a very poignant question:
> "But what does CI tested really mean? just running tests? or tested to
> pass some level of requirements?"
> In this nascent world of external CI systems, we have a set of issues that
> we need to resolve:
> 1) All of the CI systems are different.
> Some run Bash scripts. Some run Jenkins slaves and devstack-gate scripts.
> Others run custom Python code that spawns VMs and publishes logs to some
> public domain.
> As a community, we need to decide whether it is worth putting in the
> effort to create a single, unified, installable and runnable CI system, so
> that we can legitimately say "all of the external systems are identical,
> with the exception of the driver code for vendor X being substituted in the
> Neutron codebase."
> If the goal of the external CI systems is to produce reliable, consistent
> results, I feel the answer to the above is "yes", but I'm interested to
> hear what others think. Frankly, in the world of benchmarks, it would be
> unthinkable to say "go ahead and everyone run your own benchmark suite",
> because you would get wildly different results. A similar problem has
> emerged here.
> 2) There is no mediation or verification that the external CI system is
> actually testing anything at all
> As a community, we need to decide whether the current system of
> self-policing should continue. If it should, then language on reports like
> [3] should be very clear that any numbers derived from such systems should
> be taken with a grain of salt. Use of the word "Success" should be avoided,
> as it has connotations (in English, at least) that the result has been
> verified, which is simply not the case as long as no verification or
> mediation occurs for any external CI system.
> 3) There is no clear indication of what tests are being run, and therefore
> there is no clear indication of what "success" is
> I think we can all agree that a test has three possible outcomes: pass,
> fail, and skip. The result of a test suite run is therefore nothing more
> than the aggregation of which tests passed, which failed, and which were
> skipped.
> As a community, we must document, for each project, the expected set of
> tests that must be run for each patch merged into the project's source
> tree. This documentation should be discoverable so that reports like [3]
> can be crystal-clear on what the data shown actually means. The report is
> simply displaying the data it receives from Gerrit. The community needs to
> be proactive in saying "this is what is expected to be tested." This alone
> would allow the report to give information such as "External CI system ABC
> performed the expected tests. X tests passed. Y tests failed. Z tests were
> skipped." Likewise, it would also make it possible for the report to give
> information such as "External CI system DEF did not perform the expected
> tests.", which is excellent information in and of itself.
> ===
> In thinking about the likely answers to the above questions, I believe it
> would be prudent to change the Stackalytics report in question [3] in the
> following ways:
> a. Change the "Success %" column header to "% Reported +1 Votes"
> b. Change the phrase " Green cell - tests ran successfully, red cell -
> tests failed" to "Green cell - System voted +1, red cell - System voted -1"
> and then, when we have more and better data (for example, # tests passed,
> failed, skipped, etc), we can provide more detailed information than just
> "reported +1" or not.
> Thoughts?
> Best,
> -jay
> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> [3] http://stackalytics.com/report/ci/neutron/7