[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?
Sullivan, Jon Paul
JonPaul.Sullivan at hp.com
Thu Jul 3 10:22:04 UTC 2014
> -----Original Message-----
> From: Anita Kuno [mailto:anteaya at anteaya.info]
> Sent: 01 July 2014 14:42
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success"
> exactly?
>
> On 06/30/2014 09:13 PM, Jay Pipes wrote:
> > On 06/30/2014 07:08 PM, Anita Kuno wrote:
> >> On 06/30/2014 04:22 PM, Jay Pipes wrote:
> >>> Hi Stackers,
> >>>
> >>> Some recent ML threads [1] and a hot IRC meeting today [2] brought
> >>> up some legitimate questions around how a newly-proposed
> >>> Stackalytics report page for Neutron External CI systems [3]
> >>> represented the results of an external CI system as "successful" or
> >>> not.
> >>>
> >>> First, I want to say that Ilya and all those involved in the
> >>> Stackalytics program simply want to provide the most accurate
> >>> information to developers in a format that is easily consumed. While
> >>> there need to be some changes in how data is shown (and the wording
> >>> of things like "Tests Succeeded"), I hope that the community knows
> >>> there isn't any ill intent on the part of Mirantis or anyone who
> >>> works on Stackalytics. OK, so let's keep the conversation civil --
> >>> we're all working towards the same goals of transparency and
> >>> accuracy. :)
> >>>
> >>> Alright, now, Anita and Kurt Taylor were asking a very pointed
> >>> question:
> >>>
> >>> "But what does CI tested really mean? just running tests? or tested
> >>> to pass some level of requirements?"
> >>>
> >>> In this nascent world of external CI systems, we have a set of
> >>> issues that we need to resolve:
> >>>
> >>> 1) All of the CI systems are different.
> >>>
> >>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
> >>> scripts. Others run custom Python code that spawns VMs and publishes
> >>> logs to some public domain.
> >>>
> >>> As a community, we need to decide whether it is worth putting in the
> >>> effort to create a single, unified, installable and runnable CI
> >>> system, so that we can legitimately say "all of the external systems
> >>> are identical, with the exception of the driver code for vendor X
> >>> being substituted in the Neutron codebase."
> >>>
> >>> If the goal of the external CI systems is to produce reliable,
> >>> consistent results, I feel the answer to the above is "yes", but I'm
> >>> interested to hear what others think. Frankly, in the world of
> >>> benchmarks, it would be unthinkable to say "go ahead and everyone
> >>> run your own benchmark suite", because you would get wildly
> >>> different results. A similar problem has emerged here.
> >>>
> >>> 2) There is no mediation or verification that the external CI system
> >>> is actually testing anything at all
> >>>
> >>> As a community, we need to decide whether the current system of
> >>> self-policing should continue. If it should, then language on
> >>> reports like [3] should be very clear that any numbers derived from
> >>> such systems should be taken with a grain of salt. Use of the word
> >>> "Success" should be avoided, as it has connotations (in English, at
> >>> least) that the result has been verified, which is simply not the
> >>> case as long as no verification or mediation occurs for any external
> >>> CI system.
> >>>
> >>> 3) There is no clear indication of what tests are being run, and
> >>> therefore there is no clear indication of what "success" is
> >>>
> >>> I think we can all agree that a test has three possible outcomes:
> >>> pass, fail, and skip. The results of a test suite run therefore is
> >>> nothing more than the aggregation of which tests passed, which
> >>> failed, and which were skipped.
> >>>
> >>> As a community, we must document, for each project, the expected
> >>> set of tests that must be run for each patch merged into the
> >>> project's source tree. This documentation should be discoverable
> >>> so that reports like [3] can be crystal-clear on what the data shown
> >>> actually means. The report is simply displaying the data it receives
> >>> from Gerrit. The community needs to be proactive in saying "this is
> >>> what is expected to be tested." This alone would allow the report to
> >>> give information such as "External CI system ABC performed the
> >>> expected tests. X tests passed.
> >>> Y tests failed. Z tests were skipped." Likewise, it would also make
> >>> it possible for the report to give information such as "External CI
> >>> system DEF did not perform the expected tests.", which is excellent
> >>> information in and of itself.
> >>>
> >>> ===
> >>>
> >>> In thinking about the likely answers to the above questions, I
> >>> believe it would be prudent to change the Stackalytics report in
> >>> question [3] in the following ways:
> >>>
> >>> a. Change the "Success %" column header to "% Reported +1 Votes"
> >>> b. Change the phrase " Green cell - tests ran successfully, red cell
> >>> - tests failed" to "Green cell - System voted +1, red cell - System
> >>> voted -1"
> >>>
> >>> and then, when we have more and better data (for example, # tests
> >>> passed, failed, skipped, etc), we can provide more detailed
> >>> information than just "reported +1" or not.
> >>>
> >>> Thoughts?
> >>>
> >>> Best,
> >>> -jay
> >>>
> >>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> >>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> >>>
> >>>
> >>> [3] http://stackalytics.com/report/ci/neutron/7
> >>>
> >>> _______________________________________________
> >>> OpenStack-dev mailing list
> >>> OpenStack-dev at lists.openstack.org
> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> Hi Jay:
> >>
> >> Thanks for starting this thread. You raise some interesting
> >> questions.
> >>
> >> The question I had identified as needing definition is "what
> >> algorithm do we use to assess fitness of a third party ci system".
> >>
> >> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log
> >>
> >> timestamp 2014-06-30T19:23:40
> >>
> >> This is the question that is top of mind for me.
> >
> > Right, my email above is written to say "unless there is a) uniformity
> > of the external CI system, b) agreement on mediation or verification
> > of said systems, and c) agreement on what tests shall be expected to
> > pass and be skipped for each project, then no such algorithm is really
> > possible."
> >
> > Now, if the community is willing to agree to a), b), and c), then
> > certainly there is the ability to determine the fitness of a CI system
> > -- at least in regards to its output (test results and the voting on
> > the Gerrit system).
> >
> > Barring agreement on any or all of those three things, I recommended
> > changing the language on the report due to the inability to have any
> > consistently-applied algorithm to determine fitness.
> >
> > Best,
> > -jay
> >
+1 to all of your points above, Jay. Well-written, thank you.
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> I've been mulling this over and looking at how I assess the feedback I
> get from different human reviewers, since I don't know the basis on
> which they arrive at their decisions unless they tell me and/or I have
> experience with their criteria for how they review my patches.
>
> I get different value from different human reviewers based upon my
> experience of them reviewing my patches, my experience of them reviewing
> other people's patches, my experience reviewing their code and my
> discussions with them in channel, on the mailing list and in person, as
> well as my experience reading or becoming aware of other decisions they
> make.
>
> It would be really valuable for me personally to have a page in gerrit
> for each third party ci account, where I could sign in and leave
> comments or vote +/-1 or 0 as a way of giving feedback to the
> maintainers of that system. Also others could do the same and I could
> read their feedback. For instance, yesterday someone linked me to logs
> that forced me to download them to read. I hadn't been made aware this
> account had been doing this, but this developer was aware. Currently we
> have no system for a developer, in the course of their normal workflow,
> to leave a comment and/or vote on a third party ci system to give those
> maintainers feedback about how they are doing at providing consumable
> artifacts from their system.
>
> It would also remove the perception that I'm just a big meany, since
> developers could comment for themselves, directly on the account, about
> how they feel about having to download tarballs or sign into other
> systems to trigger a recheck. The community of developers would decide
> how fit a system is or isn't, since they are the individuals who have
> to dig through logs and evaluate "did this build fail because the code
> needs adjustment" or not, and they can reflect their findings in a
> comment and vote on the system.
>
> The other thing I really value about gerrit is that votes can change,
> systems can improve, given motivation and accurate feedback for making
> changes.
>
> I have no idea how hard this would be to create, but I think having
> direct feedback from developers on systems would help both the
> developers and the maintainers of ci systems.
>
> There are a number of people working really hard to do a good job in
> this area. This sort of structure would also provide support and
> encouragement to those people providing leadership in this space, people
> asking good questions, helping other system maintainers, starting
> discussions, offering patches to infra (and reviewing infra patches) in
> accordance with the goals of the third party meeting[0] and other hard-
> to-measure valuable decisions that provide value for the community.
> I'd really like a way we all can demonstrate the extent to which we
> value these contributions.
>
> So far, those are my thoughts.
>
> Thanks,
> Anita.
+1 - this sounds like a really good idea.
How is feedback on the OpenStack check/gate pipelines gathered and moderated? Could that serve as a model for what you suggest here?
>
>
> [0] https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Party_meetings
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev