<p dir="ltr"><br>
On Jul 3, 2014 8:57 AM, "Anita Kuno" <<a href="mailto:anteaya@anteaya.info">anteaya@anteaya.info</a>> wrote:<br>
><br>
> On 07/03/2014 06:22 AM, Sullivan, Jon Paul wrote:<br>
> >> -----Original Message-----<br>
> >> From: Anita Kuno [mailto:<a href="mailto:anteaya@anteaya.info">anteaya@anteaya.info</a>]<br>
> >> Sent: 01 July 2014 14:42<br>
> >> To: <a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a><br>
> >> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success" exactly?<br>
> >><br>
> >> On 06/30/2014 09:13 PM, Jay Pipes wrote:<br>
> >>> On 06/30/2014 07:08 PM, Anita Kuno wrote:<br>
> >>>> On 06/30/2014 04:22 PM, Jay Pipes wrote:<br>
> >>>>> Hi Stackers,<br>
> >>>>><br>
> >>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought<br>
> >>>>> up some legitimate questions around how a newly-proposed<br>
> >>>>> Stackalytics report page for Neutron External CI systems [3]<br>
> >>>>> represented the results of an external CI system as "successful" or not.<br>
> >>>>><br>
> >>>>> First, I want to say that Ilya and all those involved in the<br>
> >>>>> Stackalytics program simply want to provide the most accurate<br>
> >>>>> information to developers in a format that is easily consumed. While<br>
> >>>>> there need to be some changes in how data is shown (and the wording<br>
> >>>>> of things like "Tests Succeeded"), I hope that the community knows<br>
> >>>>> there isn't any ill intent on the part of Mirantis or anyone who<br>
> >>>>> works on Stackalytics. OK, so let's keep the conversation civil --<br>
> >>>>> we're all working towards the same goals of transparency and<br>
> >>>>> accuracy. :)<br>
> >>>>><br>
> >>>>> Alright, now, Anita and Kurt Taylor were asking a very poignant<br>
> >>>>> question:<br>
> >>>>><br>
> >>>>> "But what does CI tested really mean? just running tests? or tested<br>
> >>>>> to pass some level of requirements?"<br>
> >>>>><br>
> >>>>> In this nascent world of external CI systems, we have a set of<br>
> >>>>> issues that we need to resolve:<br>
> >>>>><br>
> >>>>> 1) All of the CI systems are different.<br>
> >>>>><br>
> >>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate<br>
> >>>>> scripts. Others run custom Python code that spawns VMs and publishes<br>
> >>>>> logs to some public domain.<br>
> >>>>><br>
> >>>>> As a community, we need to decide whether it is worth putting in the<br>
> >>>>> effort to create a single, unified, installable and runnable CI<br>
> >>>>> system, so that we can legitimately say "all of the external systems<br>
> >>>>> are identical, with the exception of the driver code for vendor X<br>
> >>>>> being substituted in the Neutron codebase."<br>
> >>>>><br>
> >>>>> If the goal of the external CI systems is to produce reliable,<br>
> >>>>> consistent results, I feel the answer to the above is "yes", but I'm<br>
> >>>>> interested to hear what others think. Frankly, in the world of<br>
> >>>>> benchmarks, it would be unthinkable to say "go ahead and everyone<br>
> >>>>> run your own benchmark suite", because you would get wildly<br>
> >>>>> different results. A similar problem has emerged here.<br>
> >>>>><br>
> >>>>> 2) There is no mediation or verification that the external CI system<br>
> >>>>> is actually testing anything at all<br>
> >>>>><br>
> >>>>> As a community, we need to decide whether the current system of<br>
> >>>>> self-policing should continue. If it should, then language on<br>
> >>>>> reports like [3] should be very clear that any numbers derived from<br>
> >>>>> such systems should be taken with a grain of salt. Use of the word<br>
> >>>>> "Success" should be avoided, as it has connotations (in English, at<br>
> >>>>> least) that the result has been verified, which is simply not the<br>
> >>>>> case as long as no verification or mediation occurs for any external<br>
> >>>>> CI system.<br>
> >>>>><br>
> >>>>> 3) There is no clear indication of what tests are being run, and<br>
> >>>>> therefore there is no clear indication of what "success" is<br>
> >>>>><br>
> >>>>> I think we can all agree that a test has three possible outcomes:<br>
> >>>>> pass, fail, and skip. The result of a test suite run is therefore<br>
> >>>>> nothing more than the aggregation of which tests passed, which<br>
> >>>>> failed, and which were skipped.<br>
> >>>>><br>
> >>>>> As a community, we must document, for each project, the expected<br>
> >>>>> set of tests that must be run for each patch merged into<br>
> >>>>> the project's source tree. This documentation should be discoverable<br>
> >>>>> so that reports like [3] can be crystal-clear on what the data shown<br>
> >>>>> actually means. The report is simply displaying the data it receives<br>
> >>>>> from Gerrit. The community needs to be proactive in saying "this is<br>
> >>>>> what is expected to be tested." This alone would allow the report to<br>
> >>>>> give information such as "External CI system ABC performed the<br>
> >>>>> expected tests. X tests passed.<br>
> >>>>> Y tests failed. Z tests were skipped." Likewise, it would also make<br>
> >>>>> it possible for the report to give information such as "External CI<br>
> >>>>> system DEF did not perform the expected tests.", which is excellent<br>
> >>>>> information in and of itself.<br>
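<br>
A minimal sketch of the aggregation described above (the expected-test set, test names, and results are hypothetical placeholders, not any project's actual policy):<br>
<pre>
# Sketch: summarize one CI run against a documented "expected tests" set.
# Test names, results, and the expected set are hypothetical placeholders.
from collections import Counter

EXPECTED_TESTS = {"tempest.api.network.test_a", "tempest.api.network.test_b"}

def summarize(results):
    """results: dict mapping test name to 'pass', 'fail', or 'skip'."""
    counts = Counter(results.values())
    return {
        "ran_expected_tests": EXPECTED_TESTS.issubset(results),
        "passed": counts.get("pass", 0),
        "failed": counts.get("fail", 0),
        "skipped": counts.get("skip", 0),
    }

print(summarize({"tempest.api.network.test_a": "pass",
                 "tempest.api.network.test_b": "skip"}))
</pre>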
> >>>>><br>
> >>>>> ===<br>
> >>>>><br>
> >>>>> In thinking about the likely answers to the above questions, I<br>
> >>>>> believe it would be prudent to change the Stackalytics report in<br>
> >>>>> question [3] in the following ways:<br>
> >>>>><br>
> >>>>> a. Change the "Success %" column header to "% Reported +1 Votes"<br>
> >>>>> b. Change the phrase " Green cell - tests ran successfully, red cell<br>
> >>>>> - tests failed" to "Green cell - System voted +1, red cell - System<br>
> >>>>> voted -1"<br>
> >>>>><br>
> >>>>> and then, when we have more and better data (for example, # tests<br>
> >>>>> passed, failed, skipped, etc), we can provide more detailed<br>
> >>>>> information than just "reported +1" or not.<br>
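<br>
For illustration, a minimal sketch of how the "% Reported +1 Votes" figure proposed in (a) could be computed for a CI account (the vote list is a hypothetical placeholder, not how Stackalytics actually gathers its data):<br>
<pre>
# Sketch: compute "% Reported +1 Votes" for one external CI account.
# The vote list is a hypothetical placeholder.
votes = [+1, -1, +1, +1, 0]            # votes the account left on patches

voting = [v for v in votes if v != 0]  # ignore runs where no vote was cast
plus_one_pct = 100.0 * voting.count(+1) / len(voting) if voting else 0.0

print(f"% Reported +1 Votes: {plus_one_pct:.1f}%")  # 75.0% in this example
</pre>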
> >>>>><br>
> >>>>> Thoughts?<br>
> >>>>><br>
> >>>>> Best,<br>
> >>>>> -jay<br>
> >>>>><br>
> >>>>> [1]<br>
> >>>>> <a href="http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933">http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933</a>.<br>
> >>>>> html<br>
> >>>>> [2]<br>
> >>>>> <a href="http://eavesdrop.openstack.org/meetings/third_party/2014/third_party">http://eavesdrop.openstack.org/meetings/third_party/2014/third_party</a><br>
> >>>>> .2014-06-30-18.01.log.html<br>
> >>>>><br>
> >>>>><br>
> >>>>> [3] <a href="http://stackalytics.com/report/ci/neutron/7">http://stackalytics.com/report/ci/neutron/7</a><br>
> >>>>><br>
> >>>> Hi Jay:<br>
> >>>><br>
> >>>> Thanks for starting this thread. You raise some interesting<br>
> >>>> questions.<br>
> >>>><br>
> >>>> The question I had identified as needing definition is "what<br>
> >>>> algorithm do we use to assess fitness of a third party ci system".<br>
> >>>><br>
> >>>> <a href="http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstac">http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstac</a><br>
> >>>> k-infra.2014-06-30.log<br>
> >>>><br>
> >>>> timestamp 2014-06-30T19:23:40<br>
> >>>><br>
> >>>> This is the question that is top of mind for me.<br>
> >>><br>
> >>> Right, my email above is written to say "unless there is a) uniformity<br>
> >>> of the external CI system, b) agreement on mediation or verification<br>
> >>> of said systems, and c) agreement on what tests shall be expected to<br>
> >>> pass and be skipped for each project, then no such algorithm is really<br>
> >>> possible."<br>
> >>><br>
> >>> Now, if the community is willing to agree to a), b), and c), then<br>
> >>> certainly there is the ability to determine the fitness of a CI system<br>
> >>> -- at least in regards to its output (test results and the voting on<br>
> >>> the Gerrit system).<br>
> >>><br>
> >>> Barring agreement on any or all of those three things, I recommended<br>
> >>> changing the language on the report due to the inability to have any<br>
> >>> consistently-applied algorithm to determine fitness.<br>
> >>><br>
> >>> Best,<br>
> >>> -jay<br>
> >>><br>
> ><br>
> > +1 to all of your points above, Jay. Well-written, thank you.<br>
> ><br>
> >> I've been mulling this over and looking at how I assess feedback I get<br>
> >> from different human reviewers, since I don't know how they arrive at<br>
> >> their decisions unless they tell me and/or I have experience with the<br>
> >> criteria they use to review my patches.<br>
> >><br>
> >> I get different value from different human reviewers based upon my<br>
> >> experience of them reviewing my patches, my experience of them reviewing<br>
> >> other people's patches, my experience reviewing their code, and my<br>
> >> discussions with them in channel, on the mailing list, and in person, as<br>
> >> well as my experience of reading or becoming aware of other decisions<br>
> >> they make.<br>
> >><br>
> >> It would be really valuable for me personally to have a page in gerrit<br>
> >> for each third party ci account, where I could sign in and leave<br>
> >> comments or vote +/-1 or 0 as a way of giving feedback to the<br>
> >> maintainers of that system. Others could do the same, and I could<br>
> >> read their feedback. For instance, yesterday someone linked me to logs<br>
> >> that I had to download before I could read them. I hadn't been aware<br>
> >> that this account had been doing this, but this developer was. Currently we<br>
> >> have no system for a developer, in the course of their normal workflow,<br>
> >> to leave a comment and/or vote on a third party ci system to give those<br>
> >> maintainers feedback about how they are doing at providing consumable<br>
> >> artifacts from their system.<br>
> >><br>
> >> It also would remove the perception that I'm just a big meany, since<br>
> >> developers could comment directly on the account about how<br>
> >> they feel about having to download tarballs or sign into other systems<br>
> >> to trigger a recheck. The community of developers would say how fit a<br>
> >> system is or isn't, since they are the individuals who have to dig through<br>
> >> logs and evaluate whether a build failed because the code needs<br>
> >> adjustment, and they can reflect their findings in a comment and vote<br>
> >> on the system.<br>
> >><br>
> >> The other thing I really value about gerrit is that votes can change:<br>
> >> systems can improve, given motivation and accurate feedback for making<br>
> >> changes.<br>
> >><br>
> >> I have no idea how hard this would be to create, but I think having<br>
> >> direct feedback from developers on systems would help both the<br>
> >> developers and the maintainers of ci systems.<br>
> >><br>
> >> There are a number of people working really hard to do a good job in<br>
> >> this area. This sort of structure would also provide support and<br>
> >> encouragement to those people providing leadership in this space:<br>
> >> people asking good questions, helping other system maintainers, starting<br>
> >> discussions, offering patches to infra (and reviewing infra patches) in<br>
> >> accordance with the goals of the third party meeting [0], and making other<br>
> >> hard-to-measure decisions that provide value for the community.<br>
> >> I'd really like a way we all can demonstrate the extent to which we<br>
> >> value these contributions.<br>
> >><br>
> >> So far, those are my thoughts.<br>
> >><br>
> >> Thanks,<br>
> >> Anita.<br>
> ><br>
> > +1 - this sounds like a really good idea.<br>
> ><br>
> > How is feedback on the OpenStack check/gate pipelines gathered and moderated? Can that provide a model for doing what you suggest here?<br>
> Hi Jon Paul: (Is it Jon Paul or Jon?)<br>
><br>
> The OpenStack check/gate pipelines are assessed using a system we call<br>
> elastic-recheck: <a href="http://status.openstack.org/elastic-recheck/">http://status.openstack.org/elastic-recheck/</a><br>
><br>
> We use Logstash to index log output and Elasticsearch to compose<br>
> queries that evaluate the number of occurrences of a given error<br>
> message (for example). Sample query:<br>
> <a href="http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries/1097592.yaml">http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries/1097592.yaml</a><br>
><br>
> The elastic-recheck repo is here:<br>
> <a href="http://git.openstack.org/cgit/openstack-infra/elastic-recheck/">http://git.openstack.org/cgit/openstack-infra/elastic-recheck/</a><br>
><br>
> The GUI is available at <a href="http://logstash.openstack.org">logstash.openstack.org</a>.<br>
><br>
> All the queries are written manually as YAML files, each named with a<br>
> corresponding bug number:<br>
> <a href="http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries">http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries</a><br>
><br>
> Here is some documentation about elastic-recheck and how to write<br>
> queries: <a href="http://docs.openstack.org/infra/elastic-recheck/readme.html">http://docs.openstack.org/infra/elastic-recheck/readme.html</a><br>
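<br>
As a rough sketch of what running such a query amounts to (the Elasticsearch URL, index pattern, and query string below are hypothetical placeholders rather than the actual elastic-recheck configuration):<br>
<pre>
# Sketch: count hits for an elastic-recheck-style query string.
# The Elasticsearch URL, index pattern, and query string are hypothetical
# placeholders; the real tooling loads queries/NNNNNN.yaml files and runs
# them against the project's Logstash/Elasticsearch cluster.
import requests

ES_COUNT_URL = "http://elasticsearch.example.org:9200/logstash-*/_count"
query_string = 'message:"Some recurring error" AND filename:"console.html"'

resp = requests.post(
    ES_COUNT_URL,
    json={"query": {"query_string": {"query": query_string}}},
    timeout=30,
)
resp.raise_for_status()
print("hits:", resp.json()["count"])
</pre>
Because each query file is named after a bug number, a hit count like this maps directly back to a known bug.<br>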
><br>
> Joe Gordon has created some great graphs (where are those<br>
> hosted again, Joe?) to evaluate failure rates in the<br>
> pipelines (check and gate) based on test groups (tempest, unit tests).</p>
<p dir="ltr"><a href="http://jogo.github.io/gate">http://jogo.github.io/gate</a></p>
<p dir="ltr">The data comes from <a href="http://graphite.openstack.org">graphite.openstack.org</a> and is hosted off site because the data requires some interpretation and should be viewed with a grain of salt.</p>
<p dir="ltr">><br>
> Clark Boylan and Sean Dague did, and still do, the majority of the heavy<br>
> lifting setting up and maintaining elastic-recheck (with lots of help from<br>
> others, thank you!), so perhaps they could offer their opinion on whether<br>
> this is a reasonable choice for evaluating third party ci systems.<br>
><br>
> Thanks Jon Paul, this is a good question,<br>
> Anita.<br>
> ><br>
> >><br>
> >><br>
> >> [0]<br>
> >> <a href="https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Part">https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Part</a><br>
> >> y_meetings<br>
> >><br>
><br>
><br>
> _______________________________________________<br>
> OpenStack-dev mailing list<br>
> <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</p>