<div dir="ltr">Yes, I can propose a spec for that. It probably won't be until Monday. <div>Is that okay?</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jul 3, 2014 at 11:42 AM, Anita Kuno <span dir="ltr"><<a href="mailto:anteaya@anteaya.info" target="_blank">anteaya@anteaya.info</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 07/03/2014 02:33 PM, Kevin Benton wrote:<br>

> Maybe we can require periodic checks against the head of the master<br>

> branch (which should always pass) and build statistics based on the results<br>

> of that.<br>

</div>I like this suggestion. I really like this suggestion.<br>

<br>

Hmmmm, what to do with a good suggestion? I wonder if we could capture<br>

it in an infra-spec and work on it from there.<br>

<br>

Would you feel comfortable offering a draft as an infra-spec and then<br>

perhaps we can discuss the design through the spec?<br>

<br>

What do you think?<br>

<br>

Thanks Kevin,<br>

Anita.<br>

<div class="HOEnZb"><div class="h5"><br>

> Otherwise it seems like we have to take a CI system's word for it<br>

> that a particular patch indeed broke that system.<br>

><br>

> --<br>

> Kevin Benton<br>

><br>

><br>

> On Thu, Jul 3, 2014 at 11:07 AM, Anita Kuno <<a href="mailto:anteaya@anteaya.info">anteaya@anteaya.info</a>> wrote:<br>

><br>

>> On 07/03/2014 01:27 PM, Kevin Benton wrote:<br>

>>>> This allows the viewer to see categories of reviews based upon their<br>

>>>> divergence from OpenStack's Jenkins results. I think evaluating<br>

>>>> divergence from Jenkins might be a metric worth considering.<br>

>>><br>

>>> I think the only thing this really reflects though is how much the third<br>

>>> party CI system is mirroring Jenkins.<br>

>>> A system that frequently diverges may be functioning perfectly fine and<br>
>>> just have a vastly different code path that it is integration testing, so it<br>
>>> is legitimately detecting failures the OpenStack CI cannot.<br>

>> Great.<br>

>><br>

>> How do we measure the degree to which it is legitimately detecting<br>

>> failures?<br>

>><br>

>> Thanks Kevin,<br>

>> Anita.<br>

>>><br>

>>> --<br>

>>> Kevin Benton<br>

>>><br>

>>><br>

>>> On Thu, Jul 3, 2014 at 6:49 AM, Anita Kuno <<a href="mailto:anteaya@anteaya.info">anteaya@anteaya.info</a>> wrote:<br>

>>><br>

>>>> On 07/03/2014 07:12 AM, Salvatore Orlando wrote:<br>

>>>>> Apologies for quoting the top post of the thread again.<br>

>>>>><br>

>>>>> Comments inline (mostly thinking aloud)<br>

>>>>> Salvatore<br>

>>>>><br>

>>>>><br>

>>>>> On 30 June 2014 22:22, Jay Pipes <<a href="mailto:jaypipes@gmail.com">jaypipes@gmail.com</a>> wrote:<br>

>>>>><br>

>>>>>> Hi Stackers,<br>

>>>>>><br>

>>>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up some<br>
>>>>>> legitimate questions around how a newly-proposed Stackalytics report page<br>
>>>>>> for Neutron External CI systems [3] represented the results of an external<br>
>>>>>> CI system as "successful" or not.<br>

>>>>>><br>

>>>>>> First, I want to say that Ilya and all those involved in the Stackalytics<br>
>>>>>> program simply want to provide the most accurate information to developers<br>
>>>>>> in a format that is easily consumed. While there need to be some changes in<br>
>>>>>> how data is shown (and the wording of things like "Tests Succeeded"), I<br>
>>>>>> hope that the community knows there isn't any ill intent on the part of<br>
>>>>>> Mirantis or anyone who works on Stackalytics. OK, so let's keep the<br>
>>>>>> conversation civil -- we're all working towards the same goals of<br>
>>>>>> transparency and accuracy. :)<br>

>>>>>><br>

>>>>>> Alright, now, Anita and Kurt Taylor were asking a very pertinent question:<br>

>>>>>><br>

>>>>>> "But what does CI tested really mean? just running tests? or tested to<br>

>>>>>> pass some level of requirements?"<br>

>>>>>><br>

>>>>>> In this nascent world of external CI systems, we have a set of issues that<br>
>>>>>> we need to resolve:<br>

>>>>>><br>

>>>>>> 1) All of the CI systems are different.<br>

>>>>>><br>

>>>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate scripts.<br>
>>>>>> Others run custom Python code that spawns VMs and publishes logs to some<br>
>>>>>> public domain.<br>

>>>>>><br>

>>>>>> As a community, we need to decide whether it is worth putting in the<br>
>>>>>> effort to create a single, unified, installable and runnable CI system, so<br>
>>>>>> that we can legitimately say "all of the external systems are identical,<br>
>>>>>> with the exception of the driver code for vendor X being substituted in the<br>
>>>>>> Neutron codebase."<br>

>>>>>><br>

>>>>><br>

>>>>> I think such a system already exists, and it's documented here:<br>

>>>>> <a href="http://ci.openstack.org/" target="_blank">http://ci.openstack.org/</a><br>

>>>>> Still, understanding it involves quite a learning curve, and running it is<br>
>>>>> not exactly straightforward. But I guess that's pretty much understandable<br>
>>>>> given the complexity of the system, isn't it?<br>

>>>>><br>

>>>>><br>

>>>>>><br>

>>>>>> If the goal of the external CI systems is to produce reliable, consistent<br>
>>>>>> results, I feel the answer to the above is "yes", but I'm interested to<br>
>>>>>> hear what others think. Frankly, in the world of benchmarks, it would be<br>
>>>>>> unthinkable to say "go ahead and everyone run your own benchmark suite",<br>
>>>>>> because you would get wildly different results. A similar problem has<br>
>>>>>> emerged here.<br>

>>>>>><br>

>>>>><br>

>>>>> I don't think the particular infrastructure, which might range from an<br>
>>>>> openstack-ci clone to a 100-line bash script, would have an impact on the<br>
>>>>> "reliability" of the quality assessment regarding a particular driver or<br>
>>>>> plugin. This is determined, in my opinion, by the quantity and nature of<br>
>>>>> tests one runs on a specific driver. In Neutron, for instance, there is a<br>
>>>>> wide range of choices, from a few test cases in tempest.api.network to the<br>
>>>>> full smoketest job. As long as there is no minimal standard here, it<br>
>>>>> would be difficult to assess the quality of the evaluation from a CI<br>
>>>>> system, unless we explicitly take coverage into account in the evaluation.<br>

>>>>><br>

>>>>> On the other hand, different CI infrastructures will have different levels<br>
>>>>> in terms of % of patches tested and % of infrastructure failures. I think<br>
>>>>> it might not be a terrible idea to use these parameters to evaluate how<br>
>>>>> good a CI is from an infra standpoint. However, there are still open<br>
>>>>> questions. For instance, a CI might have a low patch % score because it<br>
>>>>> only needs to test patches affecting a given driver.<br>

>>>>><br>

>>>>><br>

>>>>>> 2) There is no mediation or verification that the external CI system is<br>
>>>>>> actually testing anything at all<br>

>>>>>><br>

>>>>>> As a community, we need to decide whether the current system of<br>
>>>>>> self-policing should continue. If it should, then language on reports like<br>
>>>>>> [3] should be very clear that any numbers derived from such systems should<br>
>>>>>> be taken with a grain of salt. Use of the word "Success" should be avoided,<br>
>>>>>> as it has connotations (in English, at least) that the result has been<br>
>>>>>> verified, which is simply not the case as long as no verification or<br>
>>>>>> mediation occurs for any external CI system.<br>

>>>>>><br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>><br>

>>>>>> 3) There is no clear indication of what tests are being run, and therefore<br>
>>>>>> there is no clear indication of what "success" is<br>

>>>>>><br>

>>>>>> I think we can all agree that a test has three possible outcomes: pass,<br>
>>>>>> fail, and skip. The results of a test suite run therefore are nothing more<br>
>>>>>> than the aggregation of which tests passed, which failed, and which were<br>
>>>>>> skipped.<br>

>>>>>><br>

>>>>>> As a community, we must document, for each project, the expected set<br>
>>>>>> of tests that must be run for each patch merged into the project's source<br>
>>>>>> tree. This documentation should be discoverable so that reports like [3]<br>
>>>>>> can be crystal-clear on what the data shown actually means. The report is<br>
>>>>>> simply displaying the data it receives from Gerrit. The community needs to<br>
>>>>>> be proactive in saying "this is what is expected to be tested." This alone<br>
>>>>>> would allow the report to give information such as "External CI system ABC<br>
>>>>>> performed the expected tests. X tests passed. Y tests failed. Z tests were<br>
>>>>>> skipped." Likewise, it would also make it possible for the report to give<br>
>>>>>> information such as "External CI system DEF did not perform the expected<br>
>>>>>> tests.", which is excellent information in and of itself.<br>

>>>>>><br>

>>>>>><br>

>>>>> Agreed. In Neutron we have enforced CIs but have not yet agreed on the<br>
>>>>> minimum set of tests we expect them to run. I reckon this will be fixed<br>
>>>>> soon.<br>

>>>>><br>

>>>>> I'll try to look at what "SUCCESS" is from a naive standpoint: a CI says<br>
>>>>> "SUCCESS" if the test suite it ran passed; then one should have a means to<br>
>>>>> understand whether a CI might blatantly lie or tell "half truths". For<br>
>>>>> instance, saying it passes tempest.api.network while<br>
>>>>> tempest.scenario.test_network_basic_ops has not been executed is a half<br>
>>>>> truth, in my opinion.<br>
>>>>> Stackalytics can help here, I think. One could create "CI classes"<br>
>>>>> according to how close they are to the level of the upstream gate, and<br>
>>>>> then parse posted results to classify CIs. Now, before cursing me, I<br>
>>>>> totally understand that this won't be easy at all to implement!<br>
>>>>> Furthermore, I don't know how this should be reflected in Gerrit.<br>

>>>>><br>

>>>>><br>

>>>>>> ===<br>

>>>>>><br>

>>>>>> In thinking about the likely answers to the above questions, I believe it<br>
>>>>>> would be prudent to change the Stackalytics report in question [3] in the<br>
>>>>>> following ways:<br>
>>>>>><br>
>>>>>> a. Change the "Success %" column header to "% Reported +1 Votes"<br>
>>>>>> b. Change the phrase "Green cell - tests ran successfully, red cell -<br>
>>>>>> tests failed" to "Green cell - System voted +1, red cell - System voted<br>
>>>>>> -1"<br>

>>>>>><br>

>>>>><br>

>>>>> That makes sense to me.<br>

>>>>><br>

>>>>><br>

>>>>>><br>

>>>>>> and then, when we have more and better data (for example, # tests passed,<br>
>>>>>> failed, skipped, etc), we can provide more detailed information than just<br>
>>>>>> "reported +1" or not.<br>

>>>>>><br>

>>>>><br>

>>>>> I think it should not be too hard to start adding minimal measures such as<br>
>>>>> "% of voted patches".<br>

>>>>><br>

>>>>>><br>

>>>>>> Thoughts?<br>

>>>>>><br>

>>>>>> Best,<br>

>>>>>> -jay<br>

>>>>>><br>

>>>>>> [1] <a href="http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html" target="_blank">http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html</a><br>
>>>>>> [2] <a href="http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html" target="_blank">http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html</a><br>

>>>>>> [3] <a href="http://stackalytics.com/report/ci/neutron/7" target="_blank">http://stackalytics.com/report/ci/neutron/7</a><br>

>>>>>><br>

>>>>>> _______________________________________________<br>

>>>>>> OpenStack-dev mailing list<br>

>>>>>> <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>

>>>>>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

>>>>>><br>


>>>> Thanks for sharing your thoughts, Salvatore.<br>

>>>><br>

>>>> Some additional things to look at:<br>

>>>><br>

>>>> Sean Dague has created a tool in StackForge, gerrit-dash-creator:<br>
>>>><br>
>>>> <a href="http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst" target="_blank">http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst</a><br>
>>>> which has the ability to make interesting queries on Gerrit results. One<br>
>>>> such example can be found here: <a href="http://paste.openstack.org/show/85416/" target="_blank">http://paste.openstack.org/show/85416/</a><br>

>>>> (Note: when this URL was created there was a bug in the syntax, so this<br>
>>>> URL works in Chrome but not Firefox. Sean tells me the Firefox bug has<br>
>>>> been addressed, though this URL hasn't yet been updated to the new syntax.)<br>

>>>><br>

>>>> This allows the viewer to see categories of reviews based upon their<br>

>>>> divergence from OpenStack's Jenkins results. I think evaluating<br>

>>>> divergence from Jenkins might be a metric worth considering.<br>

>>>><br>

>>>> Another GUI representation worth looking at is Mikal Still's GUI for<br>
>>>> Neutron CI health:<br>
>>>> <a href="http://www.rcbops.com/gerrit/reports/neutron-cireport.html" target="_blank">http://www.rcbops.com/gerrit/reports/neutron-cireport.html</a><br>
>>>> and Nova CI health:<br>
>>>> <a href="http://www.rcbops.com/gerrit/reports/nova-cireport.html" target="_blank">http://www.rcbops.com/gerrit/reports/nova-cireport.html</a><br>

>>>><br>

>>>> I don't know the details of how the graphs in these pages are calculated,<br>
>>>> but being able to view passed/failed/missed and compare them to Jenkins<br>
>>>> is an interesting approach that I feel has some merit.<br>

>>>><br>

>>>> Thanks, I think we are getting some good information out in this thread,<br>
>>>> and I look forward to hearing more thoughts.<br>

>>>><br>

>>>> Thank you,<br>

>>>> Anita.<br>





</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Kevin Benton</div>

</div>