[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?

Anita Kuno anteaya at anteaya.info
Thu Jul 3 18:42:23 UTC 2014


On 07/03/2014 02:33 PM, Kevin Benton wrote:
> Maybe we can require periodic checks against the head of the master
> branch (which should always pass) and build statistics based on the results
> of that. 
I like this suggestion. I really like this suggestion.

Hmmmm, what to do with a good suggestion? I wonder if we could capture
it in an infra-spec and work on it from there.
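
To make sure I understand the shape of it, here is a rough sketch (all
names are invented, just to illustrate the idea): run the usual job
against the tip of master on a schedule, record each outcome, and
report the pass rate over some window.

    import datetime
    import subprocess

    def run_periodic_master_check(run_ci_job):
        """Check out the tip of master and run the CI job against it.

        run_ci_job is whatever the third-party system normally runs;
        it should return True on pass and False on fail.
        """
        subprocess.check_call(['git', 'fetch', 'origin', 'master'])
        subprocess.check_call(['git', 'checkout', 'FETCH_HEAD'])
        return {'timestamp': datetime.datetime.utcnow().isoformat(),
                'passed': run_ci_job()}

    def master_pass_rate(results):
        """Fraction of periodic runs against master that passed.

        Master is gated upstream and should pass, so a low number here
        points at the CI system itself rather than at the patches it
        votes on.
        """
        if not results:
            return None
        passed = sum(1 for r in results if r['passed'])
        return passed / float(len(results))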

Would you feel comfortable offering a draft as an infra-spec and then
perhaps we can discuss the design through the spec?

What do you think?

Thanks Kevin,
Anita.

> Otherwise it seems like we have to take a CI system's word for it
> that a particular patch indeed broke that system.
> 
> --
> Kevin Benton
> 
> 
> On Thu, Jul 3, 2014 at 11:07 AM, Anita Kuno <anteaya at anteaya.info> wrote:
> 
>> On 07/03/2014 01:27 PM, Kevin Benton wrote:
>>>> This allows the viewer to see categories of reviews based upon their
>>>> divergence from OpenStack's Jenkins results. I think evaluating
>>>> divergence from Jenkins might be a metric worth consideration.
>>>
>>> I think the only thing this really reflects though is how much the third
>>> party CI system is mirroring Jenkins.
>>> A system that frequently diverges may be functioning perfectly fine and
>>> just has a vastly different code path that it is integration testing, so
>>> it is legitimately detecting failures the OpenStack CI cannot.
>> Great.
>>
>> How do we measure the degree to which it is legitimately detecting
>> failures?
>>
>> Thanks Kevin,
>> Anita.
>>>
>>> --
>>> Kevin Benton
>>>
>>>
>>> On Thu, Jul 3, 2014 at 6:49 AM, Anita Kuno <anteaya at anteaya.info> wrote:
>>>
>>>> On 07/03/2014 07:12 AM, Salvatore Orlando wrote:
>>>>> Apologies for quoting again the top post of the thread.
>>>>>
>>>>> Comments inline (mostly thinking aloud)
>>>>> Salvatore
>>>>>
>>>>>
>>>>> On 30 June 2014 22:22, Jay Pipes <jaypipes at gmail.com> wrote:
>>>>>
>>>>>> Hi Stackers,
>>>>>>
>>>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up
>>>>>> some legitimate questions around how a newly-proposed Stackalytics
>>>>>> report page for Neutron External CI systems [3] represented the
>>>>>> results of an external CI system as "successful" or not.
>>>>>>
>>>>>> First, I want to say that Ilya and all those involved in the
>>>>>> Stackalytics program simply want to provide the most accurate
>>>>>> information to developers in a format that is easily consumed. While
>>>>>> there need to be some changes in how data is shown (and the wording
>>>>>> of things like "Tests Succeeded"), I hope that the community knows
>>>>>> there isn't any ill intent on the part of Mirantis or anyone who
>>>>>> works on Stackalytics. OK, so let's keep the conversation civil --
>>>>>> we're all working towards the same goals of transparency and
>>>>>> accuracy. :)
>>>>>>
>>>>>> Alright, now, Anita and Kurt Taylor were asking a very poignant
>>>>>> question:
>>>>>>
>>>>>> "But what does CI tested really mean? just running tests? or tested to
>>>>>> pass some level of requirements?"
>>>>>>
>>>>>> In this nascent world of external CI systems, we have a set of issues
>>>>>> that we need to resolve:
>>>>>>
>>>>>> 1) All of the CI systems are different.
>>>>>>
>>>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
>>>>>> scripts. Others run custom Python code that spawns VMs and publishes
>>>>>> logs to some public domain.
>>>>>>
>>>>>> As a community, we need to decide whether it is worth putting in the
>>>>>> effort to create a single, unified, installable and runnable CI
>>>>>> system, so that we can legitimately say "all of the external systems
>>>>>> are identical, with the exception of the driver code for vendor X
>>>>>> being substituted in the Neutron codebase."
>>>>>>
>>>>>
>>>>> I think such a system already exists, and it's documented here:
>>>>> http://ci.openstack.org/
>>>>> Still, understanding it involves quite a learning curve, and running
>>>>> it is not exactly straightforward. But I guess that's pretty much
>>>>> understandable given the complexity of the system, isn't it?
>>>>>
>>>>>
>>>>>>
>>>>>> If the goal of the external CI systems is to produce reliable,
>>>>>> consistent results, I feel the answer to the above is "yes", but I'm
>>>>>> interested to hear what others think. Frankly, in the world of
>>>>>> benchmarks, it would be unthinkable to say "go ahead and everyone run
>>>>>> your own benchmark suite", because you would get wildly different
>>>>>> results. A similar problem has emerged here.
>>>>>>
>>>>>
>>>>> I don't think the particular infrastructure, which might range from an
>>>>> openstack-ci clone to a 100-line bash script, would have an impact on
>>>>> the "reliability" of the quality assessment regarding a particular
>>>>> driver or plugin. That is determined, in my opinion, by the quantity
>>>>> and nature of the tests one runs on a specific driver. In Neutron, for
>>>>> instance, there is a wide range of choices - from a few test cases in
>>>>> tempest.api.network to the full smoketest job. As long as there is no
>>>>> minimal standard here, it will be difficult to assess the quality of
>>>>> the evaluation from a CI system, unless we explicitly take coverage
>>>>> into account in the evaluation.
>>>>>
>>>>> On the other hand, different CI infrastructures will have different
>>>>> levels in terms of % of patches tested and % of infrastructure
>>>>> failures. I think it might not be a terrible idea to use these
>>>>> parameters to evaluate how good a CI is from an infra standpoint.
>>>>> However, there are still open questions. For instance, a CI might have
>>>>> a low patch % score because it only needs to test patches affecting a
>>>>> given driver.
>>>>>
>>>>>
>>>>>> 2) There is no mediation or verification that the external CI system
>>>>>> is actually testing anything at all
>>>>>>
>>>>>> As a community, we need to decide whether the current system of
>>>>>> self-policing should continue. If it should, then language on reports
>>>>>> like [3] should be very clear that any numbers derived from such
>>>>>> systems should be taken with a grain of salt. Use of the word
>>>>>> "Success" should be avoided, as it has connotations (in English, at
>>>>>> least) that the result has been verified, which is simply not the
>>>>>> case as long as no verification or mediation occurs for any external
>>>>>> CI system.
>>>>>>
>>>>>> 3) There is no clear indication of what tests are being run, and
>>>>>> therefore there is no clear indication of what "success" is
>>>>>>
>>>>>> I think we can all agree that a test has three possible outcomes:
>>>>>> pass, fail, and skip. The result of a test suite run is therefore
>>>>>> nothing more than the aggregation of which tests passed, which
>>>>>> failed, and which were skipped.
>>>>>>
>>>>>> As a community, we must document, for each project, the expected set
>>>>>> of tests that must be run for each patch merged into the project's
>>>>>> source tree. This documentation should be discoverable so that
>>>>>> reports like [3] can be crystal-clear on what the data shown actually
>>>>>> means. The report is simply displaying the data it receives from
>>>>>> Gerrit. The community needs to be proactive in saying "this is what
>>>>>> is expected to be tested." This alone would allow the report to give
>>>>>> information such as "External CI system ABC performed the expected
>>>>>> tests. X tests passed. Y tests failed. Z tests were skipped."
>>>>>> Likewise, it would also make it possible for the report to give
>>>>>> information such as "External CI system DEF did not perform the
>>>>>> expected tests.", which is excellent information in and of itself.
>>>>>>
>>>>>>
>>>>> Agreed. In Neutron we have enforced CIs but have not yet agreed on the
>>>>> minimum set of tests we expect them to run. I reckon this will be
>>>>> fixed soon.
>>>>>
>>>>> I'll try to look at what "SUCCESS" is from a naive standpoint: a CI
>>>>> says "SUCCESS" if the test suite it ran passed; then one should have a
>>>>> means to understand whether a CI might blatantly lie or tell "half
>>>>> truths". For instance, saying it passes tempest.api.network while
>>>>> tempest.scenario.test_network_basic_ops has not been executed is a
>>>>> half truth, in my opinion.
>>>>> Stackalytics can help here, I think. One could create "CI classes"
>>>>> according to how close they are to the level of the upstream gate, and
>>>>> then parse the posted results to classify CIs. Now, before cursing me,
>>>>> I totally understand that this won't be easy at all to implement!
>>>>> Furthermore, I don't know how this should be reflected in Gerrit.
>>>>>
>>>>>
>>>>>> ===
>>>>>>
>>>>>> In thinking about the likely answers to the above questions, I
>>>>>> believe it would be prudent to change the Stackalytics report in
>>>>>> question [3] in the following ways:
>>>>>>
>>>>>> a. Change the "Success %" column header to "% Reported +1 Votes"
>>>>>> b. Change the phrase " Green cell - tests ran successfully, red cell -
>>>>>> tests failed" to "Green cell - System voted +1, red cell - System
>>>>>> voted -1"
>>>>>>
>>>>>
>>>>> That makes sense to me.
>>>>>
>>>>>
>>>>>>
>>>>>> and then, when we have more and better data (for example, # tests
>>>>>> passed, failed, skipped, etc.), we can provide more detailed
>>>>>> information than just "reported +1" or not.
>>>>>>
>>>>>
>>>>> I think it should not be too hard to start adding minimal measures
>>>>> such as "% of voted patches".
>>>>>
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Best,
>>>>>> -jay
>>>>>>
>>>>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
>>>>>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
>>>>>> [3] http://stackalytics.com/report/ci/neutron/7
>>>>>>
>>>>
>>>> Thanks for sharing your thoughts, Salvatore.
>>>>
>>>> Some additional things to look at:
>>>>
>>>> Sean Dague has created a tool in stackforge, gerrit-dash-creator:
>>>>
>>>> http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst
>>>>
>>>> which has the ability to make interesting queries on Gerrit results.
>>>> One such example can be found here:
>>>> http://paste.openstack.org/show/85416/
>>>> (Note: when this URL was created there was a bug in the syntax, so it
>>>> works in Chrome but not Firefox. Sean tells me the Firefox bug has been
>>>> addressed, though this URL hasn't been updated to the new syntax yet.)
>>>>
>>>> This allows the viewer to see categories of reviews based upon their
>>>> divergence from OpenStack's Jenkins results. I think evaluating
>>>> divergence from Jenkins might be a metric worth consideration.
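>>>>
>>>> As a rough illustration of the kind of metric I mean (just a sketch,
>>>> not an existing tool -- the data structure is invented), one could
>>>> compare each third-party vote with the Jenkins vote on the same patch
>>>> set:
>>>>
>>>>     def divergence_pct(votes):
>>>>         """votes: list of (jenkins_vote, third_party_vote) pairs, one
>>>>         per patch set both systems voted on, each +1 or -1."""
>>>>         if not votes:
>>>>             return None
>>>>         differing = sum(1 for j, t in votes if j != t)
>>>>         return 100.0 * differing / len(votes)
>>>>
>>>> A high number by itself does not say whether the system is broken or
>>>> legitimately catching failures Jenkins cannot, so it would only be a
>>>> starting point for a conversation.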
>>>>
>>>> Also worth looking at is Mikal Still's GUI representation of Neutron
>>>> CI health:
>>>> http://www.rcbops.com/gerrit/reports/neutron-cireport.html
>>>> and Nova CI health:
>>>> http://www.rcbops.com/gerrit/reports/nova-cireport.html
>>>>
>>>> I don't know the details of how the graphs are calculated on these
>>>> pages, but being able to view passed/failed/missed and compare them to
>>>> Jenkins is an interesting approach, and I feel it has some merit.
>>>>
>>>> Thanks. I think we are getting some good information out in this
>>>> thread, and I look forward to hearing more thoughts.
>>>>
>>>> Thank you,
>>>> Anita.



