[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?

Kevin Benton blak111 at gmail.com
Thu Jul 3 18:33:57 UTC 2014


Maybe we can require periodic checks against the head of the master
branch (which should always pass) and build statistics based on the results
of those runs. Otherwise it seems like we have to take a CI system's word for
it that a particular patch indeed broke that system.
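
For example, here is a minimal sketch of the kind of periodic check I mean
(the repo path, test command, and results file are all placeholders; a real
system would kick this off from cron or a Jenkins timer job):

    # Hedged sketch of a periodic sanity run against the tip of master.
    # All paths and commands below are hypothetical placeholders.
    import datetime
    import json
    import subprocess

    REPO = "/opt/stack/neutron"                 # local clone the CI tests
    TEST_CMD = ["tox", "-e", "py27"]            # whatever suite the CI runs
    RESULTS = "/var/log/ci/master_checks.json"  # append-only results log

    def run_check():
        subprocess.check_call(["git", "-C", REPO, "fetch", "origin"])
        subprocess.check_call(["git", "-C", REPO, "checkout", "origin/master"])
        rc = subprocess.call(TEST_CMD, cwd=REPO)
        record = {"timestamp": datetime.datetime.utcnow().isoformat(),
                  "passed": rc == 0}
        with open(RESULTS, "a") as f:
            f.write(json.dumps(record) + "\n")
        return rc == 0

    if __name__ == "__main__":
        run_check()

The failure rate of these runs over time would give us a baseline: a system
that regularly fails against master HEAD is probably reporting infrastructure
noise rather than real regressions.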

--
Kevin Benton


On Thu, Jul 3, 2014 at 11:07 AM, Anita Kuno <anteaya at anteaya.info> wrote:

> On 07/03/2014 01:27 PM, Kevin Benton wrote:
> >> This allows the viewer to see categories of reviews based upon their
> >> divergence from OpenStack's Jenkins results. I think evaluating
> >> divergence from Jenkins might be a metric worth consideration.
> >
> > I think the only thing this really reflects, though, is how much the
> > third party CI system is mirroring Jenkins. A system that frequently
> > diverges may be functioning perfectly fine; it may simply be
> > integration testing a vastly different code path, so it is legitimately
> > detecting failures that the OpenStack CI cannot.
> Great.
>
> How do we measure the degree to which it is legitimately detecting
> failures?
>
> Thanks Kevin,
> Anita.
> >
> > --
> > Kevin Benton
> >
> >
> > On Thu, Jul 3, 2014 at 6:49 AM, Anita Kuno <anteaya at anteaya.info> wrote:
> >
> >> On 07/03/2014 07:12 AM, Salvatore Orlando wrote:
> >>> Apologies for quoting again the top post of the thread.
> >>>
> >>> Comments inline (mostly thinking aloud)
> >>> Salvatore
> >>>
> >>>
> >>> On 30 June 2014 22:22, Jay Pipes <jaypipes at gmail.com> wrote:
> >>>
> >>>> Hi Stackers,
> >>>>
> >>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up
> >>>> some legitimate questions around how a newly-proposed Stackalytics
> >>>> report page for Neutron External CI systems [3] represented the results
> >>>> of an external CI system as "successful" or not.
> >>>>
> >>>> First, I want to say that Ilya and all those involved in the
> >>>> Stackalytics program simply want to provide the most accurate
> >>>> information to developers in a format that is easily consumed. While
> >>>> there need to be some changes in how data is shown (and the wording of
> >>>> things like "Tests Succeeded"), I hope that the community knows there
> >>>> isn't any ill intent on the part of Mirantis or anyone who works on
> >>>> Stackalytics. OK, so let's keep the conversation civil -- we're all
> >>>> working towards the same goals of transparency and accuracy. :)
> >>>>
> >>>> Alright, now, Anita and Kurt Taylor were asking a very poignant
> >>>> question:
> >>>>
> >>>> "But what does CI tested really mean? just running tests? or tested to
> >>>> pass some level of requirements?"
> >>>>
> >>>> In this nascent world of external CI systems, we have a set of issues
> >>>> that we need to resolve:
> >>>>
> >>>> 1) All of the CI systems are different.
> >>>>
> >>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
> >>>> scripts. Others run custom Python code that spawns VMs and publishes
> >>>> logs to some public domain.
> >>>>
> >>>> As a community, we need to decide whether it is worth putting in the
> >>>> effort to create a single, unified, installable and runnable CI system,
> >>>> so that we can legitimately say "all of the external systems are
> >>>> identical, with the exception of the driver code for vendor X being
> >>>> substituted in the Neutron codebase."
> >>>>
> >>>
> >>> I think such a system already exists, and it's documented here:
> >>> http://ci.openstack.org/
> >>> Still, understanding it is quite a learning curve, and running it is not
> >>> exactly straightforward. But I guess that's pretty much understandable
> >>> given the complexity of the system, isn't it?
> >>>
> >>>
> >>>>
> >>>> If the goal of the external CI systems is to produce reliable,
> >>>> consistent results, I feel the answer to the above is "yes", but I'm
> >>>> interested to hear what others think. Frankly, in the world of
> >>>> benchmarks, it would be unthinkable to say "go ahead and everyone run
> >>>> your own benchmark suite", because you would get wildly different
> >>>> results. A similar problem has emerged here.
> >>>>
> >>>
> >>> I don't think the particular infrastructure, which might range from an
> >>> openstack-ci clone to a 100-line bash script, would have an impact on
> >>> the "reliability" of the quality assessment regarding a particular
> >>> driver or plugin. This is determined, in my opinion, by the quantity and
> >>> nature of the tests one runs on a specific driver. In Neutron, for
> >>> instance, there is a wide range of choices - from a few test cases in
> >>> tempest.api.network to the full smoketest job. As long as there is no
> >>> minimal standard here, it would be difficult to assess the quality of
> >>> the evaluation from a CI system, unless we explicitly take coverage into
> >>> account in the evaluation.
> >>>
> >>> On the other hand, different CI infrastructures will have different
> >>> levels in terms of % of patches tested and % of infrastructure failures.
> >>> I think it might not be a terrible idea to use these parameters to
> >>> evaluate how good a CI is from an infra standpoint. However, there are
> >>> still open questions. For instance, a CI might have a low patch % score
> >>> because it only needs to test patches affecting a given driver.
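
To make these two parameters concrete, something along these lines could be
computed from a per-patch log (the record format below is invented purely for
illustration):

    # Sketch only: "records" is a hypothetical per-patch log kept by a CI.
    def ci_stats(records, total_patches_proposed):
        tested = [r for r in records if r["ran"]]
        infra_failures = [r for r in tested if r.get("infra_failure")]
        return {
            "% of patches tested": 100.0 * len(tested) / total_patches_proposed,
            "% of infra failures": 100.0 * len(infra_failures) / max(len(tested), 1),
        }

    example = [{"ran": True, "infra_failure": False},
               {"ran": True, "infra_failure": True},
               {"ran": False}]
    print(ci_stats(example, total_patches_proposed=10))

As noted above, a low "% of patches tested" is not necessarily bad for a
driver-specific CI, so these numbers would need that context attached.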
> >>>
> >>>
> >>>> 2) There is no mediation or verification that the external CI system
> >>>> is actually testing anything at all
> >>>>
> >>>> As a community, we need to decide whether the current system of
> >>>> self-policing should continue. If it should, then language on reports
> >>>> like [3] should be very clear that any numbers derived from such
> >>>> systems should be taken with a grain of salt. Use of the word "Success"
> >>>> should be avoided, as it has connotations (in English, at least) that
> >>>> the result has been verified, which is simply not the case as long as
> >>>> no verification or mediation occurs for any external CI system.
> >>>>
> >>>
> >>>> 3) There is no clear indication of what tests are being run, and
> >>>> therefore there is no clear indication of what "success" is
> >>>>
> >>>> I think we can all agree that a test has three possible outcomes: pass,
> >>>> fail, and skip. The result of a test suite run is therefore nothing
> >>>> more than the aggregation of which tests passed, which failed, and
> >>>> which were skipped.
> >>>>
> >>>> As a community, we must document, for each project, the expected set
> >>>> of tests that must be run for each patch merged into the project's
> >>>> source tree. This documentation should be discoverable so that reports
> >>>> like [3] can be crystal-clear on what the data shown actually means.
> >>>> The report is simply displaying the data it receives from Gerrit. The
> >>>> community needs to be proactive in saying "this is what is expected to
> >>>> be tested." This alone would allow the report to give information such
> >>>> as "External CI system ABC performed the expected tests. X tests
> >>>> passed. Y tests failed. Z tests were skipped." Likewise, it would also
> >>>> make it possible for the report to give information such as "External
> >>>> CI system DEF did not perform the expected tests.", which is excellent
> >>>> information in and of itself.
> >>>>
> >>>>
> >>> Agreed. In Neutron we have enforced CIs, but we have not yet agreed on
> >>> the minimum set of tests we expect them to run. I reckon this will be
> >>> fixed soon.
> >>>
> >>> I'll try to look at what "SUCCESS" is from a naive standpoint: a CI
> >>> says "SUCCESS" if the test suite it ran passed; then one should have a
> >>> means to understand whether a CI might blatantly lie or tell "half
> >>> truths". For instance, saying it passes tempest.api.network while
> >>> tempest.scenario.test_network_basic_ops has not been executed is a half
> >>> truth, in my opinion.
> >>> Stackalytics can help here, I think. One could create "CI classes"
> >>> according to how close they are to the level of the upstream gate, and
> >>> then parse the posted results to classify CIs. Now, before cursing me, I
> >>> totally understand that this won't be easy at all to implement!
> >>> Furthermore, I don't know how this should be reflected in Gerrit.
> >>>
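
A naive first cut at that classification could just be the overlap between
the tests a CI reports running and the upstream gate's test list -- a sketch,
with entirely made-up thresholds:

    # Sketch of the "CI classes" idea above: bucket a CI by how much of the
    # upstream gate's test set it actually runs. Thresholds are made up.
    def classify(ci_tests, gate_tests):
        if not gate_tests:
            return "unknown"
        coverage = len(set(ci_tests) & set(gate_tests)) / float(len(gate_tests))
        if coverage >= 0.9:
            return "class A (near gate parity)"
        elif coverage >= 0.5:
            return "class B (partial gate coverage)"
        return "class C (minimal coverage)"

    print(classify(["test_a"], ["test_a", "test_b"]))  # -> class B

Whether and how a class like that should show up in Gerrit is, as Salvatore
says, a separate question.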
> >>>
> >>>> ===
> >>>>
> >>>> In thinking about the likely answers to the above questions, I believe
> >>>> it would be prudent to change the Stackalytics report in question [3]
> >>>> in the following ways:
> >>>>
> >>>> a. Change the "Success %" column header to "% Reported +1 Votes"
> >>>> b. Change the phrase " Green cell - tests ran successfully, red cell -
> >>>> tests failed" to "Green cell - System voted +1, red cell - System
> >>>> voted -1"
> >>>>
> >>>
> >>> That makes sense to me.
> >>>
> >>>
> >>>>
> >>>> and then, when we have more and better data (for example, # tests
> >>>> passed, failed, skipped, etc.), we can provide more detailed
> >>>> information than just "reported +1" or not.
> >>>>
> >>>
> >>> I think it should not be too hard to start adding minimal measures such
> >>> as "% of voted patches".
> >>>
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Best,
> >>>> -jay
> >>>>
> >>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> >>>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> >>>> [3] http://stackalytics.com/report/ci/neutron/7
> >>>>
> >>>
> >> Thanks for sharing your thoughts, Salvatore.
> >>
> >> Some additional things to look at:
> >>
> >> Sean Dague has created a tool in stackforge, gerrit-dash-creator:
> >>
> >> http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst
> >>
> >> which has the ability to make interesting queries on gerrit results. One
> >> such example can be found here: http://paste.openstack.org/show/85416/
> >> (Note: when this URL was created there was a bug in the syntax, so this
> >> URL works in Chrome but not Firefox. Sean tells me the Firefox bug has
> >> been addressed, though this URL hasn't been altered with the new syntax
> >> yet.)
> >>
> >> This allows the viewer to see categories of reviews based upon their
> >> divergence from OpenStack's Jenkins results. I think evaluating
> >> divergence from Jenkins might be a metric worth consideration.
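
One simple way to quantify that divergence from the vote data alone (the
vote pairs below are invented; in practice they could be pulled from Gerrit
for changes that both systems voted on):

    # Rough sketch: "votes" is a list of (jenkins_vote, third_party_vote)
    # pairs for changes that both systems voted on. Data below is invented.
    def divergence(votes):
        disagreements = sum(1 for j, t in votes if (j > 0) != (t > 0))
        return 100.0 * disagreements / len(votes) if votes else 0.0

    print(divergence([(1, 1), (1, -1), (-1, -1), (1, 1)]))  # -> 25.0

Of course a high divergence number by itself doesn't say which system is
right, which is why I think pairing it with periodic checks against master
(as suggested at the top of this mail) would help.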
> >>
> >> Also worth looking at is Mikal Still's GUI for Neutron CI health:
> >> http://www.rcbops.com/gerrit/reports/neutron-cireport.html
> >> and Nova CI health:
> >> http://www.rcbops.com/gerrit/reports/nova-cireport.html
> >>
> >> I don't know the details of how the graphs are calculated on these
> >> pages, but being able to view passed/failed/missed and compare them to
> >> Jenkins is an interesting approach, and I feel it has some merit.
> >>
> >> Thanks, I think we are getting some good information out in this thread,
> >> and I look forward to hearing more thoughts.
> >>
> >> Thank you,
> >> Anita.
> >>
> >
>



-- 
Kevin Benton