[openstack-dev] [third-party-ci][neutron] What is "Success" exactly?

Anita Kuno anteaya at anteaya.info
Thu Jul 3 14:05:48 UTC 2014


On 07/03/2014 09:52 AM, Sullivan, Jon Paul wrote:
>> -----Original Message-----
>> From: Anita Kuno [mailto:anteaya at anteaya.info]
>> Sent: 03 July 2014 13:53
>> To: openstack-dev at lists.openstack.org
>> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success" exactly?
>>
>> On 07/03/2014 06:22 AM, Sullivan, Jon Paul wrote:
>>>> -----Original Message-----
>>>> From: Anita Kuno [mailto:anteaya at anteaya.info]
>>>> Sent: 01 July 2014 14:42
>>>> To: openstack-dev at lists.openstack.org
>>>> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success" exactly?
>>>>
>>>> On 06/30/2014 09:13 PM, Jay Pipes wrote:
>>>>> On 06/30/2014 07:08 PM, Anita Kuno wrote:
>>>>>> On 06/30/2014 04:22 PM, Jay Pipes wrote:
>>>>>>> Hi Stackers,
>>>>>>>
>>>>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought
>>>>>>> up some legitimate questions around how a newly-proposed
>>>>>>> Stackalytics report page for Neutron External CI systems [3]
>>>>>>> represented the results of an external CI system as "successful"
>>>>>>> or not.
>>>>>>>
>>>>>>> First, I want to say that Ilya and all those involved in the
>>>>>>> Stackalytics program simply want to provide the most accurate
>>>>>>> information to developers in a format that is easily consumed.
>>>>>>> While there need to be some changes in how data is shown (and the
>>>>>>> wording of things like "Tests Succeeded"), I hope that the
>>>>>>> community knows there isn't any ill intent on the part of Mirantis
>>>>>>> or anyone who works on Stackalytics. OK, so let's keep the
>>>>>>> conversation civil -- we're all working towards the same goals of
>>>>>>> transparency and accuracy. :)
>>>>>>>
>>>>>>> Alright, now, Anita and Kurt Taylor were asking a very poignant
>>>>>>> question:
>>>>>>>
>>>>>>> "But what does CI tested really mean? just running tests? or
>>>>>>> tested to pass some level of requirements?"
>>>>>>>
>>>>>>> In this nascent world of external CI systems, we have a set of
>>>>>>> issues that we need to resolve:
>>>>>>>
>>>>>>> 1) All of the CI systems are different.
>>>>>>>
>>>>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
>>>>>>> scripts. Others run custom Python code that spawns VMs and
>>>>>>> publishes logs to some public domain.
>>>>>>>
>>>>>>> As a community, we need to decide whether it is worth putting in
>>>>>>> the effort to create a single, unified, installable and runnable
>>>>>>> CI system, so that we can legitimately say "all of the external
>>>>>>> systems are identical, with the exception of the driver code for
>>>>>>> vendor X being substituted in the Neutron codebase."
>>>>>>>
>>>>>>> If the goal of the external CI systems is to produce reliable,
>>>>>>> consistent results, I feel the answer to the above is "yes", but
>>>>>>> I'm interested to hear what others think. Frankly, in the world of
>>>>>>> benchmarks, it would be unthinkable to say "go ahead and everyone
>>>>>>> run your own benchmark suite", because you would get wildly
>>>>>>> different results. A similar problem has emerged here.
>>>>>>>
>>>>>>> 2) There is no mediation or verification that the external CI
>>>>>>> system is actually testing anything at all
>>>>>>>
>>>>>>> As a community, we need to decide whether the current system of
>>>>>>> self-policing should continue. If it should, then language on
>>>>>>> reports like [3] should be very clear that any numbers derived
>>>>>>> from such systems should be taken with a grain of salt. Use of the
>>>>>>> word "Success" should be avoided, as it has connotations (in
>>>>>>> English, at
>>>>>>> least) that the result has been verified, which is simply not the
>>>>>>> case as long as no verification or mediation occurs for any
>>>>>>> external CI system.
>>>>>>>
>>>>>>> 3) There is no clear indication of what tests are being run, and
>>>>>>> therefore there is no clear indication of what "success" is
>>>>>>>
>>>>>>> I think we can all agree that a test has three possible outcomes:
>>>>>>> pass, fail, and skip. The results of a test suite run are therefore
>>>>>>> nothing more than the aggregation of which tests passed, which
>>>>>>> failed, and which were skipped.
>>>>>>>
>>>>>>> As a community, we must document, for each project, the
>>>>>>> expected set of tests that must be run for each patch merged into
>>>>>>> the project's source tree. This documentation should be
>>>>>>> discoverable so that reports like [3] can be crystal-clear on what
>>>>>>> the data shown actually means. The report is simply displaying the
>>>>>>> data it receives from Gerrit. The community needs to be proactive
>>>>>>> in saying "this is what is expected to be tested." This alone
>>>>>>> would allow the report to give information such as "External CI
>>>>>>> system ABC performed the expected tests. X tests passed.
>>>>>>> Y tests failed. Z tests were skipped." Likewise, it would also
>>>>>>> make it possible for the report to give information such as
>>>>>>> "External CI system DEF did not perform the expected tests.",
>>>>>>> which is excellent information in and of itself.
>>>>>>>
>>>>>>> ===
>>>>>>>
>>>>>>> In thinking about the likely answers to the above questions, I
>>>>>>> believe it would be prudent to change the Stackalytics report in
>>>>>>> question [3] in the following ways:
>>>>>>>
>>>>>>> a. Change the "Success %" column header to "% Reported +1 Votes"
>>>>>>> b. Change the phrase " Green cell - tests ran successfully, red
>>>>>>> cell
>>>>>>> - tests failed" to "Green cell - System voted +1, red cell -
>>>>>>> System voted -1"
>>>>>>>
>>>>>>> and then, when we have more and better data (for example, # tests
>>>>>>> passed, failed, skipped, etc), we can provide more detailed
>>>>>>> information than just "reported +1" or not.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Best,
>>>>>>> -jay
>>>>>>>
>>>>>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
>>>>>>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
>>>>>>>
>>>>>>>
>>>>>>> [3] http://stackalytics.com/report/ci/neutron/7
>>>>>>>
>>>>>> Hi Jay:
>>>>>>
>>>>>> Thanks for starting this thread. You raise some interesting questions.
>>>>>>
>>>>>> The question I had identified as needing definition is "what
>>>>>> algorithm do we use to assess fitness of a third party ci system".
>>>>>>
>>>>>> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log
>>>>>>
>>>>>> timestamp 2014-06-30T19:23:40
>>>>>>
>>>>>> This is the question that is top of mind for me.
>>>>>
>>>>> Right, my email above is written to say "unless there is a)
>>>>> uniformity of the external CI system, b) agreement on mediation or
>>>>> verification of said systems, and c) agreement on what tests shall
>>>>> be expected to pass and be skipped for each project, then no such
>>>>> algorithm is really possible."
>>>>>
>>>>> Now, if the community is willing to agree to a), b), and c), then
>>>>> certainly there is the ability to determine the fitness of a CI
>>>>> system
>>>>> -- at least in regards to its output (test results and the voting on
>>>>> the Gerrit system).
>>>>>
>>>>> Barring agreement on any or all of those three things, I recommended
>>>>> changing the language on the report due to the inability to have any
>>>>> consistently-applied algorithm to determine fitness.
>>>>>
>>>>> Best,
>>>>> -jay
>>>>>
>>>
>>> +1 to all of your points above, Jay.  Well-written, thank you.
>>>
>>>> I've been mulling this over and looking at how I assess feedback I
>>>> get from different human reviewers, since I don't know the basis of
>>>> how they arrive at their decisions unless they tell me and/or I have
>>>> experience with their criteria for how they review my patches.
>>>>
>>>> I get different value from different human reviewers based upon my
>>>> experience of them reviewing my patches, my experience of them
>>>> reviewing other people's patches, my experience reviewing their code
>>>> and my discussions with them in channel, on the mailing list and in
>>>> person, as well as my experience reading or becoming aware of other
>>>> decisions they make.
>>>>
>>>> It would be really valuable for me personally to have a page in
>>>> gerrit for each third party ci account, where I could sign in and
>>>> leave comments or vote +/-1 or 0 as a way of giving feedback to the
>>>> maintainers of that system. Also others could do the same and I could
>>>> read their feedback. For instance, yesterday someone linked me to
>>>> logs that I had to download in order to read. I hadn't been made
>>>> aware this account had been doing this, but this developer was aware.
>>>> Currently we have no system for a developer, in the course of their
>>>> normal workflow, to leave a comment and/or vote on a third party ci
>>>> system to give those maintainers feedback about how they are doing at
>>>> providing consumable artifacts from their system.
>>>>
>>>> It also would remove the perception that I'm just a big meany, since
>>>> developers could comment for themselves, directly on the account, how
>>>> they feel about having to download tarballs, or sign into other
>>>> systems to trigger a recheck. The community of developers would say
>>>> how fit a system is or isn't since they are the individuals having to
>>>> dig through logs and evaluate "did this build fail because the code
>>>> needs adjustment" or not, and can reflect their findings in a comment
>>>> and vote on the system.
>>>>
>>>> The other thing I really value about gerrit is that votes can change,
>>>> systems can improve, given motivation and accurate feedback for
>>>> making changes.
>>>>
>>>> I have no idea how hard this would be to create, but I think having
>>>> direct feedback from developers on systems would help both the
>>>> developers and the maintainers of ci systems.
>>>>
>>>> There are a number of people working really hard to do a good job in
>>>> this area. This sort of structure would also provide support and
>>>> encouragement to those people providing leadership in this space,
>>>> people asking good questions, helping other system maintainers,
>>>> starting discussions, offering patches to infra (and reviewing infra
>>>> patches) in accordance with the goals of the third party meeting[0]
>>>> and other hard-to-measure decisions that provide value for
>>>> the community.
>>>> I'd really like a way we all can demonstrate the extent to which we
>>>> value these contributions.
>>>>
>>>> So far, those are my thoughts.
>>>>
>>>> Thanks,
>>>> Anita.
>>>
>>> +1 - this sounds like a really good idea.
>>>
>>> How is feedback on the OpenStack check/gate retrieved and moderated?
>>> Can that provide a model for doing what you suggest here?
>> Hi Jon Paul: (Is it Jon Paul or Jon?)
> 
> Hi Anita - it's Jon-Paul or JP.
> 
>>
>> The OpenStack check/gate pipelines are assessed using a system we call
>> elastic-recheck: http://status.openstack.org/elastic-recheck/
>>
>> We use Logstash to index log output and Elasticsearch to compose
>> queries that evaluate the number of occurrences of a given error
>> message (for example). Sample query:
>> http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries/1097592.yaml
>>
>> The elastic-recheck repo is here:
>> http://git.openstack.org/cgit/openstack-infra/elastic-recheck/
>>
>> The GUI is available at logstash.openstack.org.
>>
>> All the queries are written manually as YAML files and named with a
>> corresponding bug number:
>> http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries
>>
>> Here is some documentation about elastic-recheck and how to write
>> queries: http://docs.openstack.org/infra/elastic-recheck/readme.html
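>>
>> To make that concrete, here is a minimal sketch of what one of those
>> query files can look like (the bug number and error message below are
>> invented, and the exact fields are only illustrative; the shape is
>> what matters):
>>
>>   # queries/1234567.yaml -- file name matches the launchpad bug number
>>   query: >
>>     message:"Example error string we want to track" AND
>>     tags:"console"
>>
>> elastic-recheck then runs the query against the indexed logs to count
>> how often that signature shows up in failing runs.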
>>
>> Joe Gordon has actually created some great graphs (where are those
>> hosted again, Joe?) for evaluating failure rates in the pipelines
>> (check and gate) based on test groups (tempest, unit tests).
>>
>> Clark Boylan and Sean Dague did and do the majority of the heavy lifting
>> setting up and maintaining elastic-recheck (with lots of help from
>> others, thank you!), so perhaps they could offer their opinion on whether
>> this is a reasonable choice for evaluating third-party CI systems.
> 
> And this is part of the puzzle - collection of statistics, as I think Stackalytics was looking to do.
Yes it is, and they do a good job of presenting them; the pages are very
pretty, which is why they are so popular. Once they implement accurate
labels on their collections of statistics, that will be great, and it
sounds like we are heading in that direction.

> 
> But there is a second side to what you were saying, which is the developer feedback. I guess I am suggesting that if you are putting a system in place for developers to vote on the 3rd party CI, should that same system be in effect for the OpenStack check/gate jobs?
> 
It already is; it is called #openstack-infra. All day long (the 24-hour
day) developers drop in and tell us exactly how they feel about any
aspect of OpenStack Infrastructure. They let us know when documentation
is confusing, when things are broken, when a patch should have been
merged and failed to be, when Zuul is caught in a retest loop, and
occasionally when we get something right.

OpenStack Infra logs can be found here:
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/

I don't think having an IRC channel for third-party CI is practical,
because it will simply split infra resources, and I have my doubts about
how responsive folks would be in it. Hence my suggestion of the pages to
allow developers to share the kind of information they share in
#openstack-infra all the time.

I still don't know if it is Jon Paul or Jon, so I'll go with Jon Paul
and ask your forgiveness if I am incorrect.

Thanks Jon Paul,
Anita.
>>
>> Thanks Jon Paul, this is a good question, Anita.
>>>
>>>>
>>>>
>>>> [0] https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Party_meetings
>>>>
>>>
>>
> Thanks, 
> Jon-Paul Sullivan ☺ Cloud Services - @hpcloud
> 
> Postal Address: Hewlett-Packard Galway Limited, Ballybrit Business Park, Galway.
> Registered Office: Hewlett-Packard Galway Limited, 63-74 Sir John Rogerson's Quay, Dublin 2. 
> Registered Number: 361933
>  
> 



