Open Stack

Tue Aug 27 20:04:07 UTC 2013

Indeed, sorry for the distraction!

Alex

On Tue, Aug 27, 2013 at 11:23 AM, John Griffith <john.griffith at solidfire.com> wrote:

>
> On Tue, Aug 27, 2013 at 11:47 AM, Clark Boylan <clark.boylan at gmail.com> wrote:
>
>> On Tue, Aug 27, 2013 at 10:15 AM, Clint Byrum <clint at fewbar.com> wrote:
>> > Excerpts from John Griffith's message of 2013-08-27 09:42:37 -0700:
>> >> On Tue, Aug 27, 2013 at 10:26 AM, Alex Gaynor <alex.gaynor at gmail.com> wrote:
>> >>
>> >> > I wonder if there's any sort of automation we can apply to this, for
>> >> > example having known rechecks have "signatures" and if a failure
>> >> > matches the signature it auto applies the recheck.
>> >> >
>> >>
>> >> I think we kinda already have that: the recheck list and the bug ID
>> >> assigned to it, no?  Automatically scanning said list and doing the
>> >> recheck automatically seems like overkill in my opinion.  At some point
>> >> human thought/interaction is required, and I don't think it's too much
>> >> to ask a technical contributor to simply LOOK at the output from the test
>> >> runs against their patches and help out a bit. At the very least, if you
>> >> didn't test your patch yourself and waited for Jenkins to tell you it's
>> >> broken, I would hope that a submitter would at least be motivated to fix
>> >> the issue they introduced.
>> >>
>> >
>> > It is worth thinking about though, because "ask a technical contributor
>> > to simply LOOK" is a lot more expensive than "let a script confirm the
>> > failure and tack it onto the list for rechecks".
>> >
>> > Ubuntu has something like this going for all of their users and it is
>> > pretty impressive.
>> >
>> > Apport and/or whoopsie see crashes and look at the
>> > backtraces/coredumps/etc and then (with user permission) submit a
>> > signature to the backend. It is then analyzed and the result is this:
>> >
>> > http://errors.ubuntu.com/
>> >
>> > Known false positives are shipped alongside packages so that they do
>> > not produce noise, and known points of pain for debugging are eased by
>> > including logs and other things in bug reports when users are running
>> > the dev release. This results in a much better metric for what bugs to
>> > address first. IIRC update-manager also checks in with a URL that is
>> > informed partially by this data about whether or not to update packages,
>> > so if there is a high fail rate early on, the server side will basically
>> > signal update-manager "don't update right now".
>> >
>> > I'd love to see our CI system enhanced to do all of the pattern
>> > matching to group failures by common patterns, and then when a technical
>> > contributor looks at these groups they have tons of data points to _fix_
>> > the problem rather than just spending their precious time identifying
>> > it.
>> >
>> > The point of the recheck system, IMHO, isn't to make running rechecks
>> > easier, it is to find and fix bugs.
>> >
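Clint's idea of grouping failures by common patterns could start as simply as masking the run-specific parts of each error line (UUIDs, timestamps, bare numbers) and bucketing changes on whatever text remains. A minimal Python sketch under that assumption follows; `fingerprint`, `group_failures` and the masking rules are hypothetical names chosen for illustration, not part of the logstash/elasticsearch setup or any other existing OpenStack CI tool.

```python
import re
from collections import defaultdict

# Hypothetical masking rules: replace the parts of a log line that vary
# between runs so the same underlying failure yields the same pattern.
# Order matters: UUIDs and timestamps are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?"), "<TIME>"),
    (re.compile(r"\b\d+\b"), "<N>"),
]


def fingerprint(line: str) -> str:
    """Reduce a failure line to a run-independent pattern."""
    for regex, token in _MASKS:
        line = regex.sub(token, line)
    return line.strip()


def group_failures(failures):
    """Group (change_id, error_line) pairs by their fingerprint.

    Returns a dict mapping each pattern to the list of changes that hit it,
    so the most common patterns can be investigated first.
    """
    groups = defaultdict(list)
    for change_id, line in failures:
        groups[fingerprint(line)].append(change_id)
    return dict(groups)
```

For example, `Timed out after 196 seconds` and `Timed out after 200 seconds` from two different changes collapse into the single pattern `Timed out after <N> seconds`, so whoever investigates sees both data points at once. Real test logs would likely need more masks than these three.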
>> This is definitely worth thinking about and we had a session on
>> dealing with CI logs to do interesting things like update bugs and
>> handle rechecks automatically at the Havana summit[0]. Since then we
>> have built a logstash + elasticsearch system[1] that filters many of
>> our test logs and indexes a subset of what was filtered (typically
>> anything with a log level greater than DEBUG). Building this system is
>> step one in being able to detect anomalous logs, update bugs, and
>> potentially perform automatic rechecks with the appropriate bug.
>> Progress has been somewhat slow, but the current setup should be
>> mostly stable. If anyone is interested in poking at these tools to do
>> interesting automation with them feel free to bug the Infra team.
>>
>> That said, we won't have something super automagic like that before
>> the end of Havana, making John's point an important one. If previous
>> release feature freezes are any indication we will continue to put
>> more pressure on the CI system as we near Havana's feature freeze. Any
>> unneeded rechecks or reverifies can potentially slow the whole process
>> down for everyone. We should be running as many tests as possible
>> locally before pushing to Gerrit (this is as simple as running `tox`)
>> and making a best effort to identify the bugs that cause failures when
>> performing rechecks or reverifies.
>>
>> [0] https://etherpad.openstack.org/havana-ci-logging
>> [1] http://ci.openstack.org/logstash.html
>>
>> Thank you,
>> Clark
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
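Alex's "signature" idea, combined with Clark's note about automatic rechecks with the appropriate bug, might look roughly like the sketch below: a table of known-failure regexes mapped to bug numbers, checked against a console log to produce a review comment. This is a hypothetical Python sketch; the signature table, the placeholder bug numbers, and the assumption that the comment takes the form `recheck bug <number>` are illustrative only. Anything that matches no signature is left for a human, which keeps John's point intact.

```python
import re
from typing import Optional

# Hypothetical table of known-failure signatures: a regex identifying the
# failure in a console log, mapped to the bug that tracks it.
# Patterns and bug numbers are placeholders, not real entries.
KNOWN_SIGNATURES = [
    (re.compile(r"Timed out waiting for .* to become ACTIVE"), 1000001),
    (re.compile(r"Connection to .* timed out"), 1000002),
]


def match_signature(console_log: str) -> Optional[int]:
    """Return the bug number of the first known signature found in the log."""
    for pattern, bug in KNOWN_SIGNATURES:
        if pattern.search(console_log):
            return bug
    return None


def recheck_comment(console_log: str) -> Optional[str]:
    """Build a recheck comment for a known failure, or None if unrecognized.

    Unrecognized failures are deliberately not rechecked; they are left
    for a contributor to look at.
    """
    bug = match_signature(console_log)
    if bug is None:
        return None
    # Assumed comment format; adjust to whatever the CI system expects.
    return f"recheck bug {bug}"
```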
>
> The automation ideas are great, no argument there; I didn't mean to imply
> they weren't or to discount them.  I just don't want the intent of the
> message to get lost in all the things we "could" do going forward.
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

[openstack-dev] [OpenStack-dev] Rechecks and Reverifies
