<div dir="ltr">Everything sounds good!<br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jan 6, 2014 at 6:52 PM, Sean Dague <span dir="ltr"><<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 01/06/2014 07:04 PM, Joe Gordon wrote:<br>

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

Overall this looks really good, and very spot on.<br>

<br>

<br>

On Thu, Jan 2, 2014 at 6:29 PM, Sean Dague <<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>> wrote:<br></div><div><div class="h5">

<br>

    A lot of the elastic recheck work this fall has been based on the ad hoc<br>

    needs of the moment, in between diving down into the race bugs that<br>

    were uncovered by it. This week away from it all helped provide a<br>

    little perspective on what I think we need to do to call it *done*<br>

    (i.e. something akin to a 1.0 even though we are CDing it).<br>

<br>

    Here is my current thinking on the next major things that should<br>

    happen. Opinions welcomed.<br>

<br>

    (These are roughly in implementation order based on urgency)<br>

<br>

    = Split of web UI =<br>

<br>

    The elastic recheck page is becoming a mishmash of what was needed at<br>

    the time. I think what we really have emerging is:<br>

      * Overall Gate Health<br>

      * Known (to ER) Bugs<br>

      * Unknown (to ER) Bugs - more below<br>

<br>

    I think the landing page should be Known Bugs, as that's where we<br>

    want both bug hunters to go to prioritize things, as well as where<br>

    people looking for known bugs should start.<br>

<br>

    I think the overall Gate Health graphs should move to the zuul<br>

    status page, possibly as part of the collection of graphs at the bottom.<br>

<br>

    We should have a secondary page (maybe column?) of the<br>

    un-fingerprinted recheck bugs, largely to use as candidates for<br>

    fingerprinting. This will let us eventually take over /recheck.<br>

<br>

<br>

I think it would be cool to collect the list of unclassified failures<br>

(not by recheck bug), so we can see how many (and what percentage) need<br>

to be classified. This isn't gate health but more of e-r health or<br>

something like that.<br>

</div></div></blockquote>

<br>

Agreed. I've got the percentage in check_success today, but I agree that every failed gate job we don't have a fingerprint for should be listed somewhere so we can work through them.<br>
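A minimal sketch of that "e-r health" number, assuming we already have the build_uuids of failed gate jobs and the build_uuids that matched some fingerprint (the function name and both inputs are hypothetical, not existing elastic-recheck code):<br>
<pre>
```python
def classification_report(failed_builds, classified_builds):
    """Summarize how many failed gate jobs have a matching fingerprint.

    failed_builds: build_uuids of failed gate jobs (hypothetical input).
    classified_builds: build_uuids matched by any ER query (hypothetical input).
    """
    failed = set(failed_builds)
    # Only count fingerprint matches on builds that actually failed.
    classified = failed.intersection(classified_builds)
    unclassified = sorted(failed - classified)
    total = len(failed)
    pct = 100.0 * len(classified) / total if total else 0.0
    return {
        "total_failures": total,
        "classified": len(classified),
        "classified_pct": pct,
        "unclassified": unclassified,  # the list to work through
    }


report = classification_report(
    ["uuid-a", "uuid-b", "uuid-c", "uuid-d"],
    ["uuid-b", "uuid-d", "uuid-z"],
)
print(report["classified_pct"], report["unclassified"])  # 50.0 ['uuid-a', 'uuid-c']
```
</pre>
The sorted unclassified list doubles as the queue of failures still waiting for a fingerprint.<br>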

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

<br>

    = Data Analysis / Graphs =<br>

<br>

    I spent a bunch of time playing with pandas over break<br></div>

    (<a href="http://dague.net/2013/12/30/ipython-notebook-experiments/" target="_blank">http://dague.net/2013/12/30/ipython-notebook-experiments/</a>), it's<div class="im"><br>

    kind of awesome. It also made me rethink our approach to handling<br>

    the data.<br>

<br>

    I think the rolling average approach we were taking is more precise<br>

    than accurate. As these are statistical events, they really need<br>

    error bars: when we have a quiet night and 1 job fails at<br>

    6am, the 100% failure rate that shows up for grenade<br>

    needs to be shown as 1 of 1, not 50 of 50.<br>
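To make the 1 of 1 vs. 50 of 50 point concrete, here is a sketch using the Wilson score interval for a binomial proportion. That choice of interval is an assumption (the thread only says "error bars"), but it behaves sensibly for tiny samples, unlike the normal-approximation standard error, which collapses to zero width at 0% or 100%:<br>
<pre>
```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Return (low, high) Wilson score bounds for the rate failures/runs.

    z=1.96 gives an approximately 95% interval.
    """
    if runs == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return (center - half, center + half)


print(wilson_interval(1, 1))    # roughly (0.21, 1.0)
print(wilson_interval(50, 50))  # roughly (0.93, 1.0)
```
</pre>
Both cases report a 100% failure rate, but the lower bound is roughly 21% for 1 of 1 and roughly 93% for 50 of 50.<br>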

<br>

<br>

    So my feeling is we should move away from the point graphs we have,<br>

    and present these as weekly and daily failure rates (with graphs and<br>

    error bars), sliced per job. My suggestion is that we do<br>

    the actual visualization with matplotlib because it's super easy to<br>

    output that from pandas data sets.<br>
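A rough sketch of what the pandas + matplotlib side could look like, assuming the failures have already been pulled into a DataFrame indexed by timestamp with a build_name column and a boolean failed column. Those column names, the function names, and the job name in the usage lines are all hypothetical, and the Wilson score bounds used as error bars are one possible choice:<br>
<pre>
```python
import matplotlib
matplotlib.use("Agg")  # write image files; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def failure_rates(df, freq="D", z=1.96):
    """Per-job failure rate per period, with Wilson score bounds.

    df: DataFrame indexed by timestamp with a 'build_name' column and a
    boolean 'failed' column (assumed names). freq: "D" daily, "W" weekly.
    """
    grouped = df.groupby(["build_name", pd.Grouper(freq=freq)])["failed"]
    out = grouped.agg(failures="sum", runs="count")
    out = out[out["runs"] > 0].copy()
    n = out["runs"]
    p = out["failures"] / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    out["rate"] = p
    out["low"] = center - half
    out["high"] = center + half
    return out


def plot_job(rates, job, path):
    """Write one job's failure rate (%) with error bars to an image file."""
    sub = rates.xs(job, level="build_name")
    # errorbar wants distances below/above each point; clip float noise at 0
    below = (sub["rate"] - sub["low"]).clip(lower=0) * 100
    above = (sub["high"] - sub["rate"]).clip(lower=0) * 100
    fig, ax = plt.subplots()
    ax.errorbar(sub.index, sub["rate"] * 100, yerr=[below, above], fmt="o", capsize=3)
    ax.set_ylabel("failure rate (%)")
    ax.set_title(job)
    fig.savefig(path)
    plt.close(fig)


idx = pd.to_datetime(["2014-01-02 06:00", "2014-01-03 10:00", "2014-01-03 11:00"])
df = pd.DataFrame(
    {"build_name": ["gate-grenade-dsvm"] * 3, "failed": [True, False, True]},
    index=idx,
)
rates = failure_rates(df)
print(rates[["failures", "runs", "rate", "low", "high"]])
```
</pre>
Passing freq="W" instead of "D" gives the weekly view, and plot_job(rates, "gate-grenade-dsvm", "grenade.png") would write the graph for one job.<br>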

<br>

<br>

The one thing the current graph does that weekly and daily failure<br>

rates don't is show a sudden spike in one of the lines. If you stare<br>

at the current graphs for long enough and can read through the noise,<br>

you can see when the gate collectively crashes, or when just the<br>

neutron-related gates start failing. So I think one more graph is needed.<br>
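One possible shape for that extra graph's data, sketched under assumptions: compare each job's hourly failure rate to its own trailing baseline and flag sudden jumps. The thresholds, column names, function name, and job name below are illustrative, not tuned values:<br>
<pre>
```python
import pandas as pd


def find_spikes(df, window=24, threshold=0.3, min_runs=5):
    """Flag hours where a job's failure rate jumps well above its recent baseline.

    df: DataFrame indexed by timestamp with 'build_name' and boolean 'failed'
    columns (assumed names). window: trailing hours used as the baseline.
    threshold: jump in failure rate to flag (0.3 = 30 percentage points).
    min_runs: skip hours with too few runs to mean anything.
    Returns a list of (job, hour, jump) tuples.
    """
    spikes = []
    for job, runs in df.groupby("build_name"):
        failed = runs["failed"].astype(int).sort_index()
        hourly = failed.resample("60min").agg(["sum", "count"])
        rate = hourly["sum"] / hourly["count"]  # NaN for hours with no runs
        # Baseline: mean rate over the previous `window` hours (current hour excluded)
        baseline = rate.shift(1).rolling(window, min_periods=1).mean()
        jump = rate - baseline
        busy = jump[hourly["count"] >= min_runs]  # enough runs to trust the rate
        for hour, size in busy[busy > threshold].items():
            spikes.append((job, hour, round(float(size), 2)))
    return spikes


rows = []
start = pd.Timestamp("2014-01-06")
for hour in range(30):
    failures = 8 if hour == 27 else 1  # one bad hour after a quiet day
    for i in range(10):
        rows.append((start + pd.Timedelta(hours=hour, minutes=i),
                     "gate-tempest-dsvm-neutron", i < failures))
df = pd.DataFrame(rows, columns=["ts", "build_name", "failed"]).set_index("ts")
print(find_spikes(df))
```
</pre>
Because each job is compared against itself, a neutron-only breakage shows up as spikes in the neutron jobs alone, while a collective gate crash flags most jobs in the same hour.<br>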

</div></blockquote>

<br>

The point of the visualizations is to make sense to people who don't understand all the data, especially core members of various teams who are trying to figure out "if I attack 1 bug right now, what's the biggest bang for my buck?"<div class="im">


<br></div></blockquote><div><br></div><div>Yes, that is one of the big uses for a visualization. The one I had in mind was being able to see if a new unclassified bug appeared.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im">

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

    Basically we'll be mining Elastic Search -> Pandas TimeSeries -><br>

    transforms and analysis -> output tables and graphs. This is<br>

    different enough from our current jquery graphing that I want to get<br>

    ACKs before doing a bunch of work here and finding out people don't<br>

    like it in reviews.<br>
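For the first step of that pipeline, a sketch of turning an Elasticsearch search response into a pandas time series. It assumes the hits carry @timestamp, build_name, build_status, and build_uuid fields in _source and that a failed build reports FAILURE; those field names and values are assumptions about the logstash index, the function name is hypothetical, and the sample data is made up:<br>
<pre>
```python
import pandas as pd

FIELDS = ["@timestamp", "build_name", "build_status", "build_uuid"]  # assumed field names


def hits_to_frame(response):
    """Turn a parsed Elasticsearch search response into a time-indexed DataFrame.

    One row per build: a fingerprint can hit several log lines of the same run.
    """
    rows = [hit["_source"] for hit in response["hits"]["hits"]]
    df = pd.DataFrame(rows, columns=FIELDS)
    df["@timestamp"] = pd.to_datetime(df["@timestamp"], utc=True)
    df["failed"] = df["build_status"] == "FAILURE"
    df = df.drop_duplicates("build_uuid")
    return df.set_index("@timestamp").sort_index()


sample = {"hits": {"hits": [
    {"_source": {"@timestamp": "2014-01-06T06:00:00Z", "build_name": "gate-grenade-dsvm",
                 "build_status": "FAILURE", "build_uuid": "abc"}},
    {"_source": {"@timestamp": "2014-01-06T06:00:05Z", "build_name": "gate-grenade-dsvm",
                 "build_status": "FAILURE", "build_uuid": "abc"}},
    {"_source": {"@timestamp": "2014-01-06T07:30:00Z", "build_name": "gate-tempest-dsvm-full",
                 "build_status": "SUCCESS", "build_uuid": "def"}},
]}}
frame = hits_to_frame(sample)
print(frame[["build_name", "failed"]])
```
</pre>
Dropping duplicate build_uuids matters because otherwise one bad run that logs the same error twice would be counted as two failures.<br>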

<br>

    Also, as part of this process, upgrade the metadata that we provide for each<br>

    of those bugs so it's a little clearer what you are looking at.<br>

<br>

<br>

For example?<br>

</blockquote>

<br></div>

We should always be listing the bug title, not just the number. We should also list what projects it's filed against. I've stared at these bugs as much as anyone, and I still need to click through the top 4 to figure out which one is the ssh bug. :)<div>
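A small sketch of the label format being asked for. Fetching the data is left out; with launchpadlib it would presumably come from bug.title and each bug task's bug_target_name, but treat that as an assumption. The function name, bug number, and title below are made up:<br>
<pre>
```python
def bug_label(number, title, projects):
    """Render a bug for the ER page: number, title, and affected projects.

    title/projects are passed in directly here; in practice they would be
    looked up from Launchpad (an assumption, not shown).
    """
    names = ", ".join(sorted(set(projects))) or "unknown project"
    return "#%d: %s (%s)" % (number, title, names)


print(bug_label(1234567, "SSH timeout during scenario test", ["tempest", "nova", "nova"]))
# #1234567: SSH timeout during scenario test (nova, tempest)
```
</pre>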


<div class="h5"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

    = Take over of /recheck =<br>

<br>

    There is still a bunch of useful data coming in from "recheck bug<br>

    ####" comments which hasn't been curated into ER queries. I think the<br>

    right thing to do is treat these as a work queue of bugs we should<br>

    be building patterns out of (or completely invalidating). I've got a<br>

    preliminary gerrit bulk query piece of code that does this, which<br>

    would remove the need for the daemon that currently does it. The<br>

    gerrit queries are a little long right now, but I think if we only<br>

    run this on an hourly cron, the additional load will be negligible.<br>

<br>

    This would get us into a single view, which I think would be more<br>

    informative than the one we currently have.<br>
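A sketch of the parsing half of that bulk query, assuming the output of gerrit query --format=JSON --comments (one JSON object per change per line, followed by a stats row). The actual query terms are left out, and the regex and function name are hypothetical rather than the preliminary code mentioned above:<br>
<pre>
```python
import collections
import json
import re

# One "recheck bug N" line inside a review comment (hypothetical pattern).
RECHECK_RE = re.compile(r"^\s*recheck bug (\d+)\s*$", re.IGNORECASE | re.MULTILINE)


def count_recheck_bugs(query_output):
    """Count "recheck bug N" comments in gerrit query JSON output.

    Returns a Counter mapping bug number to number of recheck comments.
    """
    counts = collections.Counter()
    for line in query_output.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if row.get("type") == "stats":  # trailing summary row, not a change
            continue
        for comment in row.get("comments", []):
            for bug in RECHECK_RE.findall(comment.get("message", "")):
                counts[int(bug)] += 1
    return counts


sample = "\n".join([
    json.dumps({"number": "1", "comments": [
        {"message": "Patch Set 2:\n\nrecheck bug 1234567"},
        {"message": "Patch Set 3:\n\nrecheck no bug"},
    ]}),
    json.dumps({"number": "2", "comments": [
        {"message": "Patch Set 1:\n\nrecheck bug 1234567"},
    ]}),
    json.dumps({"type": "stats", "rowCount": 2}),
])
print(count_recheck_bugs(sample).most_common())  # [(1234567, 2)]
```
</pre>
Run from an hourly cron, the resulting counts give the work queue ordered by how often each bug is being rechecked.<br>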

<br>

<br>

Treating /recheck as a work queue sounds great, but this needs a bit<br>

more fleshing out I think.<br>

<br>

I imagine the workflow as something like this:<br>

<br>

* State 1: Patch author files a bug saying 'gate broke, I didn't do it and<br>

don't know why it broke'.<br>

* State 2: Someone investigates the bug and determines whether it is valid<br>

and whether it's a duplicate. Root cause still isn't known.<br>

* State 3: Someone writes a fingerprint for this bug and commits it to<br>

elastic-recheck.<br>

<br>

Assuming we agree on this general workflow, it would be nice if /recheck<br>

distinguished between bugs in states 1 and 2. There is no need to<br>

list bugs in state 3, as the e-r bot will automatically tell developers<br>

when they hit them.<br>

</blockquote>

<br></div></div>

Sure, that means we need a policy on something in the bugs that can distinguish between them. I assume LP states.<br>

<br>

State 1 = new & invalid?<br>

State 2 = confirmed / triaged?<br>
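A sketch of that mapping, taking the tentative LP states above at face value (they are not settled policy) and adding Joe's state 3 for bugs that already have a fingerprint. The function names and sample bug numbers are hypothetical:<br>
<pre>
```python
# Tentative mapping from Launchpad status to the proposed /recheck workflow.
STATE_1 = {"New", "Invalid"}        # filed, not yet investigated
STATE_2 = {"Confirmed", "Triaged"}  # investigated, root cause unknown


def recheck_state(lp_status, has_fingerprint):
    """Return 1, 2, or 3 for a recheck bug, or None if it is outside the workflow."""
    if has_fingerprint:
        return 3  # e-r already comments on these; /recheck need not list them
    if lp_status in STATE_1:
        return 1
    if lp_status in STATE_2:
        return 2
    return None


def work_queue(bugs):
    """Group (bug_number, lp_status, has_fingerprint) tuples into states 1 and 2."""
    queue = {1: [], 2: []}
    for number, status, fingerprinted in bugs:
        state = recheck_state(status, fingerprinted)
        if state in queue:
            queue[state].append(number)
    return queue


print(work_queue([(1, "New", False), (2, "Triaged", False),
                  (3, "Confirmed", True), (4, "Fix Released", False)]))
# {1: [1], 2: [2]}
```
</pre>
Statuses outside the mapping, such as Fix Released, simply fall out of the queue.<br>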

<br>

I think we can call that post 1.0 though, as we'll be adding details beyond anything we have today.</blockquote><div><br></div><div>Yup, this sounds like post 1.0 to me too. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

    = Categorize all the jobs =<br>

<br>

    We need a bit of refactoring to let us comment on all the jobs (not<br>

    just tempest ones). At the beginning we basically assumed pep8 and docs<br>

    jobs don't fail in the gate. It turns out they do, and they are good<br>

    indicators of infra / external factor bugs. They are part of the<br>

    story, so we should put them in.<br>

<br>

<br>

Don't forget grenade<br>

</blockquote>

<br></div>

Yep. That's part of "all". :) I was just calling out the others as something not originally on the list.<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

    = Multi Line Fingerprints =<br>

<br>

    We've definitely found bugs where we never had a really satisfying<br>

    single line match, but we had some great matches if we could do<br>

    multi line.<br>

<br>

    We could do that in ER; however, it will mean giving up logstash as<br>

    our UI, because those queries can't be done in logstash. So in order<br>

    to do this we'll really need to implement some tools (a CLI at minimum)<br>

    that will let us easily test a bug. A custom web UI might be in<br>

    order as well, though that's going to be its own chunk of work,<br>

    which we'll need more volunteers for.<br>

<br>

    This would put us in a place where we should have all the<br>

    infrastructure to track 90% of the race conditions, and talk about<br>

    them with certainty as 1%, 5%, or 0.1% bugs.<br>

<br>

<br>

<br>

Hurrah. Multi-line matches are two separate Elasticsearch queries, where<br>

you match build_uuids. So to get the set of all hits of a multi-line<br>

fingerprint, you find the intersection between line_1 and line_2, where<br>

the key is build_uuid.<br>
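A minimal sketch of that correlation, assuming each fingerprint line is run as its own query_string search and each hit has build_uuid in _source. The message field name, the size cap, and the function names are assumptions, and paging past the size cap is not handled:<br>
<pre>
```python
def query_body(phrase, size=10000):
    """Search body for one fingerprint line, using query_string syntax.

    The 'message' field and the size cap are assumptions for this sketch.
    """
    return {"query": {"query_string": {"query": 'message:"%s"' % phrase}}, "size": size}


def build_uuids(response):
    """build_uuid of every hit in a parsed search response."""
    return {hit["_source"]["build_uuid"] for hit in response["hits"]["hits"]}


def multiline_hits(response_1, response_2):
    """Builds that matched both fingerprint lines (intersection on build_uuid)."""
    return build_uuids(response_1).intersection(build_uuids(response_2))


# Made-up responses standing in for the results of the two queries.
line_1 = {"hits": {"hits": [{"_source": {"build_uuid": "a"}}, {"_source": {"build_uuid": "b"}}]}}
line_2 = {"hits": {"hits": [{"_source": {"build_uuid": "b"}}, {"_source": {"build_uuid": "c"}}]}}
print(multiline_hits(line_1, line_2))  # {'b'}
```
</pre>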

</blockquote>

<br></div>

Yes. The biggest issue is tooling for making it easy for people to test their queries. It's pretty unfriendly to tell people to do manual correlation in ES.<div class="HOEnZb"><div class="h5"><br>

<br>

        -Sean<br>

<br>

-- <br>

Sean Dague<br>

Samsung Research America<br>

<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a> / <a href="mailto:sean.dague@samsung.com" target="_blank">sean.dague@samsung.com</a><br>

<a href="http://dague.net" target="_blank">http://dague.net</a><br>

<br>

_______________________________________________<br>

OpenStack-dev mailing list<br>

<a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</div></div></blockquote></div><br></div></div>