[openstack-dev] [elastic-recheck] Thoughts on next steps
Sean Dague
sean at dague.net
Wed Jan 8 00:14:14 UTC 2014
On 01/07/2014 06:44 PM, Matt Riedemann wrote:
>
>
> On 1/7/2014 5:26 PM, Sean Dague wrote:
>> On 01/07/2014 06:20 PM, Matt Riedemann wrote:
>>>
>>>
>>> On 1/2/2014 8:29 PM, Sean Dague wrote:
>>>> A lot of the elastic recheck work this fall has been driven by the ad
>>>> hoc needs of the moment, in between diving down into the race bugs that
>>>> it uncovered. This week away from it all helped provide a little
>>>> perspective on what I think we need to do to call it *done* (i.e.
>>>> something akin to a 1.0, even though we are CDing it).
>>>>
>>>> Here is my current thinking on the next major things that should happen.
>>>> Opinions welcomed.
>>>>
>>>> (These are roughly in implementation order based on urgency)
>>>>
>>>> = Split of web UI =
>>>>
>>>> The elastic recheck page is becoming a mishmash of what was needed at
>>>> the time. I think what we really have emerging is:
>>>> * Overall Gate Health
>>>> * Known (to ER) Bugs
>>>> * Unknown (to ER) Bugs - more below
>>>>
>>>> I think the landing page should be Known Bugs, as that's where we want
>>>> bug hunters to go to prioritize things, and also where people looking
>>>> for known bugs should start.
>>>>
>>>> I think the overall Gate Health graphs should move to the zuul status
>>>> page. Possibly as part of the collection of graphs at the bottom.
>>>>
>>>> We should have a secondary page (maybe column?) of the un-fingerprinted
>>>> recheck bugs, largely to use as candidates for fingerprinting. This will
>>>> let us eventually take over /recheck.
>>>>
>>>> = Data Analysis / Graphs =
>>>>
>>>> I spent a bunch of time playing with pandas over break
>>>> (http://dague.net/2013/12/30/ipython-notebook-experiments/), it's kind
>>>> of awesome. It also made me rethink our approach to handling the data.
>>>>
>>>> I think the rolling average approach we were taking is more precise
>>>> than accurate. As these are statistical events, they really need error
>>>> bars. When we have a quiet night and 1 job fails at 6am, the resulting
>>>> 100% failure rate for grenade needs to be qualified as 1 of 1, not 50
>>>> of 50.
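The thread asks for error bars without naming a method. One common choice for a binomial failure rate is the Wilson score interval; that choice is a substitution here, not something the thread specifies. A minimal sketch, assuming a 95% interval (z = 1.96):

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Approximate Wilson score interval for a failure rate of failures/runs."""
    if runs == 0:
        raise ValueError("need at least one run")
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


for fails, runs in [(1, 1), (50, 50)]:
    low, high = wilson_interval(fails, runs)
    print(f"{fails} of {runs}: {low:.2f} - {high:.2f}")
```

Both cases show a 100% point estimate, but the lower bound for 1 of 1 is only about 0.21, while for 50 of 50 it is about 0.93, which is exactly the distinction the error bars are meant to make visible.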
>>>>
>>>> So my feeling is we should move away from the point graphs we have, and
>>>> present these as weekly and daily failure rates (with graphs and error
>>>> bars). And slice those per job. My suggestion is that we do the actual
>>>> visualization with matplotlib because it's super easy to output that
>>>> from pandas data sets.
>>>>
>>>> Basically we'll be mining Elastic Search -> Pandas TimeSeries ->
>>>> transforms and analysis -> output tables and graphs. This is different
>>>> enough from our current jquery graphing that I want to get ACKs before
>>>> doing a bunch of work here and finding out people don't like it in
>>>> reviews.
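The aggregation step of that pipeline can be sketched without any plotting. This version sticks to the standard library so it is self-contained; in pandas the same step would be a groupby over a daily time grouper plus the job name. The record shape (timestamp, `build_name`, `build_status`) and the job names are assumptions for illustration:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records, shaped like fields pulled from Elasticsearch hits:
# (build timestamp, build_name, build_status)
records = [
    ("2014-01-06T05:58:00", "gate-grenade-dsvm", "FAILURE"),
    ("2014-01-06T14:10:00", "gate-tempest-dsvm-full", "SUCCESS"),
    ("2014-01-06T15:30:00", "gate-tempest-dsvm-full", "FAILURE"),
    ("2014-01-07T09:00:00", "gate-tempest-dsvm-full", "SUCCESS"),
]


def daily_failure_counts(rows):
    """Return {(date, job): [failures, runs]} so each rate keeps its sample size."""
    counts = defaultdict(lambda: [0, 0])
    for ts, job, status in rows:
        day = datetime.fromisoformat(ts).date()
        bucket = counts[(day, job)]
        bucket[0] += int(status == "FAILURE")
        bucket[1] += 1
    return dict(counts)


for (day, job), (fails, runs) in sorted(daily_failure_counts(records).items()):
    print(f"{day} {job}: {fails}/{runs} = {fails / runs:.0%}")
```

Keeping the raw `failures/runs` pair, rather than only the percentage, is what makes error bars possible later; switching to weekly buckets only changes how `day` is computed.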
>>>>
>>>> Also, as part of this process, upgrade the metadata that we provide for
>>>> each of those bugs so it's a little clearer what you are looking at.
>>>>
>>>> = Take over of /recheck =
>>>>
>>>> There is still a bunch of useful data coming in from "recheck bug ####"
>>>> comments which hasn't been curated into ER queries. I think the right
>>>> thing to do is treat these as a work queue of bugs we should be building
>>>> patterns out of (or completely invalidating). I've got a preliminary
>>>> gerrit bulk query piece of code that does this, which would remove the
>>>> need for the daemon that currently does it. The gerrit queries are a
>>>> little long right now, but I think if we only run them from an hourly
>>>> cron, the additional load will be negligible.
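The preliminary gerrit code isn't shown in the thread. As a hypothetical sketch of just the work-queue step, assuming the review comment texts have already been fetched (for example with gerrit's `query --format=JSON --comments`), the un-fingerprinted bugs can be ranked by how often they are recheck'd. The function name and bug numbers are made up for illustration:

```python
import re
from collections import Counter

# Matches comment lines such as "recheck bug 1234567" (an optional "#" is tolerated).
RECHECK_BUG = re.compile(r"^\s*recheck\s+bug\s+#?(\d+)\b", re.IGNORECASE | re.MULTILINE)


def recheck_bug_queue(comments, known_bugs):
    """Hypothetical helper: count recheck'd bugs that have no ER query yet."""
    counts = Counter()
    for text in comments:
        for bug in RECHECK_BUG.findall(text):
            if bug not in known_bugs:
                counts[bug] += 1
    # Most frequently recheck'd bugs first: the best fingerprinting candidates.
    return counts.most_common()


comments = [
    "Patch Set 1:\n\nrecheck bug 1234567",
    "recheck bug 7654321",
    "recheck no bug",
    "Patch Set 2:\n\nrecheck bug #7654321",
]
print(recheck_bug_queue(comments, known_bugs={"1234567"}))
```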
>>>>
>>>> This would get us into a single view, which I think would be more
>>>> informative than the one we currently have.
>>>>
>>>> = Categorize all the jobs =
>>>>
>>>> We need a bit of refactoring to let us comment on all the jobs (not just
>>>> tempest ones). At the beginning we assumed pep8 and docs jobs don't fail
>>>> in the gate. Turns out they do, and they are good indicators of infra /
>>>> external factor bugs. They are part of the story, so we should put them
>>>> in.
>>>>
>>>> = Multi Line Fingerprints =
>>>>
>>>> We've definitely found bugs where we never had a really satisfying
>>>> single line match, but we had some great matches if we could do multi
>>>> line.
>>>>
>>>> We could do that in ER; however, it will mean giving up logstash as our
>>>> UI, because those queries can't be done in logstash. So in order to do
>>>> this we'll really need to implement some tools (a CLI at minimum) that
>>>> let us easily test a bug. A custom web UI might be in order as well,
>>>> though that's going to be its own chunk of work that we'll need more
>>>> volunteers for.
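To make "multi line fingerprint" concrete, here is a hypothetical sketch of the core check such a CLI might run against a downloaded log: each pattern must match, in order, within `window` lines of the previous match. The function name, the window rule, and the sample log lines are all assumptions, not an existing ER interface:

```python
import re
from typing import List, Optional


def multiline_match(lines: List[str], patterns: List[str], window: int = 10) -> Optional[int]:
    """Return the index of the first line starting an in-order match, else None."""
    compiled = [re.compile(p) for p in patterns]
    for start, line in enumerate(lines):
        if not compiled[0].search(line):
            continue
        pos = start
        for pat in compiled[1:]:
            # Look only at the next `window` lines after the previous match.
            for offset in range(pos + 1, min(pos + 1 + window, len(lines))):
                if pat.search(lines[offset]):
                    pos = offset
                    break
            else:
                break  # this pattern never matched; try the next start line
        else:
            return start  # every pattern matched in order
    return None


log = [
    "INFO start",
    "ERROR Timed out waiting for thing",
    "DEBUG unrelated",
    "ERROR Instance went to ERROR state",
    "INFO done",
]
print(multiline_match(log, [r"Timed out waiting", r"went to ERROR state"]))
```

Neither line alone may be a satisfying fingerprint; requiring both within a few lines of each other is what makes the match specific.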
>>>>
>>>> This would put us in a place where we should have all the
>>>> infrastructure to track 90% of the race conditions, and talk about them
>>>> with certainty as 1%, 5%, or 0.1% bugs.
>>>>
>>>> -Sean
>>>>
>>>
>>> Let's add regexp query support to elastic-recheck so that I could have
>>> fixed this better:
>>>
>>> https://review.openstack.org/#/c/65303/
>>>
>>> Then I could have just filtered the build_name with this:
>>>
>>> build_name:/(check|gate)-(tempest|grenade)-[a-z\-]+/
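A quick way to sanity-check that pattern before it goes near Elasticsearch: Lucene regexps must match the whole term, so Python's `fullmatch()` is the closest analogue. This assumes `build_name` is indexed as a single un-analyzed term, and Lucene's regex dialect is more limited than Python's, so this only checks the simple subset used here. The job names are illustrative:

```python
import re

# Whole-term match, mirroring Lucene's anchored regexp semantics.
BUILD_NAME = re.compile(r"(check|gate)-(tempest|grenade)-[a-z\-]+")

names = [
    "gate-tempest-dsvm-full",
    "check-grenade-dsvm",
    "gate-nova-pep8",
    "check-tempest-dsvm-neutron-pg",
]
print([n for n in names if BUILD_NAME.fullmatch(n)])
```

Note that `[a-z\-]+` excludes digits, so a job name ending in something like `-py27` would not match.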
>>
>> If you want to extend the query files with:
>>
>> regex:
>> - build_name: /(check|gate)-(tempest|grenade)-[a-z\-]+/
>> - some_other_field: /some other regex/
>>
>> And make it work with the query builder, I think we should consider it.
>> It would be good to know how much more expensive those queries get
>> though, because our ES is under decent load as it is.
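One way the query builder could consume a `regex:` section like the one above, as a hypothetical sketch: AND the existing fingerprint with one `field:/regex/` clause per entry, using the Elasticsearch `query_string` syntax for regexps. The function name is made up, and the input is written out by hand as the list of single-key mappings that loading that YAML should produce:

```python
def build_query(message_query, regex_filters):
    """Hypothetical: AND a fingerprint with `field:/regex/` clauses from a query file."""
    clauses = [f"({message_query})"]
    for entry in regex_filters:  # e.g. [{"build_name": "/.../"}]
        for field, pattern in entry.items():
            # The pattern already carries its surrounding slashes, as in the YAML.
            clauses.append(f"{field}:{pattern}")
    return {"query": {"query_string": {"query": " AND ".join(clauses)}}}


regex_filters = [{"build_name": r"/(check|gate)-(tempest|grenade)-[a-z\-]+/"}]
print(build_query('message:"Timed out" AND build_status:"FAILURE"', regex_filters))
```

Wrapping the original fingerprint in parentheses keeps any `OR` inside it from binding more loosely than the added `AND` clauses.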
>>
>> -Sean
>>
>>
>>
>
> Yeah, alternatively we could turn on wildcard support in the
> query_string capability but the docs warn against that for performance
> reasons (which you can negate a bit with allow_leading_wildcard=false).
>
> I'm not sure how to measure how much more expensive those queries get,
> though, to see whether that is really a limiting factor for supporting
> them. Ideas on that?
Honestly, the recommended node size for ES nodes is 64G (which is twice
what we have) so I'd nix the idea of wildcards entirely, as anything
they think is going to be a perf hit is really no good (and we already
regularly get timeouts on queries going > 7 days back when we hit a cold
cache).
I'd suggest adding timing debug to check_success.py before trying regex,
as it would be good to know how much more expensive a regex match is than
enumerating all the strings.
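The timing debug could be as small as a context manager wrapped around each query. This is a hypothetical sketch; `timed`, the logger name, and the labels are not part of check_success.py. The `time.sleep` call stands in for the real Elasticsearch query:

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger("elastic_recheck.timing")  # hypothetical logger name


@contextmanager
def timed(label, results=None):
    """Log, and optionally record, the wall-clock time spent inside the block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        LOG.debug("%s took %.3fs", label, elapsed)
        if results is not None:
            results[label] = elapsed


logging.basicConfig(level=logging.DEBUG)
timings = {}
with timed("enumerated build_names", timings):
    time.sleep(0.01)  # stand-in for the real ES query
print(timings)
```

Running the same bug's query once with enumerated build names and once with a regex, each under its own label, gives a direct side-by-side comparison; doing it on both warm and cold caches would matter given the timeouts mentioned above.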
-Sean
--
Sean Dague
Samsung Research America
sean at dague.net / sean.dague at samsung.com
http://dague.net