In trying to compute Elastic Recheck failure rates we basically need to
do the following:

    # baseline query for all finished jobs:
    #   filename:"console.html" AND (message:"Finished: FAILURE" OR message:"Finished: SUCCESS")
    a = get_baseline_all_jobs
    b = get_results_for_er_queries
    a = a.groupby('build_uuid')
    b = b.groupby('build_uuid')
    a.join(b, on='build_uuid')

In doing so I ran into a problem: I was getting far fewer failures after
the join than I expected. What I discovered was that console.html was
completely missing from the indexes for some build_uuids. Since the
baseline query only matches console.html, those builds never show up in
the baseline and drop out of the join.

Here is a good example:

http://logstash.openstack.org/#eyJzZWFyY2giOiJidWlsZF91dWlkOjIxMGI3N2UzZmFhMTQ1ZGQ4ZTE0ZjNhODNiOTdmOTIyIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg5MTk1MjMzMTY5fQ==

build_uuid:210b77e3faa145dd8e14f3a83b97f922

I'm not sure what the fix is there. We could probably write an audit
tool to figure out how bad it is.

This would also explain something I've noticed: cases where I expected
ER to report on a bug, but it did not. ER waits for all expected files
to land for a job (console.html is a required file) before it reports.

	-Sean

-- 
Sean Dague
Samsung Research America
sean at dague.net / sean.dague at samsung.com
http://dague.net
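
For reference, the audit mentioned above could be sketched roughly as follows. This is only a minimal illustration, not an existing tool: the function name audit_missing_console and the record shape (dicts with 'build_uuid' and 'filename' keys) are assumptions made for the example, and fetching the hits from logstash is left out entirely.

```python
from collections import defaultdict

# The file ER requires before reporting, per the message above.
REQUIRED_FILE = "console.html"


def audit_missing_console(records, required=REQUIRED_FILE):
    """Return (missing_uuids, fraction_missing) for an iterable of index hits.

    Each record is assumed to be a dict with 'build_uuid' and 'filename'
    keys. That shape is an assumption for this sketch, not the real
    logstash schema.
    """
    # Collect the set of indexed filenames seen for each build.
    files_by_build = defaultdict(set)
    for rec in records:
        files_by_build[rec["build_uuid"]].add(rec["filename"])

    # A build is "missing" if the required file never appears for it.
    missing = sorted(
        uuid for uuid, files in files_by_build.items() if required not in files
    )
    total = len(files_by_build)
    fraction = len(missing) / total if total else 0.0
    return missing, fraction


if __name__ == "__main__":
    # Made-up sample hits, purely for illustration.
    sample = [
        {"build_uuid": "aaa", "filename": "console.html"},
        {"build_uuid": "aaa", "filename": "logs/screen-n-cpu.txt"},
        {"build_uuid": "bbb", "filename": "logs/screen-n-cpu.txt"},
    ]
    missing, fraction = audit_missing_console(sample)
    print(missing, fraction)
```

Note that this only catches builds that have at least one other file indexed. A build with nothing indexed at all would not appear in the input, so the fraction is a lower bound on how bad it is.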