[OpenStack-Infra] Lost data in elastic search

Sean Dague sean at dague.net
Wed Jan 8 15:49:09 UTC 2014


In trying to compute Elastic Recheck failure rates we basically need to 
do the following:

# baseline query: filename:"console.html" AND
#   (message:"Finished: FAILURE" OR message:"Finished: SUCCESS")
a = get_baseline_all_jobs

b = get_results_for_er_queries

a = a.groupby('build_uuid')
b = b.groupby('build_uuid')

a.join(b, on='build_uuid')
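
As a rough sketch of that join, here is a plain-Python version (dicts and
sets instead of the pandas-style calls above, to keep it free of
dependencies). Everything about the data shape is an assumption: each hit
is taken to be a dict, the "bug" key and the function name are made up for
illustration, and "failure rate" is assumed to mean the fraction of failed
builds that a given ER query matched. It also returns the build_uuids that
an inner join would silently drop.

```python
from collections import defaultdict


def failure_rates(baseline_hits, er_hits):
    """Join ER query hits against baseline job results on build_uuid.

    baseline_hits: dicts with "build_uuid" and "message" keys (assumed shape).
    er_hits: dicts with "build_uuid" and "bug" keys ("bug" is hypothetical).
    """
    # a: one pass/fail outcome per build_uuid, taken from the
    # console.html "Finished: ..." line.
    failed_by_build = {}
    for hit in baseline_hits:
        failed_by_build[hit["build_uuid"]] = "Finished: FAILURE" in hit["message"]
    total_failures = sum(failed_by_build.values())

    # b: the set of build_uuids matched by each ER query.
    builds_by_bug = defaultdict(set)
    for hit in er_hits:
        builds_by_bug[hit["bug"]].add(hit["build_uuid"])

    rates, dropped = {}, {}
    for bug, uuids in builds_by_bug.items():
        # Builds that survive the join and are known failures.
        joined = {u for u in uuids if failed_by_build.get(u)}
        rates[bug] = len(joined) / total_failures if total_failures else 0.0
        # ER hits whose build_uuid has no baseline row at all.
        dropped[bug] = {u for u in uuids if u not in failed_by_build}
    return rates, dropped
```

A non-empty dropped[bug] means the ER query matched a build for which no
console.html "Finished:" line was found in the baseline.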

In doing so I ran into a problem: I was getting far fewer failures 
after the join than I expected.

What I discovered was that console.html was completely missing from 
the indexes for some build_uuids.

Here is a good example: 
http://logstash.openstack.org/#eyJzZWFyY2giOiJidWlsZF91dWlkOjIxMGI3N2UzZmFhMTQ1ZGQ4ZTE0ZjNhODNiOTdmOTIyIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg5MTk1MjMzMTY5fQ==

build_uuid:210b77e3faa145dd8e14f3a83b97f922

I'm not sure what the fix is there. We could probably write an audit 
tool to figure out how bad it is.
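
One possible shape for such an audit, as a minimal sketch only: it assumes
the indexed documents have already been fetched as dicts carrying the
build_uuid and filename fields used in the queries above, and the function
name is hypothetical. It groups filenames by build and reports the builds
that have no console.html.

```python
from collections import defaultdict

REQUIRED_FILE = "console.html"


def audit_missing_console(hits, required=REQUIRED_FILE):
    """Return (sorted build_uuids lacking `required`, fraction of builds affected).

    hits: dicts with "build_uuid" and "filename" keys (assumed shape).
    """
    # Collect the set of indexed filenames for every build_uuid seen.
    files_by_build = defaultdict(set)
    for hit in hits:
        files_by_build[hit["build_uuid"]].add(hit["filename"])

    # A build is flagged when the required file never shows up for it.
    missing = sorted(
        uuid for uuid, files in files_by_build.items() if required not in files
    )
    fraction = len(missing) / len(files_by_build) if files_by_build else 0.0
    return missing, fraction
```

Note that this only sees builds with at least one indexed file; a build
whose logs were lost entirely would not show up in the count.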

This would also explain something I've noticed: cases where I expected 
ER to report on a bug, but it did not. ER waits for all expected files 
to land for a job (console.html is a required file) before it reports.
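
To illustrate the effect, here is a small sketch of that kind of readiness
check. It is not ER's actual code: the function names are hypothetical, and
console.html is the only required file known from the above, so it is the
only default.

```python
def missing_required_files(indexed_files, required_files=("console.html",)):
    """Return the required files that have not been indexed yet, sorted."""
    return sorted(set(required_files) - set(indexed_files))


def ready_to_report(indexed_files, required_files=("console.html",)):
    """True only once every required file has landed in the index."""
    return not missing_required_files(indexed_files, required_files)
```

With a check like this, a job whose console.html never reaches the index
never becomes ready, so nothing is reported for it.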

	-Sean

-- 
Sean Dague
Samsung Research America
sean at dague.net / sean.dague at samsung.com
http://dague.net


