[OpenStack-Infra] RFC about creating a project for log-classify.crm

Tristan Cacqueray tdecacqu at redhat.com
Mon Nov 27 05:45:11 UTC 2017


On November 24, 2017 10:41 am, Klérisson Paixão wrote:
>> Speaking of which, I think it's important to curate a dataset of
>> success/failure logs with the expected anomalies to be found. Those will
>> be super useful to prevent regression when trying out new settings or
>> models.
>> How to store and manage the dataset remains to be defined too.
>> To give you an idea, fwiw, you can find my original dataset here:
>>  git clone https://softwarefactory-project.io/r/logreduce-tests
>>
> How did you collect and curate the original dataset?
> And what do you expect the new set to look like?
> 
> Cheers,
> Klérisson
> 
This dataset was created manually, mostly from failed jobs in the
openstack-infra CI. I tried to pick logs with unusual formats, and I
simply referenced the expected anomalies to be found in the inf.yaml files.
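For illustration, an annotation entry could look like the sketch below;
the layout, field names and log lines are only an assumption here, not
the actual logreduce-tests format:

    # Hypothetical annotation entry: a failed job log plus the lines we
    # expect the model to report as anomalies.
    expected = {
        "job": "example-tempest-job",
        "logfile": "job-output.txt",
        "anomalies": [
            "ERROR nova.compute.manager Instance failed to spawn",
            "FAILED tempest.api.compute.servers.test_create_server",
        ],
    }

    def missing_anomalies(found_lines, expected):
        # A new setting or model regresses if it misses an annotated anomaly.
        return [l for l in expected["anomalies"] if l not in found_lines]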

Perhaps we could annotate the error_pr score produced by the current
log-classify.crm, at least for the obvious anomalies. The dataset
attribute would then be a list of (error_pr, log-line) tuples.
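A minimal sketch of that, assuming a score_line() helper that returns the
error_pr for a single line (the helper and its name are hypothetical, not
an existing log-classify.crm interface):

    # Hypothetical: annotate each log line with the model's error_pr score.
    def annotate(log_lines, score_line):
        return [(score_line(line), line) for line in log_lines]

    # The dataset attribute would then be this list of (error_pr, log-line)
    # tuples, which can be compared against a previous run to spot regressions.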

However, instead of looking for high error_pr scores, we might want to
report only the scores that deviate strongly from the mean, in which
case it would be better to store (deviation, log-line) tuples.
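In that case, a rough sketch of the deviation-based variant could be
(again assuming the same hypothetical scored input):

    import statistics

    def annotate_by_deviation(scored_lines):
        # scored_lines: list of (error_pr, log-line) tuples.
        scores = [s for s, _ in scored_lines]
        mean = statistics.mean(scores)
        stdev = statistics.pstdev(scores) or 1.0  # avoid division by zero
        # Storing the deviation instead of the raw score keeps the dataset
        # comparable across models with different score ranges.
        return [((s - mean) / stdev, line) for s, line in scored_lines]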

Regards,
-Tristan