[OpenStack-Infra] RFC about creating a project for log-classify.crm

Tristan Cacqueray tdecacqu at redhat.com
Fri Nov 24 06:55:58 UTC 2017


On November 23, 2017 10:04 am, Shuquan Huang wrote:
> I think a standard OpenStack Python project is fine. Is there anything I can help with before the project is created?
> 

It seems like the first task is to design a stable interface between
log-gearman-client and log-classify. To be honest, I haven't fully grasped
the current crm script yet, so I'm not sure what needs to be covered.

For example, should this change https://review.openstack.org/522399
be part of the library?

If not, then the interface would likely be something like:
  logclassify.logstash.process(logstash_event)
Otherwise, I think it's fair to clearly define the log-gearman-worker
requirements and build a filter interface like so:
  filter = logclassify.logstash.Filter(filename, ...)
  for line in log_lines:
    error_pr = filter.process_line(line)
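
To make the two shapes above concrete, here is a minimal sketch. The
logclassify library does not exist yet, so the Filter class below is only a
hypothetical stand-in: its constructor arguments, the regex-based scoring and
the "filename"/"message" event keys are assumptions for illustration, not the
crm114 logic or the real log-gearman-client event format.

```python
import re


class Filter:
    """Hypothetical stand-in for the proposed logclassify.logstash.Filter."""

    def __init__(self, filename, pattern=r"ERROR|Traceback"):
        # The filename would let a real implementation pick per-file data;
        # this sketch only stores it.
        self.filename = filename
        self._pattern = re.compile(pattern)

    def process_line(self, line):
        # Placeholder scoring, not crm114: 1.0 when the line matches the
        # pattern, 0.0 otherwise.
        return 1.0 if self._pattern.search(line) else 0.0


def process(logstash_event):
    # Event-level entry point built on top of the line filter.
    # The "filename" and "message" keys are assumed for this example.
    log_filter = Filter(logstash_event["filename"])
    return log_filter.process_line(logstash_event["message"])


if __name__ == "__main__":
    log_filter = Filter("console.log")
    for line in ["Starting job", "ERROR: disk full"]:
        print(line, log_filter.process_line(line))
```

With this layering, the event-level process() is a thin wrapper, so the
line-level filter stays usable from tests or a command line tool.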


The library interface should at least address:
1/ Usage outside of the log-gearman-client context,
   such as integration tests and a simple command line
2/ Being able to swap out the crm114 implementation with other models

Then we can define the internal logic of the crm114 model so that
a generic model class could be designed within the logclassify library.
With that in place, we'll be able to run integration tests to verify
that the library correctly detects anomalies.
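
One possible way to keep the crm114 implementation swappable is an abstract
model class that the rest of the library depends on. The sketch below is an
assumption about how that could look: the Model, SeenLinesModel and
find_anomalies names are hypothetical, and the toy model simply remembers
lines from successful runs instead of calling crm114.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical generic model interface; a crm114 model would subclass it."""

    @abstractmethod
    def train(self, lines):
        """Learn from the lines of logs known to be good."""

    @abstractmethod
    def score(self, line):
        """Return an anomaly score between 0.0 (normal) and 1.0 (anomalous)."""


class SeenLinesModel(Model):
    """Toy model: a line is anomalous if it never appeared in training."""

    def __init__(self):
        self._seen = set()

    def train(self, lines):
        self._seen.update(line.strip() for line in lines)

    def score(self, line):
        return 0.0 if line.strip() in self._seen else 1.0


def find_anomalies(model, lines, threshold=0.5):
    # Works with any Model subclass, which is what makes the swap possible.
    return [line for line in lines if model.score(line) >= threshold]


if __name__ == "__main__":
    model = SeenLinesModel()
    model.train(["start\n", "done\n"])
    print(find_anomalies(model, ["start\n", "oops\n", "done\n"]))
```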


Speaking of which, I think it's important to curate a dataset of
success/failure logs with the expected anomalies to be found. Those will
be super useful to prevent regression when trying out new settings or models.
How to store and manage the dataset remains to be defined too.
To give you an idea, fwiw, you can find my original dataset here:
  git clone https://softwarefactory-project.io/r/logreduce-tests
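
As a rough idea of how such a dataset could guard against regressions, the
sketch below scores a detector against cases that list their expected
anomalies. The case layout ("lines" and "expected" keys), the evaluate helper
and the keyword detector are all assumptions made for this example; they do
not describe the layout of the logreduce-tests repository.

```python
def evaluate(detect, cases):
    """Return (found, expected, false_positives) totals over all cases."""
    found = expected = false_positives = 0
    for case in cases:
        reported = set(detect(case["lines"]))
        wanted = set(case["expected"])
        found += len(reported & wanted)
        expected += len(wanted)
        false_positives += len(reported - wanted)
    return found, expected, false_positives


def keyword_detect(lines):
    # Deliberately naive detector, used only to exercise the harness.
    return [line for line in lines if "ERROR" in line]


# Hypothetical in-memory cases; a real dataset would be loaded from files.
CASES = [
    {"lines": ["boot ok", "ERROR disk full", "retrying"],
     "expected": ["ERROR disk full"]},
    {"lines": ["boot ok", "timeout waiting for node"],
     "expected": ["timeout waiting for node"]},
]

if __name__ == "__main__":
    found, expected, false_positives = evaluate(keyword_detect, CASES)
    print(f"found {found}/{expected}, false positives: {false_positives}")
```

Tracking these three totals per setting or model would make it easy to see
whether a change finds fewer of the expected anomalies or adds noise.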

Cheers,
-Tristan
> 
> On 22/11/2017, 4:30 PM, "Tristan Cacqueray" <openstack-infra-bounces at lists.openstack.org on behalf of tdecacqu at redhat.com> wrote:
> 
>     On November 21, 2017 5:48 pm, Clark Boylan wrote:
>     > On Tue, Nov 21, 2017, at 09:17 AM, Tristan Cacqueray wrote:
>     >>
>     > 
>     > snip
>     > 
>     >> Actually the rfc is this thread :-)
>     >> 
>     >> Though I forgot to mention the first steps that could use comments before
>     >> we move on:
>     >> * create the openstack-infra/log-classify project,
>     >> * import the log-classify.crm script,
>     >> * wrap the script with a more user friendly interface, and
>     >> * modify the puppet-log_processor to use that new project instead
>     > 
>     > This sounds like a great place to start. Considering the interest
>     > already forming around this I would say go ahead and create the project
>     > and start with the import process so that people have a concrete place
>     > to start working on this. I am sure it will evolve from there, but
>     > getting started is often the most difficult step.
>     > 
>     > Related to the last step we have temporarily disabled CRM classification
>     > in the log processor pipeline because we treat the whole file path as a
>     > unique file to classify which ended up filling our workers' disks with
>     > classification files. I think one of the things we will want to address
>     > early on is using the basename rather than the whole path to
>     > significantly reduce the total number of data files on disk. This way we
>     > can get it running in the log processor pipeline again for proper
>     > production feedback of changes that are happening.
>     > 
>     > Once again let me know if I can help with anything (happy to review new
>     > project creation changes for example).
>     > 
>     
>     Excellent, project creation is proposed here:
>       https://review.openstack.org/#/q/topic:log-classify
>     
>     I'm open to suggestions regarding the name and structure of the project.
>     Otherwise I'll create a standard openstack python project with:
>     
>     logclassify.logstash module to interface with the script using the
>     design of the log-gearman-client.py (e.g. a process(event)).
>     logclassify.cmd module to use the script standalone.
>     
>     And then write a first test and implementation of that basename-based
>     data files improvement.
>     
>     If that works ok, then a follow-up change will modify the
>     log-gearman-client to import logclassify instead of running the script
>     directly.
>     
>     > Thank you for getting this started,
>     > Clark
>     > 
>     
>     Thanks for the quick feedback!
>     -Tristan