[openstack-dev] [monasca] Ideas to work on

Hochmuth, Roland M roland.hochmuth at hpe.com
Tue Feb 14 01:04:45 UTC 2017

Hi Anqi, See my comments listed below. Regards --Roland

From: An Qi YL Lu <lash at cn.ibm.com>
Date: Sunday, February 12, 2017 at 8:29 PM
To: Roland Hochmuth <roland.hochmuth at hpe.com>
Cc: OpenStack List <openstack-dev at lists.openstack.org>
Subject: Re: [monasca] Ideas to work on

Hi Roland

I am not sure whether you received my last email because I got a delivery failure notification. I am sending this again to ensure that you can see this email.


----- Original message -----
From: An Qi YL Lu/China/IBM
To: roland.hochmuth at hpe.com
Cc: openstack-dev at lists.openstack.org
Subject: Re: [monasca] Ideas to work on
Date: Fri, Feb 10, 2017 5:14 PM

Hi Roland

Thanks for your suggestions. The list you made is useful and gives me clues about areas that I can work on. I spent some time investigating the bps that you introduced.

I am most interested in data retention and metrics deleting.

Data retention: I had a quick look into the data retention policy of InfluxDB. It apparently supports different retention policies for different series. To my understanding, the whiteboard in this bp has a straightforward design for this feature. I didn't quite understand which part is complex. Could you please shed some light so I can learn where the complicated part is?
The retention policy specified in the bp, https://blueprints.launchpad.net/monasca/+spec/per-project-data-retention, is per project. InfluxDB allows retention policies to be set per database, https://docs.influxdata.com/influxdb/v1.2/query_language/database_management/#create-retention-policies-with-create-retention-policy.

Currently, we store all metrics for all tenants in one database. One approach, which would involve a bit of re-engineering if we choose to do it, would be to store each project's metrics in its own database.
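
As a rough sketch of the per-project database approach described above, the following builds the InfluxQL CREATE RETENTION POLICY statement for one project's database. The "monasca_<project_id>" database naming scheme, the policy name "project_retention", and the default duration are assumptions made for illustration only; none of them come from Monasca or from the blueprint.

```python
import re


def retention_policy_statement(project_id, duration="45d", replication=1):
    """Build an InfluxQL CREATE RETENTION POLICY statement for one project.

    Assumes (hypothetically) one database per project, named
    "monasca_<project_id>", and a single policy called "project_retention".
    """
    # Reject anything that could break out of the quoted identifier.
    if not re.fullmatch(r"[A-Za-z0-9_]+", project_id):
        raise ValueError("unexpected characters in project id: %r" % project_id)
    database = "monasca_" + project_id
    return (
        'CREATE RETENTION POLICY "project_retention" ON "%s" '
        "DURATION %s REPLICATION %d DEFAULT" % (database, duration, replication)
    )


if __name__ == "__main__":
    print(retention_policy_statement("abc123", duration="14d"))
```

The statement would still have to be sent to InfluxDB by whatever client the deployment uses; that part is left out here.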

I could also imagine having retention policies per metric per tenant. For example, there might be metrics for metering that should be stored for a longer period than operational metrics. There isn't a way to do this directly in InfluxDB using the built-in data retention policy. However, it could possibly be done using deletes and scheduled jobs that run periodically and prune the database.

For the Vertica database, we, as in HPE, simulate retention policies by running a cron job that drops partitions after some period of time, such as 45 days. Charter has a more sophisticated cron job that deletes metrics for specific tenants on a different schedule than the operational metrics. For example, tenants of the cloud might have their metrics deleted every two weeks. Metering metrics might be deleted every 13 months.
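
A minimal sketch of how a pruning job might work out what is old enough to delete, reading the periods mentioned above as retention periods. The class names are hypothetical, and 13 months is approximated as 396 days; neither the HPE nor the Charter cron job is shown in this thread, so this is not their implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention classes. The periods echo the examples in this
# thread; 13 months is approximated as 396 days.
RETENTION = {
    "operational": timedelta(days=45),
    "tenant": timedelta(days=14),
    "metering": timedelta(days=396),
}


def deletion_cutoffs(now, retention=RETENTION):
    """Return, per class, the timestamp before which data may be deleted."""
    return {name: now - period for name, period in retention.items()}


if __name__ == "__main__":
    for name, cutoff in deletion_cutoffs(datetime.now(timezone.utc)).items():
        print(name, cutoff.isoformat())
```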

The problem with deleting specific metrics is performance. Dropping partitions is extremely fast. However, deleting metrics might be slow, and it might also lock the database and prevent writes and/or queries to it. Therefore, to delete metrics, you could trickle the deletes in, reducing the impact at any one time, or, as in the Charter case, run the deletion script at 2:00 AM, when usage of the system is light.
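
One way to picture "trickling deletes in" is to split the time range to be pruned into small windows and pause between them. The sketch below is database-agnostic: delete_fn is a caller-supplied, hypothetical callback that would issue the actual delete for one window, and the window size and pause are arbitrary assumptions.

```python
import time
from datetime import datetime, timedelta


def delete_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def trickle_delete(delete_fn, start, end, step=timedelta(hours=1), pause_seconds=1.0):
    """Call delete_fn once per window, sleeping between calls to spread the load."""
    count = 0
    for lower, upper in delete_windows(start, end, step):
        delete_fn(lower, upper)  # caller-supplied; performs the real delete
        count += 1
        time.sleep(pause_seconds)
    return count


if __name__ == "__main__":
    def show(lower, upper):
        print("would delete", lower, "to", upper)

    trickle_delete(show, datetime(2017, 1, 1), datetime(2017, 1, 1, 3), pause_seconds=0)
```

Smaller windows and longer pauses lower the load at any one time but make the whole prune take longer.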

Metrics deleting: InfluxDB 1.1 (like any version after 0.9) supports deleting series, though you cannot specify a time interval for this operation. It simply deletes all points from a series in a database. I think one of the tricky parts is deciding what to do with the data that depends on a deleted metric, such as measurements and alarms. Please correct me if my understanding is not accurate.
The problem, I believe, is that a single series in InfluxDB holds the data for multiple tenants. Deleting a single series would then delete that data for all tenants. As with data retention policies, supporting deletion of metrics by metric name and optional dimensions would require storing metrics differently and/or designing some other solution.

I would like to look at logs publishing as well. Unfortunately, I did not find the monasca-log-api doc, which is supposed to be at https://github.com/openstack/monasca-log-api/tree/master/docs . I don't know how the log-api works now. Please share a copy of the doc with me if you have one.
The new changes proposed by Steve Simpson are in the review that he just published at https://review.openstack.org/#/c/433016/.

The current documentation is now in a slightly different directory than the link above, at https://github.com/openstack/monasca-log-api/blob/master/documentation/monasca-log-api-spec.md.


----- Original message -----
From: "Hochmuth, Roland M" <roland.hochmuth at hpe.com>
To: OpenStack List <openstack-dev at lists.openstack.org>, An Qi YL Lu/China/IBM at IBMCN
Subject: [monasca] Ideas to work on
Date: Fri, Feb 10, 2017 11:13 AM

Hi Anqi, You had expressed a strong interest in working on Monasca the other day in our Weekly Monasca Team Meeting. I owed you a response. The team had also asked me to keep them in the loop. Here is a list of items that I feel are interesting, that are neither trivial nor extremely complex (just right, hopefully), and that don't overlap with areas other developers are working on, which would be difficult to coordinate in a limited time.

  1.  RBAC: Currently, the Python API doesn't fully support Role Based Access Controls (RBAC). We've had discussions on this topic, but oddly, there isn't a blueprint written for this. It would be very useful to implement this in the APIs, similar to what other OpenStack projects support.
  2.  Data retention: https://blueprints.launchpad.net/monasca/+spec/per-project-data-retention. We haven't completely reviewed and/or approved this blueprint, but it would be very useful to add support for per-project or per-metric data retention. This would involve understanding how data retention works in InfluxDB. We would also want to have some design discussion prior to proceeding, as it is probably more complex than described in the bp.
  3.  Publish logs and/or metrics to topics selectively. https://blueprints.launchpad.net/monasca/+spec/publish-logs-to-topic-selectively. In the context of metrics, this would be useful for identifying specific metrics as metering metrics, as opposed to monitoring metrics, so that they can be published to different Kafka topics. The downstream Monasca Transform Engine would then receive only the metrics that it will transform and would not need to filter them, which would help improve performance dramatically. For logging, it would help distinguish operational logs from audit logs. It could also be used to identify high-priority metrics so that they could be published to a high-priority metrics topic in Kafka. There are several more contexts in which this is useful.
  4.  Delete metrics: https://blueprints.launchpad.net/monasca/+spec/delete-metrics. Basically, adding the ability to delete metrics using the Monasca API. Typically, time series databases are not very good at deletes. We haven't tried to do this with InfluxDB, and while this might seem an easy task, it is a lot more involved than issuing the obvious and straightforward DELETE command.
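
To illustrate item 3, here is a minimal sketch of selective topic routing: the publisher picks a Kafka topic per metric so that downstream consumers only see what they need. The topic names and the metric-name prefixes are hypothetical, and matching on the metric name is only one possible rule; the blueprint does not prescribe any of this.

```python
# Hypothetical topic names and metric-name prefixes, for illustration only.
DEFAULT_TOPIC = "metrics"
TOPIC_BY_PREFIX = [
    ("metering.", "metrics-metering"),
    ("critical.", "metrics-high-priority"),
]


def select_topic(metric_name):
    """Return the Kafka topic a metric would be published to."""
    for prefix, topic in TOPIC_BY_PREFIX:
        if metric_name.startswith(prefix):
            return topic
    # Anything that matches no rule goes to the general-purpose topic.
    return DEFAULT_TOPIC


if __name__ == "__main__":
    for name in ("metering.vcpus", "critical.disk_full", "cpu.idle_perc"):
        print(name, "->", select_topic(name))
```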

I hope this helps. Let me know if you want to discuss further or want more ideas.

Regards --Roland
