[openstack-dev] [Ceilometer] Aggregation discussion

Jay Pipes jaypipes at gmail.com
Sun Jan 12 22:13:59 UTC 2014


On Fri, 2014-01-10 at 17:10 +0400, Nadya Privalova wrote:
> Idea:
> The goal is to improve performance when a user gets statistics for a
> meter. We currently have a fixed list of statistics (min, max and so
> on). In a request, a user may specify the following params:
> 1. query
> 2. group_by
> 3. period
> 
> The idea of the bp is to pre-calculate the results of some kinds of
> requests and store them in a separate table in the database.

Pre-calculate when? :) During processing of samples, or during some
periodic job?

> The pre-calculated statistics are called aggregates.

The term "aggregate" really just means a generic grouping or
summarization. If you are looking for a term that represents the
rules/heuristics for maintaining rolling calculations, perhaps the term
"report" is better?

> Aggregates may be merged with each other and with any Statistics
> objects.
> Note that aggregates will be transparent to users. No changes to the
> api are required for get_statistics.
> 
> Example:
> Let's assume we have 6 Samples for the 'image' meter. All of them
> belong to one day (e.g. 1st May) but occurred at different times:
> 11.50, 12.25, 12.50, 13.25, 13.50 and 14.25. The user would like to get
> statistics for this meter from start = 11.30 till end = 14.30. So we
> need to process all samples.
> But we may have processed these samples earlier and already have
> pre-calculated results for the full hours 12.00 and 13.00. In this case
> we may get Samples 11.50 and 14.25 from the "meters" table and merge
> statistics for them with the already calculated Statistics results
> from the "aggregates" table.
> This example "saved" only 2 reads from the DB. But if we consider
> metrics from pollsters with interval = 5 sec (720 Samples per hour), we
> will save 719 reads per full hour by using aggregates.
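
To make the merge step in the quoted example concrete, here is a minimal
sketch. It assumes each aggregate stores count, min, max and sum, so that two
results can be combined without re-reading the underlying samples. The Stats
class, its methods and the sample volumes are hypothetical illustrations, not
Ceilometer's actual storage API or the blueprint's design.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Hypothetical mergeable statistics record (not a Ceilometer class)."""
    count: int
    min: float
    max: float
    sum: float

    @property
    def avg(self) -> float:
        # avg is derived from sum and count, so it never needs to be stored.
        return self.sum / self.count

    @classmethod
    def from_volumes(cls, volumes):
        volumes = list(volumes)
        return cls(len(volumes), min(volumes), max(volumes), sum(volumes))

    def merge(self, other: "Stats") -> "Stats":
        # Each field combines independently of the raw samples.
        return Stats(
            self.count + other.count,
            min(self.min, other.min),
            max(self.max, other.max),
            self.sum + other.sum,
        )


# Made-up volumes for the six sample times in the quoted example.
samples = {
    "11:50": 3.0, "12:25": 5.0, "12:50": 1.0,
    "13:25": 4.0, "13:50": 2.0, "14:25": 6.0,
}

# Pre-calculated aggregates for the full hours 12.00 and 13.00.
hour_12 = Stats.from_volumes([samples["12:25"], samples["12:50"]])
hour_13 = Stats.from_volumes([samples["13:25"], samples["13:50"]])

# Only the two samples outside the full hours are read individually.
edges = [Stats.from_volumes([samples[t]]) for t in ("11:50", "14:25")]

merged = reduce(Stats.merge, edges + [hour_12, hour_13])
full = Stats.from_volumes(samples.values())
```

Under these assumptions, merged and full hold the same values, but merged
needs four reads (two samples and two aggregates) instead of six. Statistics
that cannot be combined this way, such as a median, would not fit this scheme.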

Hmm. So, aggregation and grouping are the domain of data warehousing and
OLAP servers. I don't believe that putting this functionality directly
into Ceilometer is a good idea. I believe it would be better to
delegate this kind of functionality to well-known, widely used tools like
Pentaho [1], which can use a variety of different backend storage
systems.

Bottom line, I believe Ceilometer should focus strictly on the
collection of samples/alarms, the pre-processing of those things, and
the storage of those things. I do not believe that Ceilometer should
become an OLAP analytics tool when existing ones already fill that need.

Best,
-jay

[1] http://www.pentaho.com/5.0 &
http://en.wikipedia.org/wiki/Mondrian_OLAP_server




