[openstack-dev] [ceilometer] Multiple publisher and transformer

Eoghan Glynn eglynn at redhat.com
Thu Nov 22 14:54:45 UTC 2012

Thanks for re-opening this discussion!

> I'm not sure if we can provide a generic format that will be
> produced by all pollsters and understood by all
> transformers/publishers without information loss. Can it be
> achieved through extending counter?

So maybe I'm oversimplifying things, but would a simple common
contextual argument list followed by a free-form kwargs be too

e.g. something like:

  class Transformer(object):

      def transform_sample(self, user, tenant, resource_id, resource_obj,
                           timestamp, **sample):
          """Transform a raw data sample to the data type expected by the
          corresponding publisher.

          :param user: UUID of user owning resource usage
          :param tenant: UUID of associated tenant
          :param resource_id: UUID of metered resource
          :param resource_obj: some representation of the resource if available
          :param timestamp: time of measurement to microsecond resolution
          :param sample: kwargs containing raw data fields
          """
          raise NotImplementedError()

The pollster or notification handler would basically just stuff all
the available raw data into the sample kwargs. The transformers would
then have to know which named args to expect and how to interpret the

> If the answer is yes, then the transformer will handle only data
> operations, like calculating CPU utilization, dropping some data for
> a different frequency etc., while the publisher will translate this
> well-known data format to its own format.

I was hoping it could do more than calculate derived metrics or
step down the sampling rate.

From gerrit:

  The point that I had envisaged for transformers was to factor
  out the detailed per-measurement knowledge from the corresponding

  Take for example the stats related to disk I/O reported by the
  hypervisor driver. Before these data can be pushed up to CloudWatch,
  something has to know that the metric names are 'DiskWriteBytes',
  'DiskReadBytes' etc., the namespace is 'AWS/EC2', the dimensions
  include {'InstanceId': ID}, and the unit is Bytes.

  So the idea was to avoid encoding all that knowledge in the CW
  publisher, instead leaving the publisher simple and slow-changing
  and unaffected by new metrics being added to the mix.
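
As a sketch of that split (not a proposal for the actual code), the
per-metric knowledge could sit in a small table owned by the transformer,
with the publisher just forwarding whatever it is given. The raw field
names, the class names and the datum dict layout are assumptions; the
publisher below is a stand-in that calls a supplied function and does not
talk to CloudWatch at all.

```python
# Per-measurement knowledge lives in the transformer, not the publisher.
# The raw field names on the left are hypothetical; the CloudWatch metric
# names, namespace, dimension and unit are the ones mentioned above.
CW_DISK_METRICS = {
    'write_bytes': 'DiskWriteBytes',
    'read_bytes': 'DiskReadBytes',
}


class CloudWatchDiskTransformer(object):
    def transform_sample(self, user, tenant, resource_id, resource_obj,
                         timestamp, **sample):
        datums = []
        for raw_name, metric_name in sorted(CW_DISK_METRICS.items()):
            if raw_name in sample:
                datums.append({
                    'namespace': 'AWS/EC2',
                    'name': metric_name,
                    'dimensions': {'InstanceId': resource_id},
                    'unit': 'Bytes',
                    'value': sample[raw_name],
                    'timestamp': timestamp,
                })
        return datums


class CloudWatchPublisher(object):
    """Stand-in publisher: forwards datums without inspecting them."""

    def __init__(self, send):
        self._send = send  # callable that would push one datum upstream

    def publish(self, datums):
        for datum in datums:
            self._send(datum)


sent = []
publisher = CloudWatchPublisher(sent.append)
publisher.publish(CloudWatchDiskTransformer().transform_sample(
    'user-1', 'tenant-1', 'i-0001', None, '2012-11-22T14:54:45.000000',
    read_bytes=1024, write_bytes=2048))
```

Adding a new metric then means adding a row to the transformer's table;
the publisher class is untouched.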

Does that make sense at all?

> If not, would it be possible to think of transformers as two types:
> one for format translation, one for operation handling. Format
> translation will be pollster/publisher specific. Operation
> transformers will be independent of pollster/publisher, although
> all data from the pollster/transformer should include the information
> required for calculation, like name, type (gauge, accumulative etc),
> volume (anymore?).

Yeah, that's an idea. Wouldn't transformers have to be chained in that
case? So for example the relevant CW transformer chain would be:

  generic-transformer-calculating-cpu-util-from-cumulative-time -->

That would be fine, just wanted to call out a potential extra
bit of complexity to capture in the pipeline config.
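
For what it's worth, a chain like that could be as simple as the sketch
below. Everything here is assumed for illustration: the single-argument
transform(sample) interface, the cpu_time_ns field, the single-vCPU
simplification, and the second stage itself, which is just one guess at
what a publisher-specific format step might look like.

```python
class CpuUtilTransformer(object):
    """Generic stage: derive CPU utilization from cumulative CPU time."""

    def __init__(self):
        # resource_id -> (timestamp in seconds, cumulative cpu time in ns)
        self._last = {}

    def transform(self, sample):
        key = sample['resource_id']
        prev = self._last.get(key)
        self._last[key] = (sample['timestamp'], sample['cpu_time_ns'])
        if prev is None:
            return None  # two readings are needed to compute a rate
        elapsed = sample['timestamp'] - prev[0]
        if elapsed <= 0:
            return None
        used_seconds = (sample['cpu_time_ns'] - prev[1]) / 1e9
        out = dict(sample)
        out['cpu_util'] = 100.0 * used_seconds / elapsed  # assumes one vCPU
        return out


class CloudWatchFormatTransformer(object):
    """Hypothetical publisher-specific stage: rename into CW terms."""

    def transform(self, sample):
        return {
            'namespace': 'AWS/EC2',
            'name': 'CPUUtilization',
            'unit': 'Percent',
            'dimensions': {'InstanceId': sample['resource_id']},
            'value': sample['cpu_util'],
        }


def run_chain(chain, sample):
    """Feed the sample through each stage; a None result drops it."""
    for transformer in chain:
        sample = transformer.transform(sample)
        if sample is None:
            return None
    return sample


chain = [CpuUtilTransformer(), CloudWatchFormatTransformer()]
first = run_chain(chain, {'resource_id': 'i-0001', 'timestamp': 0.0,
                          'cpu_time_ns': 0})
second = run_chain(chain, {'resource_id': 'i-0001', 'timestamp': 10.0,
                           'cpu_time_ns': 5000000000})
```

Note that the generic stage is stateful and can drop samples, so the
pipeline config would need to pin down ordering as well as membership.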


> Thanks
> --jyh
> Below is my understanding of the problem; I hope it's correct and
> helpful.
> I'd start from what the data is about, and then discuss the
> transformation needed when the data flows from sources to publishers,
> i.e. the different requirements of different publishers:
> 1) Data attributes. Data attributes are the real information that is
> valuable.
> 	a) What the data content is. For example, currently DiskIOPollster
> 	merges data from all disks, thus data for individual disks is
> 	invisible outside of the DiskIOPollster. (Will any publisher
> 	require per-disk information?)
> 	b) The time that the data is collected (this item also includes
> 	the frequency)
> 	c) The related information, like instance information, vnic
> 	information, tenant id, user id etc.
> 2) Data format. The data format carries all data attributes. Different
> data sources will publish different data formats, like the notification
> data format, libvirt's XML output format etc. Different publishers
> may have different data formats (is this assertion correct?).
> 3) Data operation. Calculating CPU utilization from CPU usage is in
> fact operating on datapoints in the same metric; summing all disk data
> in DiskIOPollster relates different data sources; a different
> frequency in fact drops some data. Data operations serve the
> publishers' requirements.
> When data flows from source to publisher, all the above items may need
> to be transformed. Now this is mostly done in pollsters, like changing
> all formats into Counter, summing all disks' output, calculating CPU
> utilization etc, because there is only publishers,
