[openstack-dev] [ceilometer] Multiple publisher and transformer

Jiang, Yunhong yunhong.jiang at intel.com
Thu Nov 22 05:09:03 UTC 2012


This mail is to restart the multiple publisher discussion, as suggested in the Nov 21 IRC meeting (http://eavesdrop.openstack.org/meetings/ceilometer/2012/ceilometer.2012-11-21-21.00.html).

The related information can be found at:
bp:  	https://blueprints.launchpad.net/ceilometer/+spec/multi-publisher 
bug:		https://bugs.launchpad.net/ceilometer/+bug/1073988 
Previous ML discussion: 
	http://lists.openstack.org/pipermail/openstack-dev/2012-October/001840.html
Initial patch, which already includes some discussion:
	https://review.openstack.org/#/c/16522/ 

The basic idea is to have three components to support the multiple publisher requirement: pollster or notification handler, transformer, and publisher, so that data can flow cleanly from different sources to different publishers.
	- Pollsters or notification handlers collect data from the source
	- Publishers emit the transformed data over some conduit (AMQP notification, CW PutMetricData call, statsd UDP packet, or whatever)
	- Transformers pass collected data from pollsters to publishers, transforming it along the way.
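
To make the split above concrete, here is a minimal sketch of how data could flow through the three components. All class, method and field names here (Sample, get_samples, handle, publish, run_pipeline) are made up for illustration; they are not taken from the ceilometer code or from the patch under review.

```python
from collections import namedtuple

# Hypothetical generic sample; the field names are assumptions,
# not the actual fields of ceilometer's Counter.
Sample = namedtuple("Sample", "name type volume timestamp resource_id")


class FakeCpuPollster:
    """Stands in for a pollster: collects data from a source."""

    def get_samples(self):
        return [Sample("cpu", "cumulative", 1200, 10.0, "instance-1")]


class ScaleTransformer:
    """Stands in for a transformer: changes data on its way to publishers."""

    def __init__(self, factor):
        self.factor = factor

    def handle(self, sample):
        return sample._replace(volume=sample.volume * self.factor)


class ListPublisher:
    """Stands in for a publisher: the 'conduit' here is just a list."""

    def __init__(self):
        self.sent = []

    def publish(self, sample):
        self.sent.append(sample)


def run_pipeline(pollsters, transformers, publishers):
    """Push every collected sample through all transformers, then to all publishers."""
    for pollster in pollsters:
        for sample in pollster.get_samples():
            for transformer in transformers:
                sample = transformer.handle(sample)
            for publisher in publishers:
                publisher.publish(sample)
```

With two ListPublisher instances passed to run_pipeline, both receive the same transformed sample, which is the "one source, several publishers" case this thread is about.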

I think everyone currently agrees with this split, although with different ideas about the roles and responsibilities of these three components, especially the data format passed between them.

I'm not sure whether we can provide a generic format that is produced by all pollsters and understood by all transformers and publishers, without losing information. Can this be achieved by extending Counter?

If yes, then transformers will handle only data operations, such as calculating CPU utilization or dropping some data to meet a different frequency, while each publisher will translate this well-known data format into its own format.
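
As an illustration of the "publisher translates the well-known format into its own format" case, here is a rough sketch of a publisher-side translation into a statsd-style text line ("name:value|type"). The Sample fields, the to_statsd_line helper and the type mapping are assumptions made for this example only.

```python
from collections import namedtuple

# Hypothetical generic sample, reduced to the fields this example needs.
Sample = namedtuple("Sample", "name type volume")

# Assumed mapping from the generic sample type to a statsd metric type.
# Anything not listed (e.g. "cumulative") is sent as a gauge in this sketch.
STATSD_TYPES = {"gauge": "g", "delta": "c"}


def to_statsd_line(sample):
    """Translate a generic sample into a 'name:value|type' statsd line."""
    statsd_type = STATSD_TYPES.get(sample.type, "g")
    return "%s:%s|%s" % (sample.name, sample.volume, statsd_type)
```

The point is only that the translation lives in the publisher and needs nothing beyond the generic fields; a different publisher would carry its own translation function.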

If not, could we think of transformers as being of two types: one for format translation and one for data operations? Format translation would be pollster/publisher specific. Operation transformers would be independent of pollsters and publishers, although all data from a pollster or transformer should include the information required for calculation, such as name, type (gauge, cumulative, etc.), and volume (anything more?).

Thanks
--jyh

Below is my understanding of the problem; I hope it's correct and helpful.

I'd start from what the data is about, and then discuss the transformations needed as the data flows from sources to publishers, i.e. the different requirements of different publishers:

1) Data attribute. The data attributes are the real information that is valuable.
	a) What the data content is. For example, DiskIOPollster currently merges data from all disks, so data for individual disks is invisible outside of DiskIOPollster. (Will any publisher require per-disk information?)
	b) The time at which the data is collected (this item also includes the frequency)
	c) The related information, like instance information, vnic information, tenant id, user id etc.

2) Data format. The data format carries all the data attributes. Different data sources publish data in different formats, such as the notification data format, libvirt's XML output format, etc. Different publishers may also require different data formats (is this assertion correct?).

3) Data operation. Calculating CPU utilization from CPU usage in fact operates on data points within the same metric; summing all disk data in DiskIOPollster relates different data sources; a different frequency in fact means dropping some data. Data operations exist to meet publishers' requirements.
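
The three kinds of data operation above could look roughly like this. It is only a sketch under my own assumptions: the Sample fields, the units (cumulative CPU time in nanoseconds, timestamps in seconds) and all function and class names are hypothetical.

```python
from collections import namedtuple

# Hypothetical generic sample; the field names are assumptions.
Sample = namedtuple("Sample", "name type volume timestamp resource_id")


class CpuUtilTransformer:
    """Operates on data points of the same metric: two consecutive
    cumulative CPU-time samples become one utilization gauge (percent)."""

    def __init__(self):
        self._last = {}  # resource_id -> previous sample

    def handle(self, sample):
        prev = self._last.get(sample.resource_id)
        self._last[sample.resource_id] = sample
        if prev is None or sample.timestamp <= prev.timestamp:
            return None  # nothing to compare against yet
        # Assumed units: volume in nanoseconds of CPU time, timestamp in seconds.
        used_seconds = (sample.volume - prev.volume) / 1e9
        elapsed = sample.timestamp - prev.timestamp
        return sample._replace(name="cpu_util", type="gauge",
                               volume=100.0 * used_seconds / elapsed)


def sum_volumes(samples, name):
    """Relates different data sources: merge per-disk samples into one.
    The other fields are copied from the first sample."""
    return samples[0]._replace(name=name,
                               volume=sum(s.volume for s in samples))


def every_nth(samples, n):
    """Lower frequency: keep one sample out of every n and drop the rest."""
    return samples[::n]
```

Note that sum_volumes throws away the per-disk detail, which is exactly the information loss raised in 1a) if the summing happens before the data reaches a publisher that wants per-disk numbers.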

When data flows from source to publisher, all of the above items may need to be transformed. Currently this is mostly done in the pollsters (converting every format into Counter, summing the output of all disks, calculating CPU utilization, etc.), because there is only one publisher.


