[openstack-dev] [ceilometer] Multiple publisher and transformer

Jiang, Yunhong yunhong.jiang at intel.com
Fri Nov 23 02:25:15 UTC 2012



> -----Original Message-----
> From: Julien Danjou [mailto:julien at danjou.info]
> Sent: Friday, November 23, 2012 12:56 AM
> To: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [ceilometer] Multiple publisher and transformer
> 
> On Thu, Nov 22 2012, Eoghan Glynn wrote:
> 
> > The pollster or notification handler would basically just stuff all
> > the available raw data into the sample kwargs. The transformers would
> > then have to know which named args to expect and how to interpret the
> > resource_obj.
> 
> I don't like the idea of having the transformer guess which args it is
> going to get.

Same here. That would make the information transformer-specific.

> 
> To me, the pollster or notifications handler should be responsible to
> emit the maximum amount of "counters" it can from what it gets. And for
> each counter, you would pass it through a transformer, mangling the
> value to some other thing.

Yes. I also think the current Counter (or sampler, as discussed on IRC) needs some changes. For example, NetPollster in compute/libvirt.py keeps only the vnic information; everything about the corresponding instance, except instance_id, is dropped. We need to keep all information through the whole transformer chain until some transformer or publisher is sure it can be removed, most likely the format transformer for the publisher.

> 
> > From gerrit:
> >
> >   The point that I had envisaged for transformers was to factor
> >   out the detailed per-measurement knowledge from the corresponding
> >   publisher.
> >
> >   Take for example the stats related to disk I/O reported by the
> >   hypervisor driver. Before these data can be pushed up to CloudWatch,
> >   something has to know that the metric names are 'DiskWriteBytes',
> >   'DiskReadBytes' etc., the namespace is 'AWS/EC2', the dimensions
> >   include {'InstanceId': ID}, and the unit is Bytes.
> >
> >   So the idea was to avoid encoding all that knowledge in the CW
> >   publisher, instead leaving the publisher simple and slow-changing
> >   and unaffected by new metrics being added to the mix.
> >
> > Does that make sense at all?
> 
> Yes, it totally does.

Yes. And JD's idea is good. I think the CW publisher will include several format transformers. The first changes values, like from disk.io to DiskWriteBytes; the second maps dictionary keys, like resource_id to dimensions; the third deletes all unneeded key-value pairs.

All three steps can be generic or publisher-specific, depending on the implementation. If they are generic, we need to pass some publisher-specific configuration to the transformer.
To me, all three can be generic transformers.
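
A minimal sketch of how these three steps could be chained as generic transformers, each driven by publisher-specific configuration. Every function name and the target key names (MetricName, InstanceId, Value) are hypothetical, chosen only for illustration; this is not existing ceilometer code.

```python
# Hypothetical sketch: none of these names come from ceilometer itself.

def rename_values(sample, field, mapping):
    """Step 1: rewrite one field's value, e.g. 'disk.io' -> 'DiskWriteBytes'."""
    out = dict(sample)
    if field in out:
        out[field] = mapping.get(out[field], out[field])
    return out


def map_keys(sample, key_map):
    """Step 2: rename dictionary keys, leaving unmapped keys untouched."""
    return {key_map.get(k, k): v for k, v in sample.items()}


def select_fields(sample, keep):
    """Step 3: drop every key-value pair the publisher does not need."""
    return {k: v for k, v in sample.items() if k in keep}


def run_chain(sample, chain):
    """Pass the sample through each (transformer, config) pair in order."""
    for transformer, config in chain:
        sample = transformer(sample, **config)
    return sample


# Publisher-specific configuration for the generic transformers (assumed values).
CW_CHAIN = [
    (rename_values, {"field": "resource_name",
                     "mapping": {"disk.io": "DiskWriteBytes"}}),
    (map_keys, {"key_map": {"resource_name": "MetricName",
                            "resource_id": "InstanceId",
                            "value": "Value"}}),
    (select_fields, {"keep": {"MetricName", "InstanceId", "Value"}}),
]

sample = {
    "resource_name": "disk.io",
    "resource_id": "instanceid.vda",
    "user_id": "qwerty789",
    "tenant_id": "abcdef123",
    "value": 123456,
}
result = run_chain(sample, CW_CHAIN)
```

Each transformer returns a new dictionary, so the original sample keeps all its information for any other publisher's chain.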

My only concern is whether performance will suffer if the transformer chain gets too long :)

> 
> The solution I'd imagine for that is to have all the pollsters emit
> something equivalent (but probably simpler) to Counter like for example

Instead of having the pollster emit simpler data, I still suggest keeping everything in the Counter when the pollster sends out the first-step data. I'd then have a transformer cut the unused fields.

> in this case:
> 
>   { resource_name = 'disk.io',
>     resource_id = 'instanceid.vda',
>     user_id = 'qwerty789',
>     tenant_id = 'abcdef123',
>     value = 123456 }
> 
> So for CW you wouldn't transform the value, but the resource_name to
> some other thing. This could be achieved via a transformer named
> "RenameResourceName" which could be "configured" with a map:
> 
>   { "disk.io": "DiskWriteBytes",
>      … }
> 
> So it's kind of generic and you can even use it to do some other stuff.
> 
> Does that make sense?

Yes.

> 
> > Yeah, that's an idea. Wouldn't transformers have to be chained in that
> > case? So for example the relevant CW transformer chain would be:
> >
> >   generic-transformer-calculating-cpu-util-from-cumulative-time -->
> >     CW-specific-transformer-outputting-as-CPUUtilization-datapoint
> >
> > That would be fine, just wanted to call out a potential extra
> > bit of complexity to capture in the pipeline config.
> 
> Yes, I think we'll need to chain them. Don't know how we can do that
> easily in our configuration file.

The following is the configuration file I have in mind:

[publisher.cw.pipe]

*=generic-value-changes:config_file, generic-key-mapping:config_file, generic-field-selection:field_list

*.cpu = generic-transformer-calculating-cpu-util-from-cumulative-time:interval,*

The format is below; it should possibly be written in BNF :$
Pollster_source.meter_data=transformer_name:transformer_parameters, transformer_name:transformer_parameters,.....

And the "*" wildcard applies here.

I added pollster_source because it's the only information that will not be included in the Counter (or Sampler), although I'm not sure how useful it is.

Is it ok? Too complex?
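
To get a feel for the complexity, here is a rough sketch of a parser for that line format. The function names are hypothetical, and shell-style matching of "*" via fnmatch is an assumption. A bare "*" entry inside a chain is kept as a step with no parameters, and its meaning is left open.

```python
from fnmatch import fnmatchcase


def parse_pipe_line(line):
    """Split 'pattern = name:params, name:params' into (pattern, steps)."""
    pattern, _, chain = line.partition("=")
    steps = []
    for entry in chain.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Everything after the first ':' is the transformer's parameter string.
        name, _, params = entry.partition(":")
        steps.append((name.strip(), params.strip() or None))
    return pattern.strip(), steps


def matching_pipelines(rules, source, meter):
    """Return the steps of every rule whose pattern matches 'source.meter'."""
    key = f"{source}.{meter}"
    return [steps for pattern, steps in rules if fnmatchcase(key, pattern)]


LINES = [
    "*=generic-value-changes:config_file, generic-key-mapping:config_file, "
    "generic-field-selection:field_list",
    "*.cpu = generic-transformer-calculating-cpu-util-from-cumulative-time:"
    "interval,*",
]
RULES = [parse_pipe_line(line) for line in LINES]
```

With these rules, a meter named "cpu" from any source matches both lines, while any other meter matches only the "*" line. How the matched chains are combined or ordered is a separate design decision that this sketch does not make.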

> 
> OTOH, I imagine that most of the transformers configuration is going to
> be pretty generic, like for CW. So we could define and provide a default

I'm not sure they will be generic enough; for example, the CW changes will be publisher-specific.

> JSON file with all the default pipelining set-up correctly for most
> counters and publishers.

What do you mean by "all the default pipelining"? I suppose the whole pipeline definition will be predefined. Of course, we will provide detailed documentation for each transformer, so that advanced users can create their own pipeline configuration if needed.

Thanks
--jyh

> 
> --
> Julien Danjou
> // Free Software hacker & freelance
> // http://julien.danjou.info

