[openstack-dev] [ceilometer] Multiple publisher and transformer

Doug Hellmann doug.hellmann at dreamhost.com
Mon Nov 26 15:22:28 UTC 2012


On Thu, Nov 22, 2012 at 9:54 AM, Eoghan Glynn <eglynn at redhat.com> wrote:

>
> Thanks for re-opening this discussion!
>
> > I'm not sure if we can provide a generic format, that will be
> > produced by all pollster and understood by all
> > transformer/publisher, and without information lost? Can it be
> > achieved through extending counter?
>
> So maybe I'm oversimplifying things, but would a simple common
> contextual argument list followed by a free-form kwargs be too
> limiting?
>
> e.g. something like:
>
>   class Transformer(object):
>
>     # Transform a raw data sample to the data type expected by the
>     # corresponding publisher.
>     #
>     # :param user: UUID of user owning resource usage
>     # :param tenant: UUID of associated tenant
>     # :param resource_id: UUID of metered resource
>     # :param resource_obj: some representation of the resource if available
>     # :param timestamp: time of measurement to microsecond resolution
>     # :param sample: kwargs containing raw data fields
>     def transform_sample(self, user, tenant, resource_id, timestamp,
>                          resource_obj=None, **sample):
>         raise NotImplementedError()
>
> The pollster or notification handler would basically just stuff all
> the available raw data into the sample kwargs. The transformers would
> then have to know which named args to expect and how to interpret the
> resource_obj.
>

In previous API designs I have always found it easier to have well-defined
classes passing data between the layers, rather than accepting variable,
undefined arguments like this. A new developer coming along to add a
feature has far less work to do when figuring out what data to emit as
output or accept as input to a new plugin, and it almost always turns out to
be easier to create components that can be recombined in unexpected ways
because of the standard data structures. Using a class also means fewer
changes when new fields are added (because you only have to find where they
are constructed, not every call to a plugin) or removed (because you can
provide a backwards-compatibility @property method).
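
To make the contrast concrete, here is a rough sketch; the Sample class
and its field names below are made up for illustration, not a proposed
schema:

    # Illustrative only: a single documented data class instead of
    # free-form kwargs. The field list is hypothetical.
    import collections

    # Every pollster or notification handler emits this one type.
    Sample = collections.namedtuple(
        'Sample',
        ['name', 'type', 'volume', 'unit', 'user_id', 'project_id',
         'resource_id', 'timestamp', 'resource_metadata'])

    class Transformer(object):
        """Subclasses never need to guess which kwargs might show up."""

        def transform_sample(self, sample):
            # :param sample: a Sample instance
            # :returns: an iterable of publisher-specific input objects
            raise NotImplementedError()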

All of the notification handlers and pollsters should continue to emit a
common object (either Counter instances or some updated type that meets our
needs better). All of the transformers should accept instances of those objects
as input. Each publisher should define a class representing what it wants
as inputs -- no dictionaries, tuples, etc. Use classes that can be
documented clearly (even if just as namedtuples). Since each transformer
will be bound to a publisher, it will know which type of object(s) to emit
(I assume a transformation may cause one Counter instance to become several
inputs to a publisher).
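
For example (hypothetical class names; the point is only that the
publisher, not the transformer, owns the definition of its input type):

    # Sketch only -- the class names here are invented, not existing
    # ceilometer code.
    import collections

    # The publisher documents exactly what it consumes.
    MeteringMessage = collections.namedtuple(
        'MeteringMessage',
        ['counter_name', 'counter_volume', 'resource_id', 'timestamp',
         'metadata'])

    class CounterToMeteringMessage(object):
        """Bound to one publisher; consumes Counters, emits its input type."""

        def transform_sample(self, counter):
            # One Counter may fan out into one or more publisher inputs.
            yield MeteringMessage(
                counter_name=counter.name,
                counter_volume=counter.volume,
                resource_id=counter.resource_id,
                timestamp=counter.timestamp,
                metadata=counter.resource_metadata)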


>
> > If so, then the transformer will handle only data operations, like
> > calculating CPU utilization, dropping some data for a different
> > frequency, etc., while the publisher will translate this well-known
> > data format into its own format.
>
> I was hoping it could do more than calculating derived metrics, or
> stepping down the sampling rate.
>
> From gerrit:
>
>   The point that I had envisaged for transformers was to factor
>   out the detailed per-measurement knowledge from the corresponding
>   publisher.
>
>   Take for example the stats related to disk I/O reported by the
>   hypervisor driver. Before these data can be pushed up to CloudWatch,
>   something has to know that the metric names are 'DiskWriteBytes',
>   'DiskReadBytes' etc., the namespace is 'AWS/EC2', the dimensions
>   include {'InstanceId': ID}, and the unit is Bytes.
>
>   So the idea was to avoid encoding all that knowledge in the CW
>   publisher, instead leaving the publisher simple and slow-changing
>   and unaffected by new metrics being added to the mix.
>
> Does that make sense at all?
>

Yes, although I'm worried that enforcing the decoupling makes it more
difficult for end users to set up a working system because they have to
know about all 3 objects. Is there some way to make the transformers
discoverable, so that a user only has to say "send the diskio counters to
the CW publisher" and the dispatcher would then automatically load the
appropriate transformers to use?
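
Something like the following, for instance (the registry and module paths
are entirely made up, just to illustrate the idea):

    # Hypothetical discovery mechanism -- nothing like this exists in
    # ceilometer today.

    # Each publisher plugin advertises which transformer handles which
    # counter, e.g. via setuptools entry points or a table like this one.
    CLOUDWATCH_TRANSFORMERS = {
        'disk.write.bytes': 'ceilometer.transformer.cw:DiskWriteBytes',
        'disk.read.bytes': 'ceilometer.transformer.cw:DiskReadBytes',
    }

    def transformers_for(publisher_map, counter_names):
        """Return the transformers needed to feed the named counters."""
        return [publisher_map[name] for name in counter_names
                if name in publisher_map]

    # "send the diskio counters to the CW publisher" then expands to:
    needed = transformers_for(CLOUDWATCH_TRANSFORMERS,
                              ['disk.write.bytes', 'disk.read.bytes'])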


>
> > If not, would it be possible to think of transformers as two types:
> > one for format translation, and one for operation handling. Format
> > translation would be pollster/publisher specific. An operation
> > transformer would be independent of the pollster/publisher, although
> > all data from the pollster/transformer would then have to include the
> > information required for calculation, like name, type (gauge,
> > cumulative, etc.), volume (anything more?).
>
> Yeah, that's an idea. Wouldn't transformers have to be chained in that
> case? So for example the relevant CW transformer chain would be:
>
>   generic-transformer-calculating-cpu-util-from-cumulative-time -->
>     CW-specific-transformer-outputting-as-CPUUtilization-datapoint
>
> That would be fine, just wanted to call out a potential extra
> bit of complexity to capture in the pipeline config.
>

Would we expect users to configure that pipeline?
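
If so, the chained case above would mean asking them to write something
like this (a hypothetical configuration format, only to show how much a
user would need to know):

    # Hypothetical pipeline configuration -- not an existing ceilometer
    # file format.
    PIPELINES = [
        {
            'counters': ['cpu'],
            'transformers': [
                'cpu_util_from_cumulative_time',  # generic derived metric
                'cw_cpu_utilization_datapoint',   # CloudWatch-specific
            ],
            'publisher': 'cloudwatch',
        },
    ]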

Doug


>
> Cheers,
> Eoghan
>
> > Thanks
> > --jyh
> >
> > Below is my understanding of the problem; I hope it's correct and
> > helpful.
> >
> > I'd start from what the data is about, and then discuss the
> > transformations needed when the data flows from sources to publishers,
> > i.e. the different requirements of different publishers:
> >
> > 1) Data attributes. The data attributes are the real information of
> > value.
> >       a) The data content. For example, DiskIOPollster currently
> >       merges data from all disks, so data for individual disks is
> >       invisible outside of the DiskIOPollster. (Will any publisher
> >       require per-disk information?)
> >       b) The time that the data was collected (this also covers the
> >       frequency)
> >       c) The related information, like instance information, vnic
> >       information, tenant id, user id, etc.
> >
> > 2) Data format. The data format carries all of the data attributes.
> > Different data sources will publish different data formats, like the
> > notification data format, libvirt's XML output format, etc. Different
> > publishers may expect different data formats (is this assertion
> > correct?).
> >
> > 3) Data operations. Calculating CPU utilization from CPU usage
> > operates on datapoints within the same metric, summing all disk data
> > in DiskIOPollster relates different data sources, and changing the
> > frequency in effect drops some data. Data operations exist to meet
> > the publishers' requirements.
> >
> > When data flows from source to publisher, all of the above items may
> > need to be transformed. Today this is mostly done in the pollsters,
> > like changing all formats into Counters, summing the output of all
> > disks, calculating CPU utilization, etc., because there are only
> > publishers.
> >
>