<br><br><div class="gmail_quote">On Thu, Nov 22, 2012 at 9:54 AM, Eoghan Glynn <span dir="ltr"><<a href="mailto:eglynn@redhat.com" target="_blank">eglynn@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Thanks for re-opening this discussion!<br>

<div class="im"><br>

> I'm not sure whether we can provide a generic format that will be<br>

> produced by all pollsters and understood by all<br>

> transformers/publishers without losing information. Can it be<br>

> achieved by extending Counter?<br>

<br>

</div>So maybe I'm oversimplifying things, but would a simple common<br>

contextual argument list followed by a free-form kwargs be too<br>

limiting?<br>

<br>

e.g. something like:<br>

<br>

  class Transformer(object):<br>

<br>

    # Transform a raw data sample to the data type expected by the<br>

    # corresponding publisher.<br>

    #<br>

    # :param user: UUID of user owning resource usage<br>

    # :param tenant: UUID of associated tenant<br>

    # :param resource_id: UUID of metered resource<br>

    # :param resource_obj: some representation of the resource if available<br>

    # :param timestamp: time of measurement to microsecond resolution<br>

    # :param sample: kwargs containing raw data fields<br>

    def transform_sample(self, user, tenant, resource_id, resource_obj, timestamp, **sample):<br>

        raise NotImplementedError()<br>

<br>

The pollster or notification handler would basically just stuff all<br>

the available raw data into the sample kwargs. The transformers would<br>

then have to know which named args to expect and how to interpret the<br>

resource_obj.<br></blockquote><div><br></div><div>In previous API designs I have always found it easier to have well-defined classes passing data between the layers, rather than accepting variable, undefined, arguments like this. The new developer coming along to add a feature has far less work to do when figuring out what data to emit as output or take as input to a new plugin, and it almost always turns out to be easier to create components that can be recombined in unexpected ways because of the standard data structures. Using a class also means fewer changes when new fields are added (because you only have to find where they are constructed, not every call to a plugin) or removed (because you can provide a backwards-compatibility @property method).</div>
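<div><br></div><div>A minimal sketch of the backwards-compatibility point, assuming a hypothetical Sample class whose tenant_id field was renamed to project_id (the class and field names are illustrative only, not existing ceilometer code):</div>

```python
class Sample(object):
    """Hypothetical data object passed from pollsters to plugins."""

    def __init__(self, name, volume, user_id, project_id, resource_id,
                 timestamp):
        self.name = name
        self.volume = volume
        self.user_id = user_id
        # Assumed rename: this field used to be called tenant_id.
        self.project_id = project_id
        self.resource_id = resource_id
        self.timestamp = timestamp

    @property
    def tenant_id(self):
        # Backwards-compatible alias so older plugins that still read
        # tenant_id keep working after the rename.
        return self.project_id


sample = Sample('cpu', 42, 'user-1', 'project-1', 'instance-1',
                '2012-11-22T09:54:00')
print(sample.tenant_id)
```

<div>Only the places that construct Sample need to change; plugins written against the old attribute name are untouched.</div>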

<div><br></div><div>All of the notification handlers and pollsters should continue to emit a common object (either Counter instances or some updated thing that meets our needs better). All of the transformers should accept instances of those objects as input. Each publisher should define a class representing what it wants as inputs -- no dictionaries, tuples, etc. Use classes that can be documented clearly (even if just as namedtuples). Since each transformer will be bound to a publisher, it will know which type of object(s) to emit (I assume a transformation may cause one Counter instance to become several inputs to a publisher).</div>
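<div><br></div><div>As a rough sketch of that arrangement, assuming a simplified stand-in for Counter and a hypothetical Datapoint type defined by one publisher (all field and metric names here are made up for illustration), a transformer that turns one Counter into several publisher inputs might look like:</div>

```python
import collections

# Simplified stand-in for the common object emitted by pollsters.
Counter = collections.namedtuple(
    'Counter',
    'name type volume resource_id timestamp resource_metadata')

# Hypothetical input type defined by a single publisher.
Datapoint = collections.namedtuple(
    'Datapoint', 'metric value unit resource_id timestamp')


def disk_io_to_datapoints(counter):
    """Split one disk I/O Counter into one Datapoint per direction."""
    meta = counter.resource_metadata
    return [
        Datapoint('disk.read.bytes', meta['read_bytes'], 'B',
                  counter.resource_id, counter.timestamp),
        Datapoint('disk.write.bytes', meta['write_bytes'], 'B',
                  counter.resource_id, counter.timestamp),
    ]


counter = Counter('disk.io', 'cumulative', 0, 'instance-1',
                  '2012-11-22T09:54:00',
                  {'read_bytes': 10, 'write_bytes': 20})
for point in disk_io_to_datapoints(counter):
    print(point)
```

<div>Because both ends are named types, the fields a new plugin has to consume or produce can be read straight off the definitions.</div>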

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><br>

> If the answer is yes, then the transformer will handle only data operations,<br>

> like calculating CPU utilization, dropping some data for a different<br>

> frequency etc, while the publisher will translate this well-known data<br>

> format to its own format.<br>

<br>

</div>I was hoping it could do more than calculating derived metrics, or<br>

stepping down the sampling rate.<br>

<br>

From gerrit:<br>

<br>

  The point that I had envisaged for transformers was to factor<br>

  out the detailed per-measurement knowledge from the corresponding<br>

  publisher.<br>

<br>

  Take for example the stats related to disk I/O reported by the<br>

  hypervisor driver. Before these data can be pushed up to CloudWatch,<br>

  something has to know that the metric names are 'DiskWriteBytes',<br>

  'DiskReadBytes' etc., the namespace is 'AWS/EC2', the dimensions<br>

  include {'InstanceId': ID}, and the unit is Bytes.<br>

<br>

  So the idea was to avoid encoding all that knowledge in the CW<br>

  publisher, instead leaving the publisher simple and slow-changing<br>

  and unaffected by new metrics being added to the mix.<br>

<br>

Does that make sense at all?<br></blockquote><div><br></div><div>Yes, although I'm worried that enforcing the decoupling makes it more difficult for end users to set up a working system because they have to know about all 3 objects. Is there some way to make the transformers discoverable, so that a user only has to say "send the diskio counters to the CW publisher" and the dispatcher would then automatically load the appropriate transformers to use?</div>
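<div><br></div><div>One way such discovery could work, sketched under the assumption of a simple in-process registry keyed by (counter name, publisher name). The register and find_transformers helpers, the counter name, and the shape of the returned dict are all hypothetical; only the metric name, namespace, dimension and unit come from the CloudWatch example quoted above:</div>

```python
# Hypothetical registry: (counter name, publisher name) -> transformers.
_TRANSFORMERS = {}


def register(counter_name, publisher_name):
    """Decorator recording a transformer for a counter/publisher pair."""
    def decorator(func):
        key = (counter_name, publisher_name)
        _TRANSFORMERS.setdefault(key, []).append(func)
        return func
    return decorator


@register('disk.write.bytes', 'cloudwatch')
def disk_write_to_cloudwatch(volume, instance_id):
    # The per-measurement knowledge lives here, not in the publisher.
    return {
        'Namespace': 'AWS/EC2',
        'MetricName': 'DiskWriteBytes',
        'Dimensions': {'InstanceId': instance_id},
        'Unit': 'Bytes',
        'Value': volume,
    }


def find_transformers(counter_name, publisher_name):
    """Return the transformers a dispatcher would load for this pair."""
    return list(_TRANSFORMERS.get((counter_name, publisher_name), []))


for transform in find_transformers('disk.write.bytes', 'cloudwatch'):
    print(transform(4096, 'instance-1'))
```

<div>With something like this, the user names only the counters and the publisher, and the dispatcher looks up the transformers in between.</div>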

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><br>

> If not, would it be possible to think of transformers as two types: one<br>

> for format translation, one for operation handling?  Format<br>

> translation would be pollster/publisher specific. Operation<br>

> transformers would be independent of pollster/publisher, although<br>

> all data from a pollster/transformer should include the information<br>

> required for calculation, like name, type (gauge, cumulative etc),<br>

> volume (anything more?).<br>

<br>

</div>Yeah, that's an idea. Wouldn't transformers have to be chained in that<br>

case? So for example the relevant CW transformer chain would be:<br>

<br>

  generic-transformer-calculating-cpu-util-from-cumulative-time --><br>

    CW-specific-transformer-outputting-as-CPUUtilization-datapoint<br>

<br>

That would be fine, just wanted to call out a potential extra<br>

bit of complexity to capture in the pipeline config.<br></blockquote><div><br></div><div>Would we expect users to configure that pipeline?</div><div><br></div><div>Doug</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

Cheers,<br>

Eoghan<br>

<div class="HOEnZb"><div class="h5"><br>

> Thanks<br>

> --jyh<br>

><br>

> Below is my understanding of the problem; I hope it's correct and<br>

> helpful.<br>

><br>

> I'd start with what the data is about, and then discuss the<br>

> transformation needed as the data flows from sources to publishers,<br>

> i.e. the different requirements of different publishers:<br>

><br>

> 1) Data attributes. The data attributes are the real information that is<br>

> valuable.<br>

>       a) The data content. For example, DiskIOPollster currently<br>

>       merges data from all disks, so data for individual disks is<br>

>       invisible outside of the DiskIOPollster. (Will any publisher<br>

>       require per-disk information?)<br>

>       b) The time that the data is collected (this item includes the<br>

>       frequency also)<br>

>       c) The related information, like instance information, vnic<br>

>       information, tenant id, user id etc.<br>

><br>

> 2) Data format. The data format carries all of the data attributes. Different data<br>

> sources will publish different data formats, like the notification<br>

> data format, libvirt's XML output format etc. Different publishers<br>

> may require different data formats (Is this assertion correct?).<br>

><br>

> 3) Data operations. Calculating CPU utilization from CPU usage in<br>

> fact operates on datapoints within the same metric; summing all disk data<br>

> in DiskIOPollster relates different data sources; a different<br>

> frequency in fact drops some data. Data operations serve the<br>

> publishers' requirements.<br>

><br>

> When data flows from source to publisher, all of the above items may need<br>

> to be transformed. Today this is mostly done in the pollsters, e.g. converting all<br>

> formats into Counter, summing the output of all disks, calculating CPU utilization<br>

> etc, because there is only publishers,<br>

><br>

> _______________________________________________<br>

> OpenStack-dev mailing list<br>

> <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>

> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

><br>

<br>


</div></div></blockquote></div><br>