[openstack-dev] [ironic] Policy for deprecating metric names
Mario Villaplana
mario.villaplana at gmail.com
Tue Jan 3 22:11:51 UTC 2017
Hi all,
Recently, Ruby found a patch that modifies the name of a metric
emitted by ironic. [0] After some IRC discussion, we realized that
there is no real deprecation policy regarding changing metric names.
[1]
For anyone not familiar with this feature: ironic can emit various
metrics to supported backends using the metrics support in
ironic-lib. [2] Currently, the only supported backend is statsd. Most
(all?) metrics currently in ironic are implemented as function
decorators, like the following:
@METRICS.timer('my.module.MyClass.my_method')
def my_method(self):
    ...
This will send a time-series data point to statsd, which will be stored
with the current epoch timestamp, the name of the metric
('my.module.MyClass.my_method'), and the amount of time the method
took to finish.
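To make the data flow concrete, here is a minimal sketch of what a statsd timer decorator does on the wire. It is not ironic-lib's implementation; `timer`, `send_to_statsd`, `format_timing`, and `STATSD_ADDR` are hypothetical names. It assumes the standard statsd plain-text protocol, in which a timing is sent over UDP as `<name>:<milliseconds>|ms`.

```python
import functools
import socket
import time

# Hypothetical statsd endpoint; 8125 is statsd's conventional UDP port.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_timing(name, elapsed_ms):
    """Build a statsd timing line: '<name>:<milliseconds>|ms'."""
    return "%s:%d|ms" % (name, elapsed_ms)


def send_to_statsd(line):
    """Fire-and-forget UDP send; metrics errors never reach the caller."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("ascii"), STATSD_ADDR)
    except OSError:
        pass


def timer(name):
    """Hypothetical decorator: time the wrapped call and report it to statsd."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                # Report the duration even if the wrapped call raised.
                elapsed_ms = (time.monotonic() - start) * 1000
                send_to_statsd(format_timing(name, elapsed_ms))
        return wrapper
    return decorator


class MyClass:
    @timer("my.module.MyClass.my_method")
    def my_method(self):
        return "done"
```

Note that in this sketch the client only sends the metric name and the duration; in the usual statsd setup, the timestamp is attached downstream when statsd flushes its aggregates to Graphite.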
The primary use case for this that I'm familiar with is generating
graphs with Graphite/Grafana to get a granular look at performance
over time. With Graphite/Grafana, operators can also create graphs
with wildcard matches. For example, a graph that matches on
ironic.conductor.*.* will contain the metrics emitted by all methods
in modules under the ironic/conductor subdirectory. If I remember
correctly, each metric appears as a separate line on the same graph
by default.
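A rough sketch of how such a wildcard selects metrics, assuming Graphite's rule that `*` matches within a single dot-separated node and never across a dot. `graphite_match` is a hypothetical helper that approximates this with `fnmatch` per node (it ignores Graphite's `{a,b}` alternation), and the metric names below are made up for illustration.

```python
from fnmatch import fnmatchcase


def graphite_match(pattern, metric):
    """Approximate Graphite globbing: each '*' covers exactly one node."""
    pattern_nodes = pattern.split(".")
    metric_nodes = metric.split(".")
    if len(pattern_nodes) != len(metric_nodes):
        return False
    return all(fnmatchcase(node, pat)
               for pat, node in zip(pattern_nodes, metric_nodes))


# Hypothetical metric names.
metrics = [
    "ironic.conductor.manager.do_node_deploy",
    "ironic.conductor.utils.node_power_action",
    "ironic.api.controllers.node_get",
    "ironic.conductor.manager.ConductorManager.do_node_deploy",
]
matched = [m for m in metrics if graphite_match("ironic.conductor.*.*", m)]
```

Here `matched` keeps the first two names. The last name has one node too many for `*.*`, so a wildcard only tracks renames that keep the same depth.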
I did some limited research into how other OpenStack projects emit
metrics to statsd. In a short search I was only able to find one
example: Swift. [3] Swift seems to document each emitted metric with a
short description of what it represents, but it doesn't guarantee
anything at all about the naming or semantics of metrics.
I'd like to solicit the opinion of the community, especially operators
who use this feature, on what a good deprecation policy for metric
names should be.
As a former operator who used a downstream implementation very similar
to the upstream version in production, my recommendation is as
follows:
1. For each metric, document its name and what it represents in the
deploy docs [2]
2. Guarantee to operators that the docs will be kept up to date, but
don't guarantee that a metric name won't change without warning
between deploys
3. Maybe document best practices for using metrics in a stable manner,
such as using wildcards instead of keying off specific metric names,
checking the documentation for critical changes before deploys,
etc.
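One possible way to back the "docs stay up to date" guarantee in item 2 is an automated check comparing the names declared in code against the names in the docs. The sketch below is hypothetical: `emitted_metric_names` and `undocumented_and_stale` are made-up helpers, the regex only catches `@METRICS.timer(...)` calls with a literal string name, and loading the documented names from the deploy docs is left out.

```python
import re

# Matches @METRICS.timer('some.name') or @METRICS.timer("some.name").
TIMER_RE = re.compile(r"""@METRICS\.timer\(\s*['"]([^'"]+)['"]\s*\)""")


def emitted_metric_names(source):
    """Return the set of metric names declared via @METRICS.timer in source."""
    return set(TIMER_RE.findall(source))


def undocumented_and_stale(source, documented):
    """Names missing from the docs, and documented names no longer emitted."""
    emitted = emitted_metric_names(source)
    return sorted(emitted - documented), sorted(documented - emitted)


# Hypothetical module source and documented names.
source = """
@METRICS.timer('my.module.MyClass.my_method')
def my_method(self):
    ...

@METRICS.timer("my.module.MyClass.renamed_method")
def renamed_method(self):
    ...
"""
documented = {"my.module.MyClass.my_method", "my.module.MyClass.old_method"}
missing, stale = undocumented_and_stale(source, documented)
```

A non-empty `missing` or `stale` list would flag a rename or removal that the docs haven't caught up with yet.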
My reasoning is that it's hard to guarantee that a function won't be
renamed (or removed entirely) between releases. Since operators can
use wildcards to match on metrics, they should notice any changes
fairly quickly, even without keeping up with the documentation. One
alternative that was suggested previously, keeping both the prior and
the new metric names for some deprecation period, doesn't handle the
case where the function is removed. Additionally, it would
unnecessarily increase the amount of storage required for metrics. In
my experience, metric storage can be quite expensive. There's a
calculator for storage requirements for Whisper, one of the storage
backends used with Graphite, that can illustrate this. [4]
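A back-of-the-envelope version of that calculation, assuming Whisper's fixed-size file layout (a 16-byte header, 12 bytes per archive descriptor, and 12 bytes per data point: a 4-byte timestamp plus an 8-byte value). The retention policy is a hypothetical example, not an ironic default, and the calculator in [4] remains the better tool for a real schema.

```python
# Sizes (bytes) from the Whisper file format.
METADATA_SIZE = 16      # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, seconds per point, number of points
POINT_SIZE = 12         # uint32 timestamp + float64 value


def whisper_file_size(retentions):
    """retentions: list of (seconds_per_point, retention_seconds) tuples."""
    points = sum(retention // step for step, retention in retentions)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(retentions) + POINT_SIZE * points


DAY = 24 * 60 * 60
# Hypothetical policy: 10s points for 1 day, 1m for 30 days, 10m for 1 year.
RETENTIONS = [(10, DAY), (60, 30 * DAY), (600, 365 * DAY)]

per_metric = whisper_file_size(RETENTIONS)  # 104,400 points -> 1,252,852 bytes
# Keeping both old and new names during a deprecation period doubles this.
during_deprecation = 2 * per_metric
```

Since a statsd timer is typically flushed as several derived series (mean, upper, count, and so on), each stored in its own Whisper file, the actual cost of keeping a duplicated name around is a multiple of this per-file figure.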
I haven't yet scoped the documentation work, but I'd welcome feedback
on this proposal, or alternative suggestions, from people who use the
feature.
Thank you!
Mario
[0] https://review.openstack.org/#/c/412339
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2016-12-19.log.html#t2016-12-19T16:16:11
[2] http://docs.openstack.org/developer/ironic/deploy/metrics.html
[3] http://docs.openstack.org/developer/swift/admin_guide.html#reporting-metrics-to-statsd
[4] http://m30m.github.io/whisper-calculator/