[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward

Eoghan Glynn eglynn at redhat.com
Fri Nov 16 16:31:55 UTC 2012



----- Original Message -----
> 
> ________________________________________
> From: Eoghan Glynn [eglynn at redhat.com]
> Sent: Friday, November 16, 2012 7:44 AM
> 
> >OK, I think we need to draw an (at least blurry) line between
> >instrumentation and monitoring.
> >
> >For me monitoring is mostly about coarse-grained observables that
> >allow user-oriented questions to be asked about cloud resources:
> >
> > - are my instances running hot?
> > - are my volumes falling behind with queued I/O?
> > - is my load balancer spitting out many 503s?
> >... etc.
> >
> >Whereas instrumentation to me implies much more internal-facing
> >and fine-grained concerns such as:
> >
> > - what's the fault-rate & latency for a particular API?
> > - how much time is being spent accessing the DB?
> > - how many idle connections are currently in some pool?
> >
> >I'm maybe stating the obvious above, but the point is that it's
> >the type of question being asked that distinguishes monitoring
> >from instrumentation, not the sampling rate.
> >
> >For certain types of monitoring, I think we do need relatively
> >high sampling rates (e.g. once or twice a minute) that are near
> >constant (61s, 59s, 62s, ... as opposed to 45s, 75s, 52s, ...).
> >In that case, I'm not sure we can rely on the cadence of the
> >notifications issued by a busy nova compute service.
> 
> Yes, the line between instrumentation and monitoring can get a little
> blurry here, and I think you bring up some important points.
> 
> I would *never* assume anything in user space. I think monitoring of
> the users' instances is out of scope for all things OpenStack. The
> users might deploy whatever tools they like for checking i/o, disk,
> network, etc.

That's an interesting point of view, and one I hadn't considered before.

I would see AWS CloudWatch as a user-oriented monitoring service,
and would have assumed that there is scope for something similar
to be part of OpenStack.

When you say it's out of scope, do you mean that it shouldn't be
something addressed by the IaaS fabric, and should instead be
something that users bolt on top by running agents *within* their
instances?

(as opposed to some piece of the OpenStack infrastructure doing
 this monitoring from *outside* the instance)
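To make the earlier cadence point concrete: polling from outside the
instance makes a near-constant sampling interval easy to achieve with
simple drift correction, whereas notification-driven sampling inherits
whatever jitter a busy compute service has. A minimal sketch, where
poll_instance() is a hypothetical stand-in rather than any actual
nova/ceilometer hook:

```python
import time

# Fixed-cadence poller with drift correction: sleeping for the
# *remainder* of each interval keeps samples near-constant
# (60s, 60s, 60s, ...) rather than drifting like a cadence driven
# by notifications (45s, 75s, 52s, ...).
def run_poller(poll_instance, interval=60.0, cycles=None):
    next_tick = time.monotonic()
    n = 0
    while cycles is None or n < cycles:
        poll_instance()   # stand-in for the out-of-instance measurement
        n += 1
        next_tick += interval
        # Sleep only for what remains of this interval, so a slow
        # poll_instance() doesn't push every later sample out.
        time.sleep(max(0.0, next_tick - time.monotonic()))
```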
 
I recall the Heat team ran into some complications on this whole
within-versus-without question, as the OpenStack RBAC mechanisms
aren't yet flexible enough to easily grant in-instance agents
credentials conferring limited roles, such as the ability to report
metrics to a CW-like metricstore.
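For illustration, the narrowly-scoped operation such a role would
guard might be an agent pushing a sample to the metricstore. The
sketch below borrows Ceilometer-style field names (counter_name,
counter_volume, resource_id), but the exact API contract is an
assumption, not a spec:

```python
import json
import time

# Sketch of the sample an in-instance agent might push to a CW-like
# metricstore. Field names follow Ceilometer's meter conventions; the
# precise wire format here is illustrative only.
def build_sample(counter_name, volume, resource_id):
    return json.dumps({
        "counter_name": counter_name,   # e.g. "cpu_util"
        "counter_volume": volume,       # the observed value
        "counter_type": "gauge",
        "resource_id": resource_id,     # the instance UUID
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    })

# The RBAC gap above is about minting a credential whose *only*
# capability is to submit such samples for its own resource_id.
sample = build_sample("cpu_util", 42.0, "instance-0001")
```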


> For the deployer of OpenStack, yes, they will want to check whether
> the services are running hot, falling behind on queued I/O, or
> whether the public OpenStack API load balancer is spitting out 503s.
> My differentiation of instrumentation and monitoring in these cases
> is related to the sample rate and size of the payload. 503s, I/O,
> etc. I would view as instrumentation: sampled frequently and with
> little payload. This is the classic statsd/graphite data, shown on a
> big graph on the wall of the Network Operations Center. Monitoring
> would be larger, slower, chunkier data for capacity planning, etc.
> Lifecycle state falls into the monitoring camp as well: "Are things
> progressing as expected, or will they be a problem down the road?"

OK, so this cloud-operator-oriented view is also crucial, but I'd
be leery about making this the entire focus of our efforts (i.e. by
assuming that users will sort themselves out with their own monitoring
solution).
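As an aside on the statsd/graphite class of instrumentation mentioned
above: what keeps it cheap at high sample rates is that it's
fire-and-forget UDP. A minimal emitter sketch (the metric names are
made up for illustration; host/port are the conventional statsd
defaults):

```python
import socket

# Minimal statsd-style emitter: one UDP datagram per sample, no
# connection, no acknowledgement -- loss is acceptable for this class
# of data.
def emit(metric, value, kind="c", host="127.0.0.1", port=8125):
    # statsd wire format: "<name>:<value>|<type>",
    # where type is "c" (counter), "ms" (timer), or "g" (gauge).
    payload = "%s:%s|%s" % (metric, value, kind)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("utf-8"), (host, port))
    sock.close()
    return payload

# A counter increment for the 503s discussed above, and a timer sample:
emit("lb.http_503", 1)            # -> "lb.http_503:1|c"
emit("api.latency_ms", 57, "ms")  # -> "api.latency_ms:57|ms"
```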

Cheers,
Eoghan
