Open Stack

Fri Nov 16 15:38:15 UTC 2012

________________________________________
From: Eoghan Glynn [eglynn at redhat.com]
Sent: Friday, November 16, 2012 7:44 AM

>OK, I think we need to distinguish an (at least blurry) line between
>instrumentation and monitoring.
>
>For me monitoring is mostly about coarse-grained observables that
>allow user-oriented questions to be asked about cloud resources:
>
> - are my instances running hot?
> - are my volumes falling behind with queued I/O?
> - is my load balancer spitting out many 503s?
>... etc.
>
>Whereas instrumentation to me implies much more internal-facing
>and fine-grained concerns such as:
>
> - what's the fault-rate & latency for a particular API?
> - how much time is being spent accessing the DB?
> - how many idle connections are currently in some pool?
>
>I'm maybe stating the obvious above, but the point is that it's
>the type of question being asked that distinguishes monitoring
>from instrumentation, not the sampling rate.
>
>For certain types of monitoring, I think we do need relatively
>high sampling rates (e.g. once or twice a minute) that are near
>constant (61s, 59s, 62s, ... as opposed to 45s, 75s, 52s, ...).
>In that case, I'm not sure we can rely on the cadence of the
>notifications issued by a busy nova compute service.

Yes, the line between instrumentation and monitoring can get a little blurry, and you bring up some important points.

I would *never* assume anything in user space. I think monitoring of the users' instances is out of scope for all things OpenStack. Users might deploy whatever tools they like for checking I/O, disk, network, etc.

For the deployer of OpenStack, yes, they will want to check whether the services are running hot, falling behind on queued I/O, or whether the public OpenStack API load balancer is spitting out 503s. My differentiation of instrumentation and monitoring in these cases is related to the sample rate and size of the payload. 503s, I/O, etc. I would view as instrumentation: sampled frequently and with little payload. This is the classic statsd/graphite data, shown in a big graph on the wall of the Network Operations Center. Monitoring would be larger, slower, chunkier data for capacity planning, etc. Lifecycle state falls into the monitoring camp as well: "Are things progressing as expected? Or will they be a problem down the road?"
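
To make "sampled frequently and with little payload" concrete, here is a rough sketch of what a statsd-style sample looks like on the wire. The bucket names are made up for illustration, and the host and port are assumptions (8125 is the usual statsd default); nothing here is taken from nova itself.

```python
import socket


def format_statsd(bucket, value, kind):
    """Render one sample in the plain-text statsd line format: bucket:value|type."""
    # "c" = counter, "ms" = timer, "g" = gauge
    if kind not in ("c", "ms", "g"):
        raise ValueError("unsupported metric type: %r" % (kind,))
    return "%s:%s|%s" % (bucket, value, kind)


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP. Host and port are assumptions, not a real deployment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


# Hypothetical bucket names, chosen only to mirror the examples in this thread.
counter_line = format_statsd("lb.public_api.http_503", 1, "c")
timer_line = format_statsd("nova.db.query_time", 12, "ms")
```

Each sample is a few dozen bytes and carries no structure beyond a name, a value and a type, which is why it can be emitted at a high rate. A lifecycle or capacity-planning record, by contrast, would carry a much larger structured payload.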

...

>> So, we need to revisit the notification format wrt versioning,
>> structure, payload size, content and overhead. Getting the data out
>> and doing something with it is easily do-able via a worker/consumer
>> or a proprietary notifier (and with no impact on nova core).
>
>OK, there may be a terminology gap here, can you explain what you
>mean by a "proprietary notifier" ... a non-standard notification_driver
>that can be plugged into nova?

Sorry, I just mean a notifier that is not part of trunk. It lives in a different namespace and is deployed separately, but follows the same API as other notifiers.

Cheers
-S

[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward
