Open Stack

Mon Sep 16 14:10:19 UTC 2013

Hello,

this is a follow-up to T. Sedovic's old email, trying to identify all the metrics we will need to track for Tuskar.
The Ceilometer API for Horizon is in progress, so we have time to finish the list of metrics and alarms we need. This may also lead to requests for some Ceilometer API optimizations.

This is meant as an open conversation that will lead to the final list.

Measurements
=========

The old list sent by tsedovic:
-------------------------------------

* CPU utilisation for each CPU (percentage) (Ceilometer-Nova as cpu_util)
* RAM utilisation (GB) (Ceilometer-Nova as memory)
  - I just assume this is the used value and that the total value can be obtained from the service itself; needs confirmation.
* Swap utilisation (GB) (Ceilometer-Nova as disk.ephemeral.size)
  - Same assumption: this is the used value and the total comes from the service itself; needs confirmation.
* Disk utilisation (GB) (Ceilometer-Cinder as volume.size and Ceilometer-Swift as storage.objects.size)
  - Same assumption: this is the used value and the total comes from the service itself; needs confirmation.
* System load -- see /proc/loadavg (percentage) (--, no Ceilometer meter yet)
* Incoming traffic for each NIC (Mbps) (Ceilometer-Nova as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (Ceilometer-Nova as network.outgoing.bytes)
  - These are currently tied to the VM interfaces; I expect the Baremetal agent (Hardware agent) will use the physical NICs; needs confirmation.
* Number of currently running instances and the associated flavours (Ceilometer-Nova using instance:<type> and group_by resource_id)
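Two conversions are left implicit in the mapping above: network.incoming.bytes and network.outgoing.bytes are, as far as I understand the Ceilometer docs, cumulative byte counters while the list asks for Mbps, and the per-flavour instance count needs the samples grouped by resource_id. A minimal sketch, assuming the samples have already been fetched from the Ceilometer API into plain Python values; the Sample class and both helper names are hypothetical, not part of Ceilometer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Sample:
    """Hypothetical, simplified view of one Ceilometer sample."""
    timestamp: float  # seconds since the epoch
    volume: float     # cumulative byte count for network.*.bytes


def bytes_counter_to_mbps(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert consecutive cumulative byte counts into (timestamp, Mbps) pairs.

    Intervals where the counter went backwards (e.g. a reset after a reboot)
    or where the timestamps do not advance are skipped.
    """
    ordered = sorted(samples, key=lambda s: s.timestamp)
    rates = []
    for prev, cur in zip(ordered, ordered[1:]):
        elapsed = cur.timestamp - prev.timestamp
        delta_bytes = cur.volume - prev.volume
        if elapsed <= 0 or delta_bytes < 0:
            continue
        # bytes -> bits (x8), per second, then bits/s -> megabits/s (10^6)
        rates.append((cur.timestamp, delta_bytes * 8 / elapsed / 1_000_000))
    return rates


def instances_per_flavour(samples: Iterable[Tuple[str, str]]) -> Counter:
    """Count instances per flavour from (resource_id, meter_name) pairs.

    Meter names are expected to look like "instance:m1.small". Keying by
    resource_id (the group_by step) counts each instance once; if one
    resource reports several flavours, the last pair seen wins.
    """
    flavour_by_resource: Dict[str, str] = {}
    for resource_id, meter_name in samples:
        prefix, sep, flavour = meter_name.partition(":")
        if prefix == "instance" and sep:
            flavour_by_resource[resource_id] = flavour
    return Counter(flavour_by_resource.values())
```

For example, two samples 10 seconds apart with counter values 0 and 12,500,000 bytes give 10.0 Mbps for that interval.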

The additional meters used in wireframes
-----------------------------------------------------

jcoufal, could you add the additional measurements from the latest wireframes?

The measurements Ceilometer currently supports
----------------------------------------------

http://docs.openstack.org/developer/ceilometer/measurements.html

Feel free to include the others in the wireframes, jcoufal (I guess there will have to be different overview pages for different Resource Classes, based on their service type).

I am in the process of finding out whether all of these measurements will also be collected by the Baremetal agent (Hardware agent). Judging from its description I would say yes, except for the VM-specific metrics like vcpus, I guess.

The missing meters
-------------------------

We will probably have to implement these (meaning implementing pollsters for the Baremetal agent (Hardware agent) that will collect these metrics):

* System load -- see /proc/loadavg (percentage) (probably for all services?)

- Please add other Baremetal metrics you think we will need.
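/proc/loadavg reports run-queue averages rather than a percentage, so the "percentage" above needs a convention. A minimal sketch of the parsing such a pollster would do, assuming the load is normalised by the CPU count (load 4.0 on 4 CPUs = 100%). It deliberately does not use Ceilometer's pollster plugin interface, and the meter names system.load.* and the helper names are hypothetical.

```python
import os
from typing import Dict, Optional, Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content.

    The file looks like "0.42 0.35 0.30 1/123 4567"; only the first three
    fields are load averages, the rest are ignored here.
    """
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load_percentage(load: float, cpu_count: int) -> float:
    """Express a load average as a percentage of the available CPUs.

    Assumed convention: load == cpu_count is 100%; values above 100%
    indicate more demand than there are CPUs.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return 100.0 * load / cpu_count


def poll_system_load(path: str = "/proc/loadavg",
                     cpu_count: Optional[int] = None) -> Dict[str, float]:
    """Read the load averages and return hypothetical meter name -> value pairs."""
    with open(path) as f:
        one, five, fifteen = parse_loadavg(f.read())
    cpus = cpu_count or os.cpu_count() or 1
    return {
        "system.load.1m": load_percentage(one, cpus),
        "system.load.5m": load_percentage(five, cpus),
        "system.load.15m": load_percentage(fifteen, cpus),
    }
```

Normalising by CPU count keeps the value comparable between nodes of different sizes, which matters if the same chart is shown for a whole Resource Class.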

Alerts
====

Setting an Alarm
----------------

Simplified explanation of setting an alarm:
In order to have alerts, you have to set an alarm first. An alarm can contain any statistics query, a threshold and a comparison operator (e.g. fire the alarm when avg cpu_util > 90% on all instances of project_1).
Several alarms can be combined into one complex alarm, and alarms can be browsed.
(Actions can also be set up on an alarm, but more about that later.)
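To make the alarm model above concrete (a statistic over a query, a threshold, a comparison operator, plus combinations of alarms), here is a minimal sketch of its semantics in plain Python. It assumes samples are dicts with meter, volume and metadata keys; the class and function names are hypothetical and this is not how Ceilometer's alarm evaluator is implemented. The operator names and the three states are meant to mirror Ceilometer's threshold alarms, but should be checked against the API docs.

```python
import operator
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Sequence

OK, ALARM, INSUFFICIENT_DATA = "ok", "alarm", "insufficient data"

# Operator names assumed to mirror Ceilometer's threshold alarms.
COMPARISONS: Dict[str, Callable[[float, float], bool]] = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ne": operator.ne, "ge": operator.ge, "gt": operator.gt,
}

STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "avg": mean, "min": min, "max": max, "sum": sum,
    "count": lambda values: float(len(values)),
}


@dataclass
class ThresholdAlarm:
    """Statistic over a sample query, compared against a threshold."""
    name: str
    meter: str                # e.g. "cpu_util"
    statistic: str            # key of STATISTICS, e.g. "avg"
    comparison: str           # key of COMPARISONS, e.g. "gt"
    threshold: float          # e.g. 90.0
    query: Dict[str, str] = field(default_factory=dict)  # e.g. {"project_id": "project_1"}

    def evaluate(self, samples: List[dict]) -> str:
        """Return ALARM, OK or INSUFFICIENT_DATA for the given samples."""
        values = [
            s["volume"] for s in samples
            if s["meter"] == self.meter
            and all(s.get(key) == value for key, value in self.query.items())
        ]
        if not values:
            return INSUFFICIENT_DATA
        stat = STATISTICS[self.statistic](values)
        return ALARM if COMPARISONS[self.comparison](stat, self.threshold) else OK


def evaluate_combination(states: List[str], op: str = "and") -> str:
    """Combine child alarm states into one complex alarm state."""
    fired = [state == ALARM for state in states]
    if op == "and":
        hit = bool(fired) and all(fired)
    elif op == "or":
        hit = any(fired)
    else:
        raise ValueError(f"unsupported combination operator: {op}")
    return ALARM if hit else OK
```

The example from the text would read `ThresholdAlarm("high-cpu", "cpu_util", "avg", "gt", 90.0, {"project_id": "project_1"})`.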

Showing alerts
-------------------

1. I would be bold enough to distinguish system alarms (e.g. ones like cpu_util > 90% that are used for Heat autoscaling) from user-defined alarms (the ones defined in the UI). Will we show both in the UI? Probably in different sections. System alarms will require extra caution.

2. For the table view of alarms, I would see it as a general filterable, orderable table of alarms, so we can easily show e.g. all Nova alarms, or all alarms for cpu_util with the condition > 90%.

3. There is an ongoing conversation with eglynn about how to show the 'aggregate alarm stats' and the 'alarm time series':
https://wiki.openstack.org/wiki/Ceilometer/blueprints/alarm-audit-api-group-by#Discussion

Next to the overview page with predefined charts, we should have general filterable, orderable charts (an interface similar to the table view above).

Here is one possible way the charts for Alarms could look on the overview page:
http://file.brq.redhat.com/~jcoufal/openstack-m/user_stories/racks_detail-overview.pdf

Any feedback is welcome. We should also figure out which Alarms will be used to define e.g. that something bad is happening (like a health chart?), and which alarms to set and show by default (a lot of them are already being set by e.g. Heat).

4. There are a lot of alerts used in the wireframes that are not currently supported in Ceilometer (alerts can only be based on existing measurements), like instance failures, disk failures, etc. We should write those down and probably write agents and pollsters for them. It makes sense to integrate them into Ceilometer, whatever they turn out to be.
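Points 1 to 3 come down to one generic, filterable and orderable view over alarms that can feed both the table and the charts. A minimal sketch of that idea, assuming alarms have already been fetched into a flat row structure and alarm history is available as (timestamp, new_state) pairs; AlarmRow, filter_alarms and alarm_time_series are hypothetical names, not Ceilometer or Horizon APIs. The `system` flag stands for the system vs. user-defined split from point 1.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class AlarmRow:
    """One row of a hypothetical alarms table."""
    name: str
    service: str      # e.g. "nova", "cinder"
    meter: str        # e.g. "cpu_util"
    comparison: str   # e.g. "gt"
    threshold: float  # e.g. 90.0
    state: str        # "ok", "alarm" or "insufficient data"
    system: bool      # True if set by the system (e.g. Heat), False if user-defined


def filter_alarms(rows: Iterable[AlarmRow], order_by: str = "name",
                  descending: bool = False, **criteria: Any) -> List[AlarmRow]:
    """Keep rows whose attributes equal every keyword criterion, then sort."""
    matching = [r for r in rows
                if all(getattr(r, key) == value for key, value in criteria.items())]
    return sorted(matching, key=lambda r: getattr(r, order_by), reverse=descending)


def alarm_time_series(transitions: Iterable[Tuple[float, str]],
                      bucket_seconds: int = 3600) -> Dict[int, int]:
    """Count transitions into the 'alarm' state per time bucket, for a chart."""
    counts: Counter = Counter()
    for timestamp, new_state in transitions:
        if new_state == "alarm":
            bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
            counts[bucket_start] += 1
    return dict(sorted(counts.items()))
```

"All alarms for cpu_util with the condition > 90%" would then be `filter_alarms(rows, meter="cpu_util", comparison="gt", threshold=90.0)`, and the same criteria could select which alarms a chart aggregates.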

Dynamic Ceilometer
============

Due to Ceilometer's pluggable architecture, any user can add their own agent or pollster, which will give them new metrics. We should take that into account: charts of alarms or stats should not be hardcoded.

E.g. a user will define their own alarm (maybe on their own metrics) and want to build a health chart from this alarm on their Overview page. So there should only be default overview pages that can be modified and reset back to the default. That way users can define for themselves e.g. the bad behaviour they want to show.

Though this seems more like a distant-future concern, we should think about it at least a bit.
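One way to avoid hardcoding charts is to describe every chart as a small data record naming its meter, with per-service-type defaults that users can extend and reset. A minimal sketch; ChartSpec, OverviewPage and DEFAULT_OVERVIEWS are hypothetical, and the set of available meters is assumed to be fetched from Ceilometer at runtime rather than compiled into the UI.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ChartSpec:
    """A chart described by data only, so any meter name can be plotted."""
    title: str
    meter: str            # built-in or user-provided meter name
    statistic: str = "avg"


# Hypothetical defaults, keyed by Resource Class service type.
DEFAULT_OVERVIEWS: Dict[str, List[ChartSpec]] = {
    "compute": [
        ChartSpec("CPU utilisation", "cpu_util"),
        ChartSpec("Incoming traffic", "network.incoming.bytes"),
    ],
}


@dataclass
class OverviewPage:
    service_type: str
    charts: List[ChartSpec] = field(default_factory=list)

    @classmethod
    def default(cls, service_type: str) -> "OverviewPage":
        """Start from a copy of the defaults so user edits never modify them."""
        return cls(service_type, copy.deepcopy(DEFAULT_OVERVIEWS.get(service_type, [])))

    def add_chart(self, chart: ChartSpec, available_meters: Set[str]) -> None:
        """Add a chart, checking its meter against those discovered at runtime."""
        if chart.meter not in available_meters:
            raise ValueError(f"unknown meter: {chart.meter}")
        self.charts.append(chart)

    def reset(self) -> None:
        """Discard user changes and go back to the default charts."""
        self.charts = copy.deepcopy(DEFAULT_OVERVIEWS.get(self.service_type, []))
```

Because the defaults are deep-copied, a page can be customised freely and `reset()` always restores the original set.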

[openstack-dev] [Tuskar] All needed Tuskar metrics and alerts mapped to what Ceilometer supports