[openstack-dev] [Tuskar] All needed Tuskar metrics and alerts mapped to what Ceilometer supports
Ladislav Smola
lsmola at redhat.com
Mon Sep 16 14:10:19 UTC 2013
Hello,
this is a follow-up to T. Sedovic's earlier email, trying to identify all
the metrics we will need to track for Tuskar.
The Ceilometer API for Horizon is now in progress, so we have time to
finish the list of metrics and alarms we need. That may also raise
requests for some Ceilometer API optimization.
This is meant as an open conversation that will lead to the final list.
Measurements
=========
The old list sent by tsedovic:
-------------------------------------
* CPU utilisation for each CPU (percentage) (Ceilometer-Nova as cpu_util)
* RAM utilisation (GB) (Ceilometer-Nova as memory)
- I am just assuming this is the used value and that the total value can
be obtained from the service itself; needs confirmation
* Swap utilisation (GB) (Ceilometer-Nova as disk.ephemeral.size)
- I am just assuming this is the used value and that the total value can
be obtained from the service itself; needs confirmation
* Disk utilisation (GB) (Ceilometer-Cinder as volume.size and
Ceilometer-Swift as storage.objects.size)
- I am just assuming this is the used value and that the total value can
be obtained from the service itself; needs confirmation
* System load -- see /proc/loadavg (percentage) (--)
* Incoming traffic for each NIC (Mbps) (Ceilometer-Nova as
network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (Ceilometer-Nova as
network.outgoing.bytes)
- This is tied to the VM interface now; I expect the Baremetal agent
(Hardware agent) will use NICs; needs confirmation
* Number of currently running instances and the associated flavours
(Ceilometer-Nova using instance:<type> and group_by resource_id)
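One note on the traffic meters above: the list asks for Mbps, while network.incoming.bytes and network.outgoing.bytes are byte counters. A minimal sketch of how a rate could be derived from two samples, assuming the counters are cumulative (the function name rate_mbps and the reset handling are illustrative assumptions, not part of Ceilometer):

```python
from datetime import datetime, timedelta


def rate_mbps(prev_bytes, prev_time, curr_bytes, curr_time):
    """Average rate in megabits per second between two cumulative byte samples."""
    seconds = (curr_time - prev_time).total_seconds()
    if seconds <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Assumption: a smaller value means the counter was reset (e.g. reboot),
        # so the current value is treated as the bytes seen since the reset.
        delta = curr_bytes
    return delta * 8 / 1_000_000 / seconds


t0 = datetime(2013, 9, 16, 14, 0, 0)
t1 = t0 + timedelta(seconds=10)
# 12,500,000 bytes transferred in 10 seconds
print(rate_mbps(1_000_000, t0, 13_500_000, t1))  # 10.0
```

Whether this conversion happens in the UI or on the Ceilometer side is still an open question.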
The additional meters used in wireframes
-----------------------------------------------------
jcoufal, could you add the additional measurements from the latest wireframes?
The measurements Ceilometer supports now
---------------------------------------------------------------
http://docs.openstack.org/developer/ceilometer/measurements.html
jcoufal, feel free to include the others in the wireframes (I guess there
will have to be different overview pages for different Resource Classes,
based on their service type).
I am in the process of finding out whether all of these measurements will
also be collected by the Baremetal agent (Hardware agent). From its
description I would say yes (except for the VM-specific metrics like
vcpus, I guess).
The missing meters
-------------------------
We will probably have to implement these (meaning implementing pollsters
for the Baremetal agent (Hardware agent) that will collect these metrics).
* System load -- see /proc/loadavg (percentage) (probably for all services?)
- Please add other Baremetal metrics you think we will need.
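To make the system load meter a bit more concrete, here is a minimal sketch of the parsing such a pollster might do. The function names are hypothetical and this is not the Ceilometer pollster interface. Note that the values in /proc/loadavg are run-queue averages rather than percentages, so the sketch normalizes by CPU count as one possible way to get a percentage:

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def load_percent(load, cpu_count):
    """Express a load average as a percentage of the available CPUs.

    Assumption: a load equal to the number of CPUs counts as 100%.
    """
    return 100.0 * load / cpu_count


# Sample content in the /proc/loadavg format:
# three load averages, running/total tasks, last PID.
sample = "0.50 1.20 2.00 1/234 5678"
one, five, fifteen = parse_loadavg(sample)
print(load_percent(one, 4))  # 12.5
```

On a Linux host the text would come from reading /proc/loadavg instead of the sample string.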
Alerts
====
Setting an Alarm
-----------------------
Simplified explanation of setting the alarm:
In order to have alerts, you have to set an alarm first. An alarm can
contain any statistics query, a threshold and an operator (e.g. fire the
alarm when avg cpu_util > 90% on all instances of project_1).
We can combine several alarms into one complex alarm, and you can browse
alarms.
(There can be actions set up on alarm, but more about that later.)
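As a rough illustration of the statistic / operator / threshold idea and of combining alarms, here is a minimal sketch. The class and field names are illustrative assumptions, not the Ceilometer alarm API, and the samples are made-up values:

```python
import operator
from dataclasses import dataclass
from statistics import mean

# Hypothetical lookup tables for this sketch only.
OPERATORS = {"gt": operator.gt, "ge": operator.ge,
             "lt": operator.lt, "le": operator.le, "eq": operator.eq}
STATISTICS = {"avg": mean, "min": min, "max": max, "sum": sum}


@dataclass
class Alarm:
    meter: str
    statistic: str
    comparison: str
    threshold: float

    def fires(self, samples):
        """samples maps a meter name to a list of values from a statistics query."""
        value = STATISTICS[self.statistic](samples[self.meter])
        return OPERATORS[self.comparison](value, self.threshold)


def combined_fires(alarms, samples, require_all=True):
    """Combine several alarms with AND (require_all=True) or OR."""
    results = [alarm.fires(samples) for alarm in alarms]
    return all(results) if require_all else any(results)


samples = {"cpu_util": [95.0, 92.0, 89.0],
           "network.incoming.bytes": [100.0, 200.0]}
cpu_alarm = Alarm("cpu_util", "avg", "gt", 90.0)
net_alarm = Alarm("network.incoming.bytes", "max", "gt", 1000.0)

print(cpu_alarm.fires(samples))                                       # True
print(combined_fires([cpu_alarm, net_alarm], samples))                # False
print(combined_fires([cpu_alarm, net_alarm], samples, require_all=False))  # True
```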
Showing alerts
-------------------
1. I would be bold enough to distinguish system meters (e.g. something like
cpu_util > 90%, which are used for Heat autoscaling) from user-defined
meters (the ones defined in the UI). Will we show both in the UI? Probably
in different sections. System meters will require extra caution.
2. For the table view of alarms, I see a general filterable, orderable
table of alarms, so we can easily show e.g. all nova alarms, or all alarms
for cpu_util with condition > 90%.
3. There is now an ongoing conversation with eglynn about how to show the
'aggregate alarms stats' and 'alarm time series':
https://wiki.openstack.org/wiki/Ceilometer/blueprints/alarm-audit-api-group-by#Discussion
Next to the overview page with predefined charts, we should have general
filterable, orderable charts (a similar interface to the table view above).
Here is one possible way the charts for alarms could look on the overview page:
(http://file.brq.redhat.com/~jcoufal/openstack-m/user_stories/racks_detail-overview.pdf).
Any feedback is welcome. We should also figure out which alarms will be
used to indicate that e.g. something bad is happening (like a health
chart?), and which alarms to set and show by default (a lot of them are
already being set by e.g. Heat).
4. There are a lot of alerts used in the wireframes that are not currently
supported in Ceilometer (alerts can only be based on existing
measurements), like instance failures, disk failures, etc. We should write
those down and probably write agents and pollsters for them. It makes sense
to integrate them into Ceilometer, whatever they turn out to be.
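For the filterable, orderable alarm table from point 2, a minimal sketch of the filtering and ordering could look like this. The AlarmRow fields (in particular service) are assumptions made for illustration and do not come from the Ceilometer API:

```python
from dataclasses import dataclass


@dataclass
class AlarmRow:
    # Hypothetical row shape for the alarm table.
    name: str
    service: str
    meter: str
    comparison: str
    threshold: float


def filter_alarms(rows, **criteria):
    """Keep rows whose attributes equal every given criterion."""
    return [row for row in rows
            if all(getattr(row, key) == value for key, value in criteria.items())]


def order_alarms(rows, key, descending=False):
    """Order rows by one attribute."""
    return sorted(rows, key=lambda row: getattr(row, key), reverse=descending)


rows = [
    AlarmRow("cpu_high", "nova", "cpu_util", ">", 90.0),
    AlarmRow("cpu_warn", "nova", "cpu_util", ">", 70.0),
    AlarmRow("volume_big", "cinder", "volume.size", ">", 500.0),
]

nova = filter_alarms(rows, service="nova")
cpu_90 = filter_alarms(rows, meter="cpu_util", comparison=">", threshold=90.0)
print([row.name for row in order_alarms(nova, "threshold")])  # ['cpu_warn', 'cpu_high']
print([row.name for row in cpu_90])                           # ['cpu_high']
```

The same filter and order criteria could then drive the charts as well as the table.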
Dynamic Ceilometer
============
Due to the dynamic architecture of Ceilometer, any user can actually add
their own agent or pollster, which will give them new metrics. We should
account for that when showing charts of alarms or stats; they should not be
hardcoded.
E.g. a user will define their own alarm (maybe on their own metrics) and
want to build a health chart from this alarm on their Overview page. So
there should only be default overview pages that can be modified and reset
back to default. That way users can define for themselves e.g. the bad
behaviour they want to show.
Though this seems more like a future's future, we should think about it
at least a bit.