[openstack-dev] [Tuskar] All needed Tuskar metrics and alerts mapped to what Ceilometer supports
Ladislav Smola
lsmola at redhat.com
Tue Sep 17 12:24:09 UTC 2013
Confirmation of the metrics collected by the Hardware agent (Baremetal agent)
=========================================
It collects:
- CPU, memory space, disk space and network traffic (the same agent will
run on all services, collecting the same data)
It should run on:
- the physical servers on which Glance, Cinder, Quantum, Swift, the Nova
compute nodes and the Nova controller run
- the network devices used in the OpenStack environment (switches,
firewalls ...)
Supported metrics
------------------------
* CPU utilisation for each CPU (percentage) (as cpu.util.1min,
cpu.util.5min, cpu.util.15min)
* RAM utilisation (GB) (as memory.size.total, memory.size.used )
* Disk utilisation (GB) (as disk.size.total, disk.size.used)
* Incoming traffic for each NIC (Mbps) (as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (as network.outgoing.bytes)
- network.outgoing.errors and network.bandwidth.bytes are also tracked
* Swap utilisation (GB)
- this should be part of Disk utilisation; we will just have to
recognize the swap disk
* Number of currently running instances and the associated
flavours (Ceilometer-Nova, using instance:<type> and group_by
resource_id). This info will be queried from the Overcloud Ceilometer.
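The traffic meters above are byte counts, while the target unit is Mbps. Here is a minimal sketch of the conversion. It assumes network.incoming.bytes / network.outgoing.bytes are cumulative counters sampled at known timestamps. The function name and the (timestamp, byte_count) tuple layout are hypothetical and are not part of Ceilometer.

```python
from datetime import datetime


def bytes_counter_to_mbps(prev, curr):
    """Convert two cumulative byte-counter samples to an average rate in Mbps.

    Each sample is a hypothetical (timestamp, byte_count) tuple. Assumes the
    counter is cumulative and did not reset between the two samples.
    """
    (t0, b0), (t1, b1) = prev, curr
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_bytes = b1 - b0
    if delta_bytes < 0:
        # Counter reset or wrap-around: no rate can be derived from these two points.
        raise ValueError("counter went backwards (reset?)")
    # bytes -> bits (x8), then bits/s -> megabits/s (decimal megabits, /1e6)
    return delta_bytes * 8 / seconds / 1_000_000


# 75,000,000 bytes transferred in 60 s
prev = (datetime(2013, 9, 17, 12, 0, 0), 1_000_000_000)
curr = (datetime(2013, 9, 17, 12, 1, 0), 1_075_000_000)
print(bytes_counter_to_mbps(prev, curr))  # 10.0
```

Averaging over the sampling interval is the simplest choice. Shorter polling intervals give a more faithful per-NIC rate.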
Missing metrics
--------------------
* System load -- see /proc/loadavg (percentage), as described in
https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
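/proc/loadavg reports run-queue load averages, not percentages. Here is a minimal sketch of one possible normalisation. It assumes "percentage" means the load average divided by the number of CPUs, times 100, so a fully busy host reads roughly 100%. That definition and the function name are assumptions, not something Ceilometer defines.

```python
import os


def loadavg_percent(loadavg_text, cpu_count):
    """Return the 1/5/15-minute load averages as a percentage of CPU capacity.

    Assumed definition: load / cpu_count * 100. Values above 100% mean
    processes were queued waiting for a CPU.
    """
    fields = loadavg_text.split()
    # The first three fields of /proc/loadavg are the 1, 5 and 15 minute averages.
    one, five, fifteen = (float(f) for f in fields[:3])
    return {
        "1min": one / cpu_count * 100,
        "5min": five / cpu_count * 100,
        "15min": fifteen / cpu_count * 100,
    }


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    # Linux only: read the live values for the local host.
    with open("/proc/loadavg") as f:
        print(loadavg_percent(f.read(), os.cpu_count() or 1))
```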
On 09/16/2013 04:10 PM, Ladislav Smola wrote:
> Hello,
>
> this is a follow-up to T. Sedovic's old email, trying to identify all
> the metrics we will need to track for Tuskar.
> The Ceilometer API for Horizon is now in progress, so we have time to
> finish the list of metrics and alarms we need. That may also raise
> requests for some Ceilometer API optimizations.
>
> This is meant to start an open conversation that will lead to the final list.
>
>
> Measurements
> =========
>
> The old list sent by tsedovic:
> -------------------------------------
>
> * CPU utilisation for each CPU (percentage) (Ceilometer-Nova as cpu_util)
> * RAM utilisation (GB) (Ceilometer-Nova as memory)
> - I assume this is the used value and that the total value can be
> obtained from the service itself; needs confirmation
> * Swap utilisation (GB) (Ceilometer-Nova as disk.ephemeral.size)
> - I assume this is the used value and that the total value can be
> obtained from the service itself; needs confirmation
> * Disk utilisation (GB) (Ceilometer-Cinder as volume.size and
> Ceilometer-Swift as storage.objects.size)
> - I assume this is the used value and that the total value can be
> obtained from the service itself; needs confirmation
> * System load -- see /proc/loadavg (percentage) (--)
> * Incoming traffic for each NIC (Mbps) ( Ceilometer-Nova as
> network.incoming.bytes)
> * Outgoing traffic for each NIC (Mbps) (Ceilometer-Nova as
> network.outgoing.bytes)
> - It is currently tied to the VM interface; I expect the Baremetal
> agent (Hardware agent) will use the NICs; needs confirmation
> * Number of currently running instances and the associated
> flavours (Ceilometer-Nova
> using instance:<type> and group_by resource_id)
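Counting instances per flavour with "instance:<type> and group_by resource_id" amounts to collecting distinct resource IDs per instance:<type> meter. Here is a minimal sketch. It assumes the samples have already been fetched for a recent time window (which is what makes them "currently running") as dicts with hypothetical "meter" and "resource_id" keys.

```python
from collections import defaultdict


def instances_per_flavour(samples):
    """Count distinct instances per flavour from instance:<type> samples.

    Grouping by resource_id makes repeated samples from the same instance
    count only once. Samples for other meters are ignored.
    """
    seen = defaultdict(set)
    for s in samples:
        meter = s["meter"]
        if meter.startswith("instance:"):
            flavour = meter.split(":", 1)[1]
            seen[flavour].add(s["resource_id"])
    return {flavour: len(ids) for flavour, ids in seen.items()}


samples = [
    {"meter": "instance:m1.small", "resource_id": "vm-1"},
    {"meter": "instance:m1.small", "resource_id": "vm-1"},  # same VM, later poll
    {"meter": "instance:m1.small", "resource_id": "vm-2"},
    {"meter": "instance:m1.large", "resource_id": "vm-3"},
    {"meter": "cpu_util", "resource_id": "vm-3"},
]
print(instances_per_flavour(samples))  # {'m1.small': 2, 'm1.large': 1}
```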
>
>
> The additional meters used in wireframes
> -----------------------------------------------------
>
> jcoufal, could you add the additional measurements from the latest
> wireframes?
>
>
> The measurements the Ceilometer supports now
> ---------------------------------------------------------------
>
> http://docs.openstack.org/developer/ceilometer/measurements.html
>
> jcoufal, feel free to include the others in the wireframes (I guess
> there will have to be different overview pages for different Resource
> Classes, based on their service type).
>
> I am in the process of finding out whether all of these measurements
> will also be collected by the Baremetal agent (Hardware agent). Judging
> from its description, I would say yes (except the VM-specific metrics
> like vcpus, I guess).
>
> The missing meters
> -------------------------
>
> We will probably have to implement these (meaning implementing
> pollsters for the Baremetal agent (Hardware agent) that will collect
> these metrics).
>
> * System load -- see /proc/loadavg (percentage) (probably for all
> services?)
>
> - Please add other Baremetal metrics you think we will need.
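The pollster idea can be sketched as an object that is asked for samples on every polling cycle. This is a simplified, hypothetical interface and not Ceilometer's actual plugin API. The Sample class, HardwarePollster, and the meter names are illustrative. The gauge/delta/cumulative sample types mirror the kinds Ceilometer distinguishes.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Simplified sample record for illustration (not Ceilometer's own class)."""
    name: str
    type: str  # "gauge", "delta" or "cumulative"
    volume: float
    resource_id: str
    timestamp: float = field(default_factory=time.time)


class HardwarePollster:
    """Hypothetical Hardware-agent pollster for one missing meter family.

    `read_values` is any callable returning {meter_name: value}. Injecting it
    keeps the pollster independent of how values are read (e.g. /proc, SNMP).
    """

    def __init__(self, resource_id, read_values, sample_type="gauge"):
        self.resource_id = resource_id
        self.read_values = read_values
        self.sample_type = sample_type

    def get_samples(self):
        # One sample per meter reported by the reader on this polling cycle.
        for name, value in sorted(self.read_values().items()):
            yield Sample(name, self.sample_type, float(value), self.resource_id)


# Hypothetical meter names for system load on a physical node.
pollster = HardwarePollster(
    "node-1",
    lambda: {"hardware.load.1min": 0.5, "hardware.load.5min": 0.4},
)
for s in pollster.get_samples():
    print(s.name, s.volume, s.resource_id)
```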
>
>
> Alerts
> ====
>
> Setting an Alarm
> -----------------------
>
> Simplified explanation of setting an alarm:
> In order to have alerts, you have to set an alarm first. An alarm can
> contain any statistics query, a threshold and a comparison operator
> (e.g. fire the alarm when avg cpu_util > 90% on all instances of
> project_1). Several alarms can be combined into one complex alarm, and
> alarms can be browsed.
> (Actions can also be set up on an alarm, but more about that later.)
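The alarm model described above (statistic + threshold + operator, plus combining alarms) can be sketched as follows. This is a simplified evaluator, not Ceilometer's alarm service. The operator names (lt, le, eq, ne, ge, gt) and statistics (avg, min, max, sum, count) follow those used by Ceilometer threshold alarms, and "and"/"or" are assumed as the ways of combining alarms. The function names and the sample query result are hypothetical.

```python
import operator
from statistics import mean

OPERATORS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ne": operator.ne, "ge": operator.ge, "gt": operator.gt,
}
STATISTICS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}


def threshold_alarm_fires(values, statistic, comparison_operator, threshold):
    """Evaluate a simplified threshold alarm over the values its query returned."""
    if not values:
        return None  # insufficient data: neither firing nor OK
    stat = STATISTICS[statistic](values)
    return OPERATORS[comparison_operator](stat, threshold)


def combined_alarm_fires(states, op="and"):
    """Combine the states of several alarms into one complex alarm."""
    return all(states) if op == "and" else any(states)


# "fire the alarm when avg cpu_util > 90% on all instances of project_1"
cpu_util_project_1 = [95.0, 88.0, 97.0]  # hypothetical query result
cpu_alarm = threshold_alarm_fires(cpu_util_project_1, "avg", "gt", 90.0)
print(cpu_alarm)  # True (the average is about 93.3)
print(combined_alarm_fires([cpu_alarm, False], "or"))  # True
```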
>
> Showing alerts
> -------------------
>
> 1. I would be bold enough to distinguish system meters (e.g. conditions
> similar to cpu_util > 90% that are used for Heat autoscaling) from
> user-defined meters (the ones defined in the UI). Will we show both in
> the UI? Probably in different sections. System meters will require
> extra caution.
>
> 2. I see the table view of alarms as a general filterable, sortable
> table of alarms, so we can easily show e.g. all Nova alarms, or all
> alarms for cpu_util with the condition > 90%.
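A general filterable, sortable alarm table can be driven by one generic function instead of one view per query. Here is a minimal sketch. The AlarmRow fields and filter_and_sort are hypothetical and only illustrate the idea of equality filters on any column plus sorting by any column.

```python
from dataclasses import dataclass


@dataclass
class AlarmRow:
    """Hypothetical row in the alarms table view."""
    name: str
    service: str    # e.g. "nova", "cinder"
    meter: str      # e.g. "cpu_util"
    operator: str   # e.g. "gt"
    threshold: float
    state: str      # e.g. "ok", "alarm", "insufficient data"


def filter_and_sort(rows, sort_key=None, reverse=False, **filters):
    """Keep rows whose attributes equal every given filter, then optionally sort."""
    result = [r for r in rows if all(getattr(r, k) == v for k, v in filters.items())]
    if sort_key is not None:
        result.sort(key=lambda r: getattr(r, sort_key), reverse=reverse)
    return result


rows = [
    AlarmRow("cpu-high-a", "nova", "cpu_util", "gt", 90.0, "alarm"),
    AlarmRow("cpu-high-b", "nova", "cpu_util", "gt", 80.0, "ok"),
    AlarmRow("vol-full", "cinder", "volume.size", "gt", 500.0, "ok"),
]
# All Nova alarms, ordered by name
print([r.name for r in filter_and_sort(rows, sort_key="name", service="nova")])
# All cpu_util alarms with the condition > 90%
print([r.name for r in filter_and_sort(rows, meter="cpu_util", operator="gt", threshold=90.0)])
```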
>
> 3. There is an ongoing conversation with eglynn about how to show the
> 'aggregate alarms stats' and the 'alarm time series':
> https://wiki.openstack.org/wiki/Ceilometer/blueprints/alarm-audit-api-group-by#Discussion
>
> Alongside the overview page with predefined charts, we should have
> general filterable, sortable charts (an interface similar to the table
> view above).
>
> Here is one possible way the charts for Alarms could look on the
> overview page:
> http://file.brq.redhat.com/~jcoufal/openstack-m/user_stories/racks_detail-overview.pdf
> Any feedback is welcome. We should also figure out which alarms will be
> used to define that something bad is happening (e.g. for a health
> chart?), and which alarms to set and show by default (many of them are
> already being set by e.g. Heat).
>
> 4. Many alerts used in the wireframes are not currently supported in
> Ceilometer (alerts can only be based on existing measurements), such as
> instance failures, disk failures, etc. We should write those down and
> probably write agents and pollsters for them. It makes sense to
> integrate them into Ceilometer, whatever they turn out to be.
>
>
> Dynamic Ceilometer
> ============
>
> Due to Ceilometer's dynamic architecture, any user can add their own
> agent or pollster, which will give them new metrics. We should account
> for that: charts of alarms or stats should not be hardcoded.
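Not hardcoding charts means deriving them from whatever meters are actually reported. Here is a minimal sketch. It assumes the list of meters has already been fetched, e.g. from the Ceilometer API's meter listing, as dicts with hypothetical "name" and "unit" keys. The preferred ordering reuses meter names from this thread. build_chart_specs is a hypothetical helper.

```python
def build_chart_specs(meters, preferred=("cpu_util", "memory", "disk.ephemeral.size")):
    """Build one chart spec per distinct meter name instead of a hardcoded list.

    Meters in `preferred` come first, in that order. Any other meters,
    e.g. ones produced by a user's own pollster, follow alphabetically.
    """
    units = {}
    for m in meters:
        units.setdefault(m["name"], m.get("unit", ""))
    known = [n for n in preferred if n in units]
    extra = sorted(n for n in units if n not in preferred)
    return [{"meter": n, "unit": units[n]} for n in known + extra]


meters = [
    {"name": "my.custom.temperature", "unit": "C"},  # added by a user's own pollster
    {"name": "cpu_util", "unit": "%"},
    {"name": "cpu_util", "unit": "%"},  # same meter, another resource
]
for spec in build_chart_specs(meters):
    print(spec)
```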
>
> E.g. a user may define their own alarm (perhaps on their own metrics)
> and want to build a health chart from it on their Overview page. So
> there should only be default overview pages that can be modified and
> reset back to default. That way users can define for themselves e.g.
> the bad behaviour they want to show.
>
> Though this seems like the distant future, we should think about it at
> least a bit.
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev