[openstack-dev] [Tuskar] All needed Tuskar metrics and alerts mapped to what Ceilometer supports

Ladislav Smola lsmola at redhat.com
Tue Sep 17 12:24:09 UTC 2013


Confirmation of the metrics of the Hardware agent (Baremetal agent)
=========================================

It is collecting:
- CPU, memory space, disk space, network traffic (the same agent will be
running on all services, collecting the same data)

It should be running on:
- the physical servers on which Glance, Cinder, Quantum, Swift, Nova
compute node and Nova controller run
- the network devices used in the OpenStack environment (switches, 
firewalls ...)

Supported metrics
------------------------

* CPU utilisation for each CPU (percentage) (as cpu.util.1min,
cpu.util.5min, cpu.util.15min)
* RAM utilisation (GB) (as memory.size.total, memory.size.used)
* Disk utilisation (GB) (as disk.size.total, disk.size.used)
* Incoming traffic for each NIC (Mbps) (as network.incoming.bytes)
* Outgoing traffic for each NIC (Mbps) (as network.outgoing.bytes)
  - also track network.outgoing.errors, network.bandwidth.bytes
* Swap utilisation (GB)
  - this should be part of Disk utilisation; we will just have to
    recognize the swap disk
* Number of currently running instances and the associated flavours
  (Ceilometer-Nova using instance:<type> and group_by resource_id)
  - this info will be queried from the Overcloud Ceilometer
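
The traffic meters above are byte counters, while the charts want Mbps,
so a rate has to be derived from two consecutive samples. A minimal
sketch, assuming the counters are cumulative and each sample is a
(timestamp in seconds, total bytes) pair; the function name and the
sample layout are hypothetical and not part of the Ceilometer API:

```python
def bytes_to_mbps(prev_sample, curr_sample):
    """Average rate in Mbps between two (timestamp_s, total_bytes) samples.

    Assumption: total_bytes is a cumulative counter, as for a NIC.
    """
    prev_ts, prev_bytes = prev_sample
    curr_ts, curr_bytes = curr_sample

    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")

    delta_bytes = curr_bytes - prev_bytes
    if delta_bytes < 0:
        # A cumulative counter that shrinks was most likely reset
        # (e.g. by a reboot); the rate cannot be derived from this pair.
        raise ValueError("counter went backwards, probably a reset")

    # bytes -> bits, per second, then bits/s -> megabits/s
    return delta_bytes * 8 / elapsed / 1_000_000
```

For example, 12,500,000 bytes transferred over 10 seconds comes out as
10 Mbps.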

Missing metrics
--------------------
* System load -- see /proc/loadavg (percentage)

as described here 
https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
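
For the missing system load metric, a pollster would essentially have to
read /proc/loadavg. A minimal sketch of that, assuming a Linux host; the
function names are hypothetical and this is not an existing Ceilometer
pollster. Note that /proc/loadavg reports load averages (run-queue
lengths), not percentages, so dividing by the CPU count to get a
percentage is only an assumption about what the UI would want:

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    # The first three fields are the load averages; the remaining ones
    # (running/total tasks, last PID) are ignored here.
    return tuple(float(field) for field in fields[:3])


def load_percent(load, cpu_count):
    """Express a load average as a percentage of the available CPUs."""
    return 100.0 * load / cpu_count


def read_system_load(path="/proc/loadavg"):
    """Read the load averages of the local host (Linux only)."""
    with open(path) as loadavg_file:
        one, five, fifteen = parse_loadavg(loadavg_file.read())
    cpus = os.cpu_count() or 1
    return {
        "1min": load_percent(one, cpus),
        "5min": load_percent(five, cpus),
        "15min": load_percent(fifteen, cpus),
    }
```

With this normalisation a value above 100 means there are more runnable
tasks than CPUs.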




On 09/16/2013 04:10 PM, Ladislav Smola wrote:
> Hello,
>
> this is a follow-up to T. Sedovic's old email, trying to identify all
> the metrics we will need to track for Tuskar.
> The Ceilometer API for Horizon is now in progress, so we have time to
> finish the list of metrics and alarms we need. That may also raise
> requests for some Ceilometer API optimization.
>
> This is meant as an open conversation that will lead to the final list.
>
>
> Measurements
> =========
>
> The old list sent by tsedovic:
> -------------------------------------
>
> * CPU utilisation for each CPU (percentage) (Ceilometer-Nova as cpu_util)
> * RAM utilisation (GB) (Ceilometer-Nova as memory)
> - I am just assuming this is the used value and that the total value
>   can be obtained from the service itself; needs confirmation
> * Swap utilisation (GB) (Ceilometer-Nova as disk.ephemeral.size)
> - I am just assuming this is the used value and that the total value
>   can be obtained from the service itself; needs confirmation
> * Disk utilisation (GB) (Ceilometer-Cinder as volume.size and 
> Ceilometer-Swift as storage.objects.size)
> - I am just assuming this is the used value and that the total value
>   can be obtained from the service itself; needs confirmation
> * System load -- see /proc/loadavg (percentage) (--)
> * Incoming traffic for each NIC (Mbps) (Ceilometer-Nova as
> network.incoming.bytes)
> * Outgoing traffic for each NIC (Mbps) (Ceilometer-Nova as
> network.outgoing.bytes)
> - It is connected to the VM interface now; I expect the Baremetal
>   agent (Hardware agent) will use NICs; needs confirmation
> * Number of currently running instances and the associated
>   flavours (Ceilometer-Nova using instance:<type> and group_by
>   resource_id)
>
>
> The additional meters used in wireframes
> -----------------------------------------------------
>
> jcoufal, could you add the additional measurements from the last
> wireframes?
>
>
> The measurements the Ceilometer supports now
> ---------------------------------------------------------------
>
> http://docs.openstack.org/developer/ceilometer/measurements.html
>
> jcoufal, feel free to include the others in the wireframes (I guess
> there will have to be different overview pages for different Resource
> Classes, based on their service type).
>
> I am in the process of finding out whether all of these measurements
> will also be collected by the Baremetal agent (Hardware agent). I
> would say yes, judging from its description (except the VM-specific
> metrics like vcpus, I guess).
>
> The missing meters
> -------------------------
>
> We will probably have to implement these (meaning implementing
> pollsters for the Baremetal agent (Hardware agent) that will collect
> these metrics):
>
> * System load -- see /proc/loadavg (percentage) (probably for all 
> services?)
>
> - Please add other Baremetal metrics you think we will need.
>
>
> Alerts
> ====
>
> Setting an Alarm
> -----------------------
>
> Simplified explanation of setting an alarm:
> In order to have alerts, you have to set an alarm first. An alarm can
> contain any statistics query, a threshold and an operator (e.g. fire
> the alarm when avg cpu_util > 90% on all instances of project_1).
> We can combine several alarms into one complex alarm, and you can
> browse alarms.
> (Actions can be set up on an alarm, but more about that later.)
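
To make the alarm description above concrete, here is a sketch of
evaluating one threshold alarm and of combining several alarm states, in
plain Python. The class, the field names and the operator keys are
hypothetical and only mirror the description (statistic, operator,
threshold); this is not the Ceilometer alarm API:

```python
import operator
from statistics import mean

# Hypothetical lookup tables for this sketch only.
OPERATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}
STATISTICS = {"avg": mean, "min": min, "max": max, "sum": sum}


class Alarm:
    """A threshold alarm: statistic(samples) <comparison> threshold."""

    def __init__(self, meter, statistic, comparison, threshold):
        self.meter = meter
        self.statistic = statistic
        self.comparison = comparison
        self.threshold = threshold

    def fires(self, samples):
        """Return True if the alarm condition holds for the given samples."""
        if not samples:
            # No data: treat the alarm as not firing in this sketch.
            return False
        value = STATISTICS[self.statistic](samples)
        return OPERATORS[self.comparison](value, self.threshold)


def combined_fires(states, mode="and"):
    """Combine the states of several alarms into one complex alarm."""
    return all(states) if mode == "and" else any(states)
```

The example from the text, avg cpu_util > 90%, would be
Alarm("cpu_util", "avg", "gt", 90.0), evaluated against the cpu_util
samples of the instances of project_1.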
>
> Showing alerts
> -------------------
>
> 1. I would be bold enough to distinguish system meters (e.g. ones
> similar to cpu_util > 90%, which are used for Heat autoscaling) from
> user-defined meters (the ones defined in the UI). Will we show both in
> the UI? Probably in different sections. System meters will require
> extra caution.
>
> 2. For the table view of alarms, I see it as a general filterable,
> orderable table of alarms, so we can easily show e.g. all Nova
> alarms, or all alarms for cpu_util with condition > 90%.
>
> 3. There is now an ongoing conversation with eglynn about how to show
> the 'aggregate alarms stats' and 'alarm time series':
> https://wiki.openstack.org/wiki/Ceilometer/blueprints/alarm-audit-api-group-by#Discussion 
>
> Next to the overview page with predefined charts, we should have
> general filterable, orderable charts (with an interface similar to the
> table view above).
>
> Here is one possible way the charts for Alarms could look on the
> overview page:
> http://file.brq.redhat.com/~jcoufal/openstack-m/user_stories/racks_detail-overview.pdf
> Any feedback is welcome. We should also figure out which Alarms will
> be used to indicate that e.g. something bad is happening (like a
> health chart?), and which alarms to set and show by default (a lot of
> them are already being set by e.g. Heat).
>
> 4. There are a lot of alerts used in the wireframes that are not
> currently supported in Ceilometer (alerts can only be based on
> existing measurements), like instance failures, disk failures, etc.
> We should write those down and probably write agents and pollsters
> for them. It makes sense to integrate them into Ceilometer, whatever
> they will be.
>
>
> Dynamic Ceilometer
> ============
>
> Due to the dynamic architecture of Ceilometer, any user can actually
> add his own agent or pollster, which will give him new metrics. We
> should account for that when showing charts of alarms or stats: they
> should not be hardcoded.
>
> E.g. a user will define his own alarm (maybe on his own metrics) and
> want to build a health chart from this alarm on his Overview page. So
> there should only be default overview pages that can be modified and
> reset back to default. That way the user himself can define e.g. the
> bad behaviour he wants to show.
>
> Though this seems more like the distant future, we should think
> about it at least a bit.
>
>
>
