[Openstack] [Ceilometer] looking for alarm best practice - please help
Eoghan Glynn
eglynn at redhat.com
Wed Dec 3 12:34:01 UTC 2014
> Hi folks,
>
>
>
> I wonder if anyone could share some best practice regarding the usage of
> Ceilometer alarms. We are using the alarm evaluation/notification of
> ceilometer and we are not entirely happy with the way we use it. Below is
> our problem:
>
>
>
> ============================
>
> Scenario:
>
> When cpu usage or memory usage is above a certain threshold, alerts should
> be displayed on the admin's web page. There should be 3 alert levels
> according to the meter value, namely notice, warning, and fatal. Notice
> means the meter value is between 50% and 70%, warning means between 70% and
> 85%, and fatal means above 85%.
>
> For example:
>
> * when one vm’s cpu usage is 72%, an alert message should be displayed saying
> “Warning: vm[d9b7018b-06c4-4fba-8221-37f67f6c6b8c] cpu usage is above 70%”.
>
> * when one vm’s memory usage is 90%, another alert message should be created
> saying “Fatal: vm[d9b7018b-06c4-4fba-8221-37f67f6c6b8c] memory usage is
> above 85%”
>
>
>
> Our current Solution:
>
> We used ceilometer alarm evaluation/notification to implement this. To
> distinguish which VM and which meter is above what value, we've created one
> alarm per VM per condition. So, to monitor 1 VM, 6 alarms are created
> because there are 2 meters and 3 levels for each meter. That means, if
> there are 100 VMs to be monitored, 600 alarms will be created.
>
>
>
> Problems:
>
> * The first problem is, when the number of meters increases, the number of
> alarms multiplies. For example, customers may want alerts on disk and
> network I/O rates, and if we add those, there will be 4*3=12 alarms for
> each VM.
>
> * The second problem is, when one VM is created, multiple alarms must be
> created, meaning multiple HTTP requests will be fired. In the case above, 6
> HTTP requests are needed whenever a VM is created, and this number also
> increases as the number of meters goes up.
One way of reducing both the number of alarms and the volume of notifications
would be to group related VMs, if such a concept exists in your use-case.
This is effectively how Heat autoscaling uses ceilometer, alarming on the
average of some statistic over a set of instances (as opposed to triggering
on individual instances).
The VMs could be grouped by setting user-metadata of the form:
nova boot ... --meta metering.my_server_group=foobar
Any user-metadata prefixed with 'metering.' will be preserved by ceilometer
in the resource_metadata.user_metadata stored for each sample, so that it
can be used to select the statistics on which the alarm is based, e.g.
ceilometer alarm-threshold-create --name cpu_high_foobar \
  --description 'warning: foobar instance group running hot' \
  --meter-name cpu_util --threshold 70.0 \
  --comparison-operator gt --statistic avg \
  ...
  --query metadata.user_metadata.my_server_group=foobar
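To cover the three severity levels in your scenario, a rough sketch (assuming
bash, and with the webhook URL purely a placeholder for wherever your admin
app accepts alerts) would be to create three such alarms per meter on the
*group* rather than on each VM:

# one alarm per "name:threshold" pair on the foobar group
for level in notice:50 warning:70 fatal:85; do
  name=${level%%:*}; threshold=${level##*:}
  # the --alarm-action URL below is only a placeholder for your admin app
  ceilometer alarm-threshold-create --name cpu_${name}_foobar \
    --meter-name cpu_util --threshold ${threshold} \
    --comparison-operator gt --statistic avg \
    --period 600 --evaluation-periods 1 \
    --alarm-action "http://your-admin-app/alerts?group=foobar&level=${name}" \
    --query metadata.user_metadata.my_server_group=foobar
done

That keeps the alarm count at (meters x levels) per group, independent of how
many instances the group contains, with the level encoded in the alarm name
and webhook URL rather than in a per-VM alarm.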
This approach is of course predicated on there being some natural grouping
relationship between instances in your environment.
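One further sanity check, if useful: before relying on the tag in alarm
queries, you can confirm it is actually reaching the stored samples (again
assuming the standard ceilometer CLI, with 'foobar' as the placeholder group
name):

ceilometer statistics -m cpu_util -p 600 \
  -q metadata.user_metadata.my_server_group=foobar

Non-empty output there means the grouping key is in place and the alarm
queries above should match.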
Cheers,
Eoghan
> =============================
>
>
>
> Does anyone have any suggestions?
>
>
>
>
>
>
>
> Best Regards!
>
> Kurt Rao
>
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>