[aodh] [heat] Stein: How to create alarms based on rate metrics like CPU utilization?
Prior to Stein, Ceilometer issued a metric named /cpu_util/, which I could use to trigger alarms and autoscaling when CPU utilization was too high.
cpu_util doesn't exist anymore. Instead, we are asked to use Gnocchi's /rate/ feature. However, when using rates, alarms on a group of resources require more parameters than just one metric: Both an aggregation and a reaggregation method are needed.
For example, a group of instances that implement "myapp":
gnocchi measures aggregation -m cpu --reaggregation mean --aggregation rate:mean --query server_group=myapp --resource-type instance
Actually, this command uses a deprecated API (but from what I can see, Aodh still uses it). The new way is like this:
gnocchi aggregates --resource-type instance '(aggregate rate:mean (metric cpu mean))' server_group=myapp
If rate:mean is in the archive policy, it also works the other way around:
gnocchi aggregates --resource-type instance '(aggregate mean (metric cpu rate:mean))' server_group=myapp
Without reaggregation, I get quite unexpected numbers, including negative CPU rates. If you want to understand why, see this discussion with one of the Gnocchi maintainers [1].
*My problem*: Aodh allows me to set an aggregation method, but not a reaggregation method. How can I create alarms based on rates? The problem extends to Heat and autoscaling.
Thanks much,
Bernd.
[1] https://github.com/gnocchixyz/gnocchi/issues/1044
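To make the aggregation/reaggregation point concrete, here is a small Python sketch. It is not Gnocchi code and does not call any Gnocchi API. It assumes that the cpu metric is cumulative CPU time in nanoseconds, that there is one sample per granularity period (so the rate is simply the difference between consecutive samples), and that each instance has one vCPU. All names and sample values are made up for illustration. It shows one way a negative "rate" can appear when the order of operations is wrong; it is not necessarily the exact mechanism discussed in [1].

```python
from statistics import mean

NS_PER_SECOND = 1_000_000_000
GRANULARITY = 60  # seconds between samples (assumed archive policy granularity)


def rate(samples):
    """Differences between consecutive samples; None where either side is missing."""
    return [
        None if a is None or b is None else b - a
        for a, b in zip(samples, samples[1:])
    ]


def mean_across(series_list):
    """Per-timestamp mean over the series that have a value at that timestamp."""
    result = []
    for values in zip(*series_list):
        present = [v for v in values if v is not None]
        result.append(mean(present) if present else None)
    return result


def to_percent(delta_ns, vcpus=1):
    """Convert CPU nanoseconds consumed in one granularity period to a percentage."""
    return 100.0 * delta_ns / (GRANULARITY * NS_PER_SECOND * vcpus)


# Made-up cumulative CPU time (ns). Instance b joins the group at the third sample.
cpu_a = [s * NS_PER_SECOND for s in (600, 630, 660, 690)]
cpu_b = [None, None, 12 * NS_PER_SECOND, 24 * NS_PER_SECOND]

# Rate per instance first, then mean across the group.
rate_then_mean = mean_across([rate(cpu_a), rate(cpu_b)])

# Mean of the raw counters first, then rate: the group mean drops when b joins,
# so the difference at that point is negative.
mean_then_rate = rate(mean_across([cpu_a, cpu_b]))

group_cpu_percent = [to_percent(d) for d in rate_then_mean]
print(group_cpu_percent)
print(mean_then_rate)
```

The first list is the kind of per-group CPU utilization that cpu_util used to provide; an alarm threshold would be compared against those values. The second list contains a negative entry at the point where the second instance joins the group.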
I don't know how to solve this problem in aodh, but it is possible to use Prometheus to aggregate CPU utilization and trigger scaling. I wrote up how to do this with Senlin and Prometheus here: https://medium.com/@dkt26111/auto-scaling-openstack-instances-with-senlin-and-prometheus-46100a9a14e1?source=friends_link&sk=5c0a2aa9e541e8c350963e7ec72bcbb5
You can probably do something similar with Heat and Prometheus.
On Sun, Aug 4, 2019 at 12:52 AM Bernd Bausch <berndbausch@gmail.com> wrote:
My problem: Aodh allows me to set an aggregation method, but not a reaggregation method. How can I create alarms based on rates? The problem extends to Heat and autoscaling.
Hi all,
You can also collect the `cpu.utilization_perc` metric with Monasca and trigger Heat auto-scaling, as we demonstrated in the hands-on workshop at the last Summit in Denver. Here is the Heat template we used [1]. You can find the workshop material here [2].
Cheers
Witek
[1] https://github.com/sjamgade/monasca-autoscaling/blob/master/final/autoscalin...
[2] https://github.com/sjamgade/monasca-autoscaling
On 8/14/19 6:34 PM, Duc Truong wrote:
I don't know how to solve this problem in aodh, but it is possible to use Prometheus to aggregate CPU utilization and trigger scaling. I wrote up how to do this with Senlin and Prometheus here: https://medium.com/@dkt26111/auto-scaling-openstack-instances-with-senlin-and-prometheus-46100a9a14e1?source=friends_link&sk=5c0a2aa9e541e8c350963e7ec72bcbb5
You can probably do something similar with Heat and Prometheus.
FYI, there's a new ML thread on the same topic [1].
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010210.h...
On Sun, Aug 4, 2019 at 3:55 PM Bernd Bausch <berndbausch@gmail.com> wrote:
*My problem*: Aodh allows me to set an aggregation method, but not a reaggregation method. How can I create alarms based on rates? The problem extends to Heat and autoscaling.
--
May The Force of OpenStack Be With You,
Rico Lin
irc: ricolin
participants (4)
- Bernd Bausch
- Duc Truong
- Rico Lin
- Witek Bedyk