Re: [telemetry][ceilometer][gnocchi] How to configure aggregate for cpu_util or calculate from metrics
Bernd Bausch
berndbausch at gmail.com
Thu Aug 1 12:20:49 UTC 2019
I have a solution. At least it works for me. Be aware that this is
Devstack, but I think nothing I did to solve my problem is
Devstack-specific. Also, I don't know whether there are more efficient
or canonical ways to reconfigure Ceilometer. But it's good enough for me.
These are my steps - you may not need all of them.
* in *pipeline.yaml*, set publisher to gnocchi://
* in *the resource definition file*, define my new archive policy.
By default, this file resides in the Ceilometer source tree
.../ceilometer/publisher/data/gnocchi_resources.yaml, but you can
use config parameter resources_definition_file to change the default
(I didn't try).
Example:
    - name: ceilometer-medium-rate
      aggregation_methods:
        - mean
        - rate:mean
      back_window: 0
      definition:
        - granularity: 1 minute
          timespan: 7 days
        - granularity: 1 hour
          timespan: 365 days
* in the same resource definition file, *adjust the archive policy* of
rate metrics.
Example:
    - resource_type: instance
      metrics:
        ...
        cpu:
          archive_policy_name: ceilometer-medium-rate
* *delete all existing metrics and resources* from Gnocchi
Probably only necessary if Ceilometer has already been running, and not
needed if you reconfigure it before its first start.
This is a drastic measure, but if you do it at the beginning of a
deployment, it won't cause loss of much data.
Why is this required? A metric contains an archive policy that can't
be changed. Thus existing metrics need to be recreated.
Why remove resources? Because they reference the metrics that I removed.
* *restart all Ceilometer services*
This is required for re-reading the pipeline and the resource
definition files.
Ceilometer will create resources and metrics as needed when it sends
its samples to Gnocchi.
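As a sanity check on the archive policy defined in the steps above, the number of points retained per definition is the timespan divided by the granularity. A minimal Python sketch of that arithmetic follows; the function name `points` is made up for illustration and is not part of Gnocchi or Ceilometer.

```python
from datetime import timedelta


def points(granularity: timedelta, timespan: timedelta) -> int:
    """Number of aggregated points kept: timespan divided by granularity."""
    return timespan // granularity


# The two definitions of the ceilometer-medium-rate policy from the example.
definitions = [
    (timedelta(minutes=1), timedelta(days=7)),
    (timedelta(hours=1), timedelta(days=365)),
]

for granularity, timespan in definitions:
    print(f"granularity={granularity}, timespan={timespan}, "
          f"points={points(granularity, timespan)}")
```

For the 1 minute / 7 days definition this gives 10080 points, which matches the "points: 10080" shown by "gnocchi archive-policy show" in the quoted message further down.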
I tested this by running a CPU hogging instance and listing its measures
after a few minutes:
gnocchi measures show --resource f28f6b78-9dd5-49cc-a6ac-28cb14477bf0 \
    --aggregation rate:mean cpu
+---------------------------+-------------+---------------+
| timestamp                 | granularity |         value |
+---------------------------+-------------+---------------+
| 2019-08-01T20:23:00+09:00 |        60.0 |  1810000000.0 |
| 2019-08-01T20:24:00+09:00 |        60.0 | 39940000000.0 |
| 2019-08-01T20:25:00+09:00 |        60.0 | 40110000000.0 |
This means that the instance accumulated 39940000000 nanoseconds of CPU
time during the 60-second interval at 20:24:00. Note that the old
/cpu_util/ was expressed in percent, so Aodh alarms and Heat autoscaling
definitions must be adapted.
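To relate the new values to the old percentage, the rate can be divided by the CPU time available in one granularity interval. The Python sketch below is only an illustration under two assumptions: that each value is the CPU time in nanoseconds accumulated over the interval, as interpreted above, and that the result should be normalized by the number of vCPUs. The name `cpu_util_percent` and the `vcpus` parameter are hypothetical, not part of any OpenStack API.

```python
NS_PER_SECOND = 1_000_000_000


def cpu_util_percent(rate_ns: float, granularity_s: float, vcpus: int = 1) -> float:
    """Convert CPU nanoseconds used in one interval to a percentage.

    Assumption: rate_ns is the CPU time consumed during the interval and
    the instance could use at most granularity_s seconds on each vCPU.
    """
    available_ns = granularity_s * NS_PER_SECOND * vcpus
    return 100.0 * rate_ns / available_ns


# (timestamp, granularity, value) rows copied from the measures listing.
measures = [
    ("2019-08-01T20:23:00+09:00", 60.0, 1810000000.0),
    ("2019-08-01T20:24:00+09:00", 60.0, 39940000000.0),
    ("2019-08-01T20:25:00+09:00", 60.0, 40110000000.0),
]

for timestamp, granularity, value in measures:
    print(timestamp, round(cpu_util_percent(value, granularity), 2))
```

With these assumptions, the 20:24:00 value corresponds to roughly 66.6 percent of one vCPU. An alarm threshold that used to be a percentage can be converted the other way: threshold_ns = percent / 100 * granularity_s * 1e9 * vcpus.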
Good luck. Hire me as Ceilometer consultant if you get stuck :)
Bernd
On 8/1/2019 6:11 PM, Teckelmann, Ralf, NMU-OIP wrote:
>
> Hello Bernd, Hello Lingxian,
>
>
> +1
>
>
> You are not alone in your fruitless endeavor. Sadly, I can not come up
> with a solution.
>
> We are stuck at the same point.
>
>
> Maybe some day a dedicated member of the OpenStack community will give
> the Ceilometer guys a push to explain their service.
> For us, also using Stein, it is in the state of "not production ready".
>
> Cheers,
>
> Ralf T.
> ------------------------------------------------------------------------
> *From:* Bernd Bausch <berndbausch at gmail.com>
> *Sent:* Thursday, August 1, 2019 03:16:25
> *To:* Lingxian Kong <anlin.kong at gmail.com>
> *Cc:* openstack-discuss <openstack-discuss at lists.openstack.org>
> *Subject:* Re: [telemetry][ceilometer][gnocchi] How to configure
> aggregate for cpu_util or calculate from metrics
>
> Lingxian,
>
> Thanks for "bumping" my request and keeping it alive. The reason I
> need an answer: I am updating courseware to Stein that includes
> autoscaling based on CPU and disk I/O rates. Looks like I am "cutting
> edge" :)
>
> I don't think the problem is in the Gnocchi camp, but rather in
> Ceilometer. To store rates of measures in Gnocchi, the following is needed:
>
> * A /metric/. Raw measures are sent to the metric.
> * An /archive policy/. The metric has an archive policy.
> * The archive policy includes one or more /rate aggregates/.
>
> My cloud has archive policies with rate aggregates, but the question
> is about the first bullet: *How can I configure Ceilometer so that it
> creates the corresponding metrics and sends measures to them?* In
> other words, how is Ceilometer's output connected to my archive
> policy? From my experience, just adding the archive policy to
> Ceilometer's publishers is not sufficient.
>
> Ceilometer's source code includes
> /.../publisher/data/gnocchi_resources.yaml/, which might well be the
> place where this can be configured. I am not sure how to do it though,
> and this file is not documented. I can read the source, but my
> developer skills are insufficient for understanding how everything
> fits together.
>
> Bernd
>
> On 8/1/2019 9:01 AM, Lingxian Kong wrote:
>> Hi Bernd,
>>
>> A lot of people have asked the same question before;
>> unfortunately, I don't know the answer either (we are still using an
>> old version of Ceilometer). The original cpu_util support has been
>> removed from Ceilometer in favor of Gnocchi, but AFAIK, there is no
>> doc in Gnocchi that explains how to achieve the same thing and no clear
>> answer from the Gnocchi maintainers.
>>
>> It'd be much appreciated if you could find the answer in the end, or
>> if someone who has already solved the issue could share it.
>>
>> Best regards,
>> Lingxian Kong
>> Catalyst Cloud
>>
>>
>> On Wed, Jul 31, 2019 at 1:28 PM Bernd Bausch <berndbausch at gmail.com> wrote:
>>
>> The message at the end of this email is some three months old. I
>> have the same problem. The question is: *How to use the new rate
>> metrics in Gnocchi?* I am using a Stein Devstack for my tests.
>>
>> For example, I need the CPU rate, formerly named /cpu_util/. I
>> created a new archive policy that uses /rate:mean/ aggregation
>> and has a 1 minute granularity:
>>
>> $ gnocchi archive-policy show ceilometer-medium-rate
>> +---------------------+------------------------------------------------------------------+
>> | Field               | Value                                                            |
>> +---------------------+------------------------------------------------------------------+
>> | aggregation_methods | rate:mean, mean                                                  |
>> | back_window         | 0                                                                |
>> | definition          | - points: 10080, granularity: 0:01:00, timespan: 7 days, 0:00:00 |
>> | name                | ceilometer-medium-rate                                           |
>> +---------------------+------------------------------------------------------------------+
>>
>> I added the new policy to the publishers in /pipeline.yaml/:
>>
>> $ tail -n5 /etc/ceilometer/pipeline.yaml
>> sinks:
>>     - name: meter_sink
>>       publishers:
>>           - gnocchi://?archive_policy=medium&filter_project=gnocchi_swift
>>           - gnocchi://?archive_policy=ceilometer-medium-rate&filter_project=gnocchi_swift
>>
>> After restarting all of Ceilometer, my hope was that the CPU rate
>> would magically appear in the metric list. But no: All metrics
>> are linked to archive policy /medium/, and looking at the details
>> of an instance, I don't detect anything rate-related:
>>
>> $ gnocchi resource show ae3659d6-8998-44ae-a494-5248adbebe11
>> +-----------------------+---------------------------------------------------------------------+
>> | Field                 | Value                                                               |
>> +-----------------------+---------------------------------------------------------------------+
>> ...
>> | metrics               | compute.instance.booting.time: 76fac1f5-962e-4ff2-8790-1f497c99c17d |
>> |                       | cpu: af930d9a-a218-4230-b729-fee7e3796944                           |
>> |                       | disk.ephemeral.size: 0e838da3-f78f-46bf-aefb-aeddf5ff3a80           |
>> |                       | disk.root.size: 5b971bbf-e0de-4e23-ba50-a4a9bf7dfe6e                |
>> |                       | memory.resident: 09efd98d-c848-4379-ad89-f46ec526c183               |
>> |                       | memory.swap.in: 1bb4bb3c-e40a-4810-997a-295b2fe2d5eb                |
>> |                       | memory.swap.out: 4d012697-1d89-4794-af29-61c01c925bb4               |
>> |                       | memory.usage: 93eab625-0def-4780-9310-eceff46aab7b                  |
>> |                       | memory: ea8f2152-09bd-4aac-bea5-fa8d4e72bbb1                        |
>> |                       | vcpus: e1c5acaf-1b10-4d34-98b5-3ad16de57a98                         |
>> | original_resource_id  | ae3659d6-8998-44ae-a494-5248adbebe11                                |
>> ...
>> | type                  | instance                                                            |
>> | user_id               | a9c935f52e5540fc9befae7f91b4b3ae                                    |
>> +-----------------------+---------------------------------------------------------------------+
>>
>> Obviously, I am missing something. Where is the missing link?
>> What do I have to do to get CPU usage rates? Do I have to create
>> metrics? Do I have to ask Ceilometer to create metrics? How?
>>
>> Right now, no instructions seem to exist at all. If that is
>> correct, I would be happy to write documentation once I
>> understand how it works.
>>
>> Thanks a lot.
>>
>> Bernd
>>
>> On 5/10/2019 3:49 PM, info at dantalion.nl wrote:
>>> Hello,
>>>
>>> I am working on Watcher and we are currently changing how metrics are
>>> retrieved from different datasources such as Monasca or Gnocchi. Because
>>> of this major overhaul I would like to validate that everything is
>>> working correctly.
>>>
>>> Almost all of the optimization strategies in Watcher require the cpu
>>> utilization of an instance as metric but with newer versions of
>>> Ceilometer this has become unavailable.
>>>
>>> On IRC I received the information that Gnocchi could be used to
>>> configure an aggregate and this aggregate would then report cpu
>>> utilization, however, I have been unable to find documentation on how to
>>> achieve this.
>>>
>>> I was also notified that cpu_util is something that could be computed
>>> from other metrics. When reading
>>> https://docs.openstack.org/ceilometer/rocky/admin/telemetry-measurements.html#openstack-compute
>>> the documentation seems to agree on this as it states that cpu_util is
>>> measured by using a 'rate of change' transformer. But I have not been
>>> able to find how this can be computed.
>>>
>>> I was hoping someone could spare the time to provide documentation or
>>> information on how this currently is best achieved.
>>>
>>> Kind Regards,
>>> Corne Lukken (Dantali0n)
>>>