Re: AW: [telemetry][ceilometer][gnocchi] How to configure aggregate for cpu_util or calculate from metrics

1 Aug 2019

I have a solution. At least it works for me. Be aware that this is 
Devstack, but I think nothing I did to solve my problem is 
Devstack-specific. Also, I don't know whether there are more efficient 
or canonical ways to reconfigure Ceilometer. But it's good enough for me.

These are my steps - you may not need all of them.

  * in *pipeline.yaml*, set publisher to gnocchi://
  * in *the resource definition file*, define my new archive policy.
    By default, this file resides in the Ceilometer source tree
    .../ceilometer/publisher/data/gnocchi_resources.yaml, but you can
    use config parameter resources_definition_file to change the default
    (I didn't try).
    Example:

         - name: ceilometer-medium-rate
           aggregation_methods:
           - mean
           - rate:mean
           back_window: 0
           definition:
             - granularity: 1 minute
               timespan: 7 days
             - granularity: 1 hour
               timespan: 365 days

  * in the same resource definition file, *adjust the archive policy* of
    rate metrics.
    Example:

        - resource_type: instance
          metrics:
          ...
            cpu:
              archive_policy_name: ceilometer-medium-rate

  * *delete all existing metrics and resources* from Gnocchi
    Probably only necessary when Ceilometer has already been running, and
    not needed if you reconfigure it before its first start.
    This is a drastic measure, but if you do it at the beginning of a
    deployment, it won't cause loss of much data.
    Why is this required? A metric contains an archive policy that can't
    be changed. Thus existing metrics need to be recreated.
    Why remove resources? Because they reference the metrics that I removed.

  * *restart all Ceilometer services*
    This is required for re-reading the pipeline and the resource
    definition files.
    Ceilometer will create resources and metrics as needed when it sends
    its samples to Gnocchi.
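
The "delete all existing metrics and resources" step can be sketched in Python as follows. This is a rough sketch, not what was run in the original message. It assumes the gnocchi command-line client is installed with credentials in the environment, that its `resource list`, `metric list`, `resource delete` and `metric delete` commands are available, and that the generic `-f value -c id` output options work as usual. The function names are made up for illustration.

```python
import subprocess


def list_ids(kind):
    """Return the IDs shown by `gnocchi <kind> list`; kind is 'resource' or 'metric'."""
    result = subprocess.run(
        ["gnocchi", kind, "list", "-f", "value", "-c", "id"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def delete_commands(kind, ids):
    """Build, without running them, one `gnocchi <kind> delete` command per ID."""
    return [["gnocchi", kind, "delete", item_id] for item_id in ids]


def purge(dry_run=True):
    """Delete resources first, then whatever metrics are still listed."""
    for kind in ("resource", "metric"):
        # List again for each kind: metrics of a deleted resource may
        # already be gone by then (assumption about Gnocchi's behavior).
        for cmd in delete_commands(kind, list_ids(kind)):
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)
```

Calling purge() only prints the commands; purge(dry_run=False) runs them. The list commands may return a limited number of rows, so a large deployment may need more than one pass.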

I tested this by running a CPU hogging instance and listing its measures 
after a few minutes:

     gnocchi measures show --resource f28f6b78-9dd5-49cc-a6ac-28cb14477bf0 \
                           --aggregation rate:mean cpu

     +---------------------------+-------------+---------------+
     | timestamp                 | granularity |         value |
     +---------------------------+-------------+---------------+
     | 2019-08-01T20:23:00+09:00 |        60.0 |  1810000000.0 |
     | 2019-08-01T20:24:00+09:00 |        60.0 | 39940000000.0 |
     | 2019-08-01T20:25:00+09:00 |        60.0 | 40110000000.0 |

This means that the instance accumulated 39940000000 nanoseconds of CPU 
time in the 60-second period at 20:24:00. Note that the old /cpu_util/ 
was expressed in percent, so Aodh alarms and Heat autoscaling 
definitions must be adapted.
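
To adapt a percent-based threshold, the conversion can be sketched in Python as below. This is only a sketch under assumptions: it follows the reading above (the value is the CPU time in nanoseconds accumulated during one granularity period), and it divides by the number of vCPUs on the assumption that the old cpu_util was scaled that way. The function names are made up for illustration.

```python
NS_PER_SECOND = 1_000_000_000


def rate_to_percent(cpu_ns, granularity_s, vcpus=1):
    """CPU nanoseconds used in one period -> percent utilization."""
    return 100.0 * cpu_ns / (granularity_s * NS_PER_SECOND * vcpus)


def percent_to_rate(percent, granularity_s, vcpus=1):
    """Percent utilization -> CPU nanoseconds per period (e.g. an alarm threshold)."""
    return percent * granularity_s * NS_PER_SECOND * vcpus / 100.0


# The 20:24:00 row from the table above, on a single-vCPU instance.
print(round(rate_to_percent(39940000000.0, 60.0), 1))  # 66.6
# An "80 percent" alarm expressed in the new unit.
print(percent_to_rate(80, 60.0))  # 48000000000.0
```

So an alarm that used to fire at 80 percent would use a threshold of 48000000000 with a 60-second granularity, under the assumptions stated above.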

Good luck. Hire me as Ceilometer consultant if you get stuck :)

Bernd

On 8/1/2019 6:11 PM, Teckelmann, Ralf, NMU-OIP wrote:
...
Hello Bernd, Hello Lingxian,
+1
You are not alone in your fruitless endeavor. Sadly, I can not come up 
with a solution.
We are stuck at the same point.
Maybe some day a dedicated member of the OpenStack community will give 
the Ceilometer guys a push to explain their service.
For us, also using Stein, it is in the state of "not production ready".
Cheers,
Ralf T.
------------------------------------------------------------------------
*From:* Bernd Bausch <berndbausch@gmail.com>
*Sent:* Thursday, 1 August 2019 03:16:25
*To:* Lingxian Kong <anlin.kong@gmail.com>
*Cc:* openstack-discuss <openstack-discuss@lists.openstack.org>
*Subject:* Re: [telemetry][ceilometer][gnocchi] How to configure 
aggregate for cpu_util or calculate from metrics
Lingxian,
Thanks for "bumping" my request and keeping it alive. The reason I 
need an answer: I am updating courseware to Stein that includes 
autoscaling based on CPU and disk I/O rates. Looks like I am "cutting 
edge" :)
I don't think the problem is in the Gnocchi camp, but rather 
Ceilometer. To store rates of measures in Gnocchi, the following is needed:
  * A /metric/. Raw measures are sent to the metric.
  * An /archive policy/. The metric has an archive policy.
  * The archive policy includes one or more /rate aggregates/.
My cloud has archive policies with rate aggregates, but the question 
is about the first bullet: *How can I configure Ceilometer so that it 
creates the corresponding metrics and sends measures to them?* In 
other words, how is Ceilometer's output connected to my archive 
policy? From my experience, just adding the archive policy to 
Ceilometer's publishers is not sufficient.
Ceilometer's source code includes 
/.../publisher/data/gnocchi_resources.yaml/, which might well be the 
place where this can be configured. I am not sure how to do it though, 
and this file is not documented. I can read the source, but my 
developer skills are insufficient for understanding how everything 
fits together.
Bernd
On 8/1/2019 9:01 AM, Lingxian Kong wrote:
...
Hi Bernd,
A lot of people have asked the same question before. 
Unfortunately, I don't know the answer either (we are still using an 
old version of Ceilometer). The original cpu_util support has been 
removed from Ceilometer in favor of Gnocchi, but AFAIK, there is no 
doc in Gnocchi that mentions how to achieve the same thing and no clear 
answer from the Gnocchi maintainers.
It'd be much appreciated if you could find the answer in the end, or 
if someone who has already solved the issue could share it.
Best regards,
Lingxian Kong
Catalyst Cloud
On Wed, Jul 31, 2019 at 1:28 PM Bernd Bausch <berndbausch@gmail.com> wrote:
The message at the end of this email is some three months old. I
have the same problem. The question is: *How to use the new rate
metrics in Gnocchi.* I am using a Stein Devstack for my tests.

For example, I need the CPU rate, formerly named /cpu_util/. I
created a new archive policy that uses /rate:mean/ aggregation
and has a 1 minute granularity:
    $ gnocchi archive-policy show ceilometer-medium-rate
    +---------------------+------------------------------------------------------------------+
    | Field               | Value                                                            |
    +---------------------+------------------------------------------------------------------+
    | aggregation_methods | rate:mean, mean                                                  |
    | back_window         | 0                                                                |
    | definition          | - points: 10080, granularity: 0:01:00, timespan: 7 days, 0:00:00 |
    | name                | ceilometer-medium-rate                                           |
    +---------------------+------------------------------------------------------------------+
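The "points: 10080" in the definition row is simply the timespan divided by the granularity. A minimal Python check of that arithmetic follows; the function name is made up for illustration, and the hourly tier (1 hour over 365 days) is taken from the example definition in the first message of this thread:

```python
from datetime import timedelta


def points(granularity, timespan):
    """Number of aggregated points one definition entry retains."""
    return timespan // granularity  # timedelta // timedelta gives an int


print(points(timedelta(minutes=1), timedelta(days=7)))   # 10080
print(points(timedelta(hours=1), timedelta(days=365)))   # 8760
```
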
I added the new policy to the publishers in /pipeline.yaml/:

    $ tail -n5 /etc/ceilometer/pipeline.yaml
    sinks:
        - name: meter_sink
          publishers:
              - gnocchi://?archive_policy=medium&filter_project=gnocchi_swift
              - gnocchi://?archive_policy=ceilometer-medium-rate&filter_project=gnocchi_swift

After restarting all of Ceilometer, my hope was that the CPU rate
would magically appear in the metric list. But no: All metrics
are linked to archive policy /medium/, and looking at the details
of an instance, I don't detect anything rate-related:
    $ gnocchi resource show ae3659d6-8998-44ae-a494-5248adbebe11
    +-----------------------+---------------------------------------------------------------------+
    | Field                 | Value                                                               |
    +-----------------------+---------------------------------------------------------------------+
    ...
    | metrics               | compute.instance.booting.time: 76fac1f5-962e-4ff2-8790-1f497c99c17d |
    |                       | cpu: af930d9a-a218-4230-b729-fee7e3796944                           |
    |                       | disk.ephemeral.size: 0e838da3-f78f-46bf-aefb-aeddf5ff3a80           |
    |                       | disk.root.size: 5b971bbf-e0de-4e23-ba50-a4a9bf7dfe6e                |
    |                       | memory.resident: 09efd98d-c848-4379-ad89-f46ec526c183               |
    |                       | memory.swap.in: 1bb4bb3c-e40a-4810-997a-295b2fe2d5eb                |
    |                       | memory.swap.out: 4d012697-1d89-4794-af29-61c01c925bb4               |
    |                       | memory.usage: 93eab625-0def-4780-9310-eceff46aab7b                  |
    |                       | memory: ea8f2152-09bd-4aac-bea5-fa8d4e72bbb1                        |
    |                       | vcpus: e1c5acaf-1b10-4d34-98b5-3ad16de57a98                         |
    | original_resource_id  | ae3659d6-8998-44ae-a494-5248adbebe11                                |
    ...
    | type                  | instance                                                            |
    | user_id               | a9c935f52e5540fc9befae7f91b4b3ae                                    |
    +-----------------------+---------------------------------------------------------------------+
Obviously, I am missing something. Where is the missing link?
What do I have to do to get CPU usage rates? Do I have to create
metrics? Do I have to ask Ceilometer to create metrics? How?

Right now, no instructions seem to exist at all. If that is
correct, I would be happy to write documentation once I
understand how it works.
Thanks a lot.
Bernd
On 5/10/2019 3:49 PM, info@dantalion.nl wrote:
...
Hello,
I am working on Watcher and we are currently changing how metrics are
retrieved from different datasources such as Monasca or Gnocchi. Because
of this major overhaul I would like to validate that everything is
working correctly.

Almost all of the optimization strategies in Watcher require the CPU
utilization of an instance as a metric, but with newer versions of
Ceilometer this has become unavailable.

On IRC I received the information that Gnocchi could be used to
configure an aggregate and that this aggregate would then report CPU
utilization. However, I have been unable to find documentation on how to
achieve this.

I was also notified that cpu_util is something that could be computed
from other metrics. When reading
https://docs.openstack.org/ceilometer/rocky/admin/telemetry-measurements.html#openstack-compute
the documentation seems to agree on this, as it states that cpu_util is
measured by using a 'rate of change' transformer. But I have not been
able to find how this can be computed.

I was hoping someone could spare the time to provide documentation or
information on how this currently is best achieved.

Kind Regards,
Corne Lukken (Dantali0n)