Open Stack

Mon Aug 5 13:23:24 UTC 2013

Hey Julien,

On 8/5/13 3:14 AM, "Julien Danjou" <julien at danjou.info> wrote:

>On Fri, Aug 02 2013, Thomas Maddox wrote:
>
>Hi Thomas,
>
>> I've been poking around to get an understanding of what some of these
>> default meters mean in the course of researching this Glance bug
>> (https://bugs.launchpad.net/ceilometer/+bug/1201701). I was wondering if
>> anyone could explain to me what the instance meter is. The unit 'instance'
>> sort of confuses me when each one of these meters is tied to a single
>> resource (instance), especially because it looks like a count of all
>> notifications regarding a particular instance that hit the bus. Here's some
>> output for one of the instances I spun up:
>> http://paste.openstack.org/show/42963/.
>
>Are you talking about the instance:m1.nano-like counters?
>These are old counters we introduced a while back to count the number
>of instances directly by summing up all these counters. That's something
>that should now be done via the API -- I'll look into removing them.
>
>As for the general instance counter, that's just a gauge counter which
>always has value = 1, counting the number of 'instance' on a particular
>resource, in this case, the instance itself. So it's just some sort of
>heartbeat counting instances.

I was talking about both of them. Okay, so it is just detecting activity
and existence.
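
To illustrate the distinction, here is a minimal sketch with made-up data. The sample tuples and both helper functions are hypothetical and are not Ceilometer's API; the only thing taken from the thread is that each 'instance' sample has value 1. Counting samples per resource tells you how often an instance was seen, while counting distinct resources tells you how many instances exist.

```python
from collections import Counter

# Hypothetical 'instance' meter samples as (resource_id, volume) pairs.
# Every sample carries volume 1, as described for the gauge above.
samples = [
    ("inst-a", 1),
    ("inst-a", 1),
    ("inst-a", 1),
    ("inst-b", 1),
]


def samples_per_resource(samples):
    """Number of samples (heartbeats) recorded for each resource."""
    return Counter(resource_id for resource_id, _ in samples)


def instance_count(samples):
    """Number of distinct instances that produced at least one sample."""
    return len({resource_id for resource_id, _ in samples})


print(samples_per_resource(samples))
print(instance_count(samples))
```

Under these assumptions, the per-resource figure grows with every notification or poll, which would explain why it looks like a count of notifications rather than a count of instances.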

>
>> Another concern I have is I think I may have found another bug, because
>> I can delete the instance shown in this paste, and it still has a
>> resource state description of 'scheduling' long after it's been deleted:
>> http://paste.openstack.org/show/42962/, much like the Glance issue I'm
>> currently working on.
>
>When you use resource-show, Ceilometer just returns the latest metadata
>it has about the resource. So this should be equal to the
>resource_metadata field of the most recent sample it has in its
>database (hint: you could go and check this out in the db yourself to be
>sure).
>
>Now, 2 options:
>- the latest sample retrieved by Ceilometer shows different metadata,
>  so there's a bug in Ceilometer
>- the latest sample retrieved by Ceilometer shows the same information,
>  so:
>   a. a message arrived late and out of order to Ceilometer, so the
>   resource metadata is outdated -- we can't do much about it
>   b. this is actually what Nova sends to Ceilometer -- most likely a
>   bug in Nova.

That was my thinking too. Since it shows a scheduling event after the
instance was well into the active state, I would be more inclined to think
the first option is the case for the described bug(s).

Thinking about it, the second option describes a very real concern going
forward that didn't occur to me when I was wandering around the code.
Specifically regarding option 2a, if message 2 arrives at Ceilometer
before message 1 because it ended up on a faster route, then message 1
will overwrite the metadata from message 2 and we record an incorrect
state. Isn't it in the nature of network communication for messages at
the application layer to potentially arrive out of order and, in the case
of UDP, even be lost? What purpose does resource-show still serve if we
can't trust its output to represent the actual state of the resource in
question?

It seems that timestamps could be used to prevent overwriting the latest
state, by checking that the incoming notification doesn't have a timestamp
earlier than the one already recorded. I hope I'm not seeing a problem
that doesn't exist here or misunderstanding something. If so, please
correct me!
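
To make the idea concrete, here is a rough sketch of such a timestamp guard. It is only an illustration: the `store` dict and `apply_notification` function are hypothetical names, not Ceilometer's actual storage code, and it assumes each notification carries a timestamp set by the sender and that those timestamps are comparable.

```python
from datetime import datetime


def apply_notification(store, resource_id, timestamp, metadata):
    """Store metadata unless a newer notification is already recorded.

    Returns True if the metadata was stored, False if it was dropped
    as stale. `store` is a plain dict standing in for the database.
    """
    current = store.get(resource_id)
    if current is not None and timestamp < current["timestamp"]:
        # Late, out-of-order message: keep the newer state we already have.
        return False
    store[resource_id] = {"timestamp": timestamp, "metadata": metadata}
    return True


store = {}
t1 = datetime(2013, 8, 2, 12, 0, 0)  # message 1, sent first
t2 = datetime(2013, 8, 2, 12, 0, 5)  # message 2, sent later

# Message 2 arrives before message 1.
apply_notification(store, "inst-a", t2, {"state": "active"})
apply_notification(store, "inst-a", t1, {"state": "scheduling"})

print(store["inst-a"]["metadata"])
```

With this check the late 'scheduling' message is ignored and the recorded state stays 'active'. It does not help with lost messages, and it is only as reliable as the senders' clocks.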

Thanks again for the help! :)

-Thomas

>
>-- 
>Julien Danjou
>;; Free Software hacker ; freelance consultant
>;; http://julien.danjou.info

[openstack-dev] [Ceilometer] Looking for some help understanding default meters
