[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward

Eoghan Glynn eglynn at redhat.com
Thu Nov 15 17:01:05 UTC 2012


Hi Sandy,

> Hmm, I'm not sure if this post is intended to be a reply to my
> previous post about stacktach-ceilometer integration or not

Well, it's in the same vein, but I guess a bit less ambitious in
scope.

> My biggest concern is there is still no differentiation between
> instrumentation and metering/monitoring in this solution. It sounds
> like we are still mixing requirements when these are two very
> different animals. Our solution for instrumentation
> (fast/frequent/small/unreliable) will quite likely have to be
> different from our usage/monitoring solution (large/slower/reliable)

Yes that's a fair point, I'm approaching this mainly with the
latter in mind, because that's currently what ceilometer is focused
on.

So I should have clarified earlier, this mechanism was intended as a
replacement for the approach previously taken by the ceilo compute
agent to determine detailed usage information from the nova DB &
libvirt, not necessarily as the "one true way" that all measurement
activities need to be approached for nova.

> I'm going to leave the instrumentation discussion for a different
> thread and focus on lifecycle/billing/usage/monitoring (call it what
> you like). Also, I'm not sure what the "ceilo message bus" is, so
> I'm going to assume it's the notification queues?

Yes, you're correct in that assumption. Loose use of language on my
part; I was just seeking to distinguish the ceilo metering messages
flowing over AMQP from the notifications currently produced directly
by nova.

> We don't *have* to use rabbit to handle the notifications. The
> notification system could easily be extended to allow different
> event types to use different notifiers. For example, billing events
> could go to rabbit while monitoring/lifecycle events could go to a
> log file. Or, if we wanted to introduce a debugging event, that
> could send to statsd (or something) directly as it strips out the
> cruft from the message.

Yes, so in ceilometer we've been making small steps towards that
idea with the concept of multiple publishers and transformers.
So a transformer would know how to distill what a publisher needs
from the raw data (strip out the cruft & massage into the expected
format) and then the publisher knows how to emit the data via
some conduit (over rabbit, or a UDP packet for stats, an RRD file,
a CloudWatch PutMetricData call, etc.).
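To make that split concrete, here's a rough sketch of the idea; the
class and method names below are illustrative only, not the actual
ceilometer pipeline API:

```python
import json
import socket


class CruftStripper(object):
    """Transformer: distill what the publisher needs from a raw sample."""

    KEEP = ('name', 'volume', 'unit', 'resource_id', 'timestamp')

    def transform(self, raw_sample):
        # Strip out the cruft and massage into the expected format.
        return dict((k, raw_sample[k]) for k in self.KEEP
                    if k in raw_sample)


class UDPPublisher(object):
    """Publisher: emit the massaged data via some conduit (UDP here)."""

    def __init__(self, host='127.0.0.1', port=4952):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._addr = (host, port)

    def publish(self, sample):
        self._sock.sendto(json.dumps(sample).encode('utf-8'), self._addr)


def emit(raw_sample, transformers, publisher):
    """Run the raw data through the transformer chain, then publish."""
    sample = raw_sample
    for t in transformers:
        sample = t.transform(sample)
    publisher.publish(sample)
```

The point being that a billing-oriented publisher and a stats-oriented
one could share the same transformer chain, or apply different ones,
without the emitting service needing to care.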

So I'm thinking we're not a million miles from each other on that
score, other than I had been assuming the publishers would live
in ceilo, and it would understand their requirements in terms of
cadence etc.

Were you more thinking of this logic living elsewhere?
 
> So, the messages that we are interested in are larger/less
> frequent/less time sensitive (within reason) and very important.

Are you contrasting with instrumentation data there?

Certainly there would still be some time sensitivity, particularly for
metrics feeding into near-realtime monitoring. For metering feeding
into non-realtime consumers (such as billing), we can tolerate a bit
of irregularity in the cadence and some delays in the pipeline, as
long as we maintain completeness. Whereas for monitoring, we need to
get at that data while it's still fresh and ensure it's sampled at a
near-constant rate.

> Also, they are ancillary to the task at hand (providing a cloud
> service) so their failure should not bring down the system. Which is
> why a queue-based approach seems the logical choice. Having nova
> call out seems wrong and if it did, it belongs as a new rabbit
> notifier where the person deploying that solution takes all
> responsibility.

True that, we certainly need to be cognizant of the risk that the
load we impose on a possibly degraded system could make things worse.
Hence the leeriness about garnering the info ceilo needs from
the public nova-api.

> The existing information gathered from the hypervisors could easily
> be extended with optional sections to cover all use cases. Much the
> same way MP3 and JPG has optional data blocks. Notifications do not
> use the Nova RPC protocol and should be versioned separately from
> it. The entire structure of the notification should be changed to
> allow for these "optional" blocks ... not only for flexibility, but
> to reduce the already monstrous payload size (do we need to do 2-3
> db accesses every time we send a notification?)

So with nova-compute losing its direct database access, 2-3 DB
accesses per notification is not going to be a runner - all the
information we're talking about extracting here will I think have to
be available from the hypervisor, possibly mixed in with some cached
data retrieved by ceilo from the nova-api (e.g. on every polling cycle
we wouldn't want to go back to the nova-api to figure out the instance
flavor name, if that's not directly exposed by nova-compute but is
needed for metering purposes).
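As a rough sketch of that caching idea (the client interface and
`get_flavor_name` call below are hypothetical stand-ins, not a real
nova API):

```python
import time


class FlavorCache(object):
    """Cache slow-changing nova-api data (e.g. flavor names) so the
    polling cycle doesn't go back to the API on every iteration.

    The nova_client here is a hypothetical object assumed to expose
    get_flavor_name(flavor_id); real code would wrap novaclient.
    """

    def __init__(self, nova_client, ttl=600):
        self._nova = nova_client
        self._ttl = ttl            # seconds before a cached entry is stale
        self._cache = {}           # flavor_id -> (name, fetched_at)

    def flavor_name(self, flavor_id):
        now = time.time()
        entry = self._cache.get(flavor_id)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # still fresh, avoid the API round-trip
        name = self._nova.get_flavor_name(flavor_id)
        self._cache[flavor_id] = (name, now)
        return name
```

A long TTL is tolerable here because flavor definitions change far
more slowly than the per-instance samples the agent is gathering.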

I'm not sure if I've really addressed all your concerns there, please
shout if not.

Cheers,
Eoghan
