[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward

Sandy Walsh sandy.walsh at RACKSPACE.COM
Thu Nov 15 21:04:34 UTC 2012


> From: Eoghan Glynn [eglynn at redhat.com]
> Sent: Thursday, November 15, 2012 1:01 PM
>
> Hi Sandy,

Hey! Thanks for the prompt reply :)

>> We don't *have* to use rabbit to handle the notifications. The
>> notification system could easily be extended to allow different
>> event types to use different notifiers. For example, billing events
>> could go to rabbit while monitoring/lifecycle events could go to a
>> log file. Or, if we wanted to introduce a debugging event, that
>> could send to statsd (or something) directly as it strips out the
>> cruft from the message.

>Yes, so in ceilometer we've been making small steps towards that
>idea with the concept of multiple publishers and transformers.
>So a transformer would know how to distill what a publisher needs
>from the raw data (strip out the cruft & massage into the expected
>format) and then the publisher knows how to emit the data via
>some conduit (over rabbit, or a UDP packet for stats, an RRD file,
>a CloudWatch PutMetricData call, etc.).
>
> So I'm thinking we're not a million miles from each other on that
> score, other than I had been assuming the publishers would live
> in ceilo, and it would understand their requirements in terms of
> cadence etc.
> 
> Were you more thinking of this logic living elsewhere?

So this should be a Ceilometer notifier that lives in the Ceilometer code base and is selected via a nova.conf --notification_driver setting by whoever deploys it. This implies there are two ways to get notifications out of Nova (the second is sketched below the list):
1. via the Rabbit Notifier with an external Worker/Consumer (preferred for monitoring/usage/billing)
2. via a specific Notifier (like https://github.com/openstack/nova/blob/master/nova/openstack/common/notifier/log_notifier.py)
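
For concreteness, here's a rough sketch of what option 2 could look like, modelled loosely on log_notifier.py. The module path and the exact notify() signature are illustrative only (the notifier interface has shifted between releases), so treat this as a strawman rather than actual Ceilometer code:

    # hypothetical ceilometer-side notification driver, in the spirit of
    # nova/openstack/common/notifier/log_notifier.py; names are illustrative
    import json
    import logging

    LOG = logging.getLogger(__name__)

    def notify(message):
        """Receive a notification dict emitted by nova.

        message typically carries message_id, publisher_id, event_type,
        priority, timestamp and payload.
        """
        event_type = message.get('event_type', '')

        # route by event type: billing/usage events could be re-published
        # over rabbit, lifecycle events to a log, debug events to statsd...
        if event_type.startswith('compute.instance'):
            _publish_to_ceilometer(message)
        else:
            LOG.debug("ignoring notification %s", event_type)

    def _publish_to_ceilometer(message):
        # placeholder for whatever conduit the deployer chooses
        # (AMQP topic, UDP packet, local log, ...)
        LOG.info("ceilometer sample: %s", json.dumps(message))

The deployer would then point --notification_driver at that module in nova.conf, alongside (or instead of) the rabbit notifier.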

Stuff such as format conversion and varying polling rates seems like something external to Nova, since the system will only issue notifications at the rate the underlying events occur. Higher sampling rates, I think, fall into the instrumentation category and should be dealt with separately.
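
To illustrate that split (and the transformer/publisher idea you describe above), here's what the ceilometer-side pipeline might look like in rough strokes. The class names are invented for illustration and aren't the actual ceilometer API:

    # illustrative only: invented names, not the real ceilometer classes

    class Transformer(object):
        """Distill what a publisher needs from the raw notification."""

        def transform(self, raw):
            payload = raw.get('payload', {})
            return {
                'resource_id': payload.get('instance_id'),
                'event_type': raw.get('event_type'),
                'timestamp': raw.get('timestamp'),
                # strip the cruft: keep only the counters we care about
                'counters': dict((k, v) for k, v in payload.items()
                                 if k in ('memory_mb', 'vcpus', 'disk_gb')),
            }

    class LogPublisher(object):
        """Emit the distilled sample via some conduit (here, a log file)."""

        def publish(self, sample):
            with open('/var/log/ceilometer/samples.log', 'a') as f:
                f.write('%r\n' % sample)

    class Pipeline(object):
        def __init__(self, transformer, publishers):
            self.transformer = transformer
            self.publishers = publishers

        def on_notification(self, raw):
            sample = self.transformer.transform(raw)
            for publisher in self.publishers:
                publisher.publish(sample)

Nova just emits the notification; everything after that (massaging, cadence, fan-out to multiple publishers) stays on the ceilometer side.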

>> So, the messages that we are interested in are larger/less
>> frequent/less time sensitive (within reason) and very important.
>
> Are you contrasting with instrumentation data there?

yes, sorry, bad placement.

>Certainly there would still be some time sensitivity, particularly for
>metrics feeding into near-realtime monitoring. So for metering feeding
>into non-realtime consumers (such as billing), we can tolerate a bit
>of irregularity in the cadence and some delays in the pipeline, as
>long as we maintain completeness. Whereas for monitoring, we need to
>get at that data while it's still fresh and ensure it's sampled at a
>near-constant rate.

Sounds like instrumentation to me.

>> Also, they are ancillary to the task at hand (providing a cloud
>> service) so their failure should not bring down the system. Which is
>> why a queue-based approach seems the logical choice. Having nova
>> call out seems wrong and if it did, it belongs as a new rabbit
>> notifier where the person deploying that solution takes all
>> responsibility.

>True that, we certainly need to be cognizant of the load imposed
>on a possibly degraded system potentially making things worse.
>Hence the leeriness about garnering the info ceilo needs from
>the public nova-api.

The public api has no place for this stuff. I must have missed it, but where was that being proposed? Hitting HTTP for metrics is just wrong.

>> The existing information gathered from the hypervisors could easily
>> be extended with optional sections to cover all use cases. Much the
>> same way MP3 and JPG have optional data blocks. Notifications do not
>> use the Nova RPC protocol and should be versioned separately from
>> it. The entire structure of the notification should be changed to
>> allow for these "optional" blocks ... not only for flexibility, but
>> to reduce the already monstrous payload size (do we need to do 2-3
>> db accesses every time we send a notification?)

>So with nova-compute losing its direct database access, 2-3 DB
>accesses per notification is not going to be a runner - all the
>information we're talking about extracting here will I think have to
>be available from the hypervisor, possibly mixed in with some cached
>data retrieved by ceilo from the nova-api (e.g. on every polling cycle
>we wouldn't want to go back to the nova-api to figure out the instance
>flavor name, if that's not directly exposed by nova-compute but is
>needed for metering purposes).

The big requirement for it today is network information, which is already being cached in the Quantum driver. If we can separate the network and the compute notifications, I think we're ok. Likewise with storage. The downside is we wouldn't be getting these notifications as atomic updates, and that can lead to race conditions. But, so long as the timestamps are accurate within reason (NTP), we should be ok there. If we're dropping events, we've got bigger issues to deal with.
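
To make that concrete: a consumer could correlate the separately-emitted compute and network notifications on instance id and simply keep the freshest sample from each source, trusting NTP-synced timestamps for ordering. Purely illustrative (field names and timestamp format are assumptions), but it shows why accurate clocks matter more than atomicity here:

    # sketch: merge separately-emitted compute and network notifications,
    # keyed on instance id; field names and timestamp format are assumptions
    from datetime import datetime

    _latest = {}  # (instance_id, source) -> (timestamp, payload)

    def _ts(notification):
        return datetime.strptime(notification['timestamp'],
                                 '%Y-%m-%d %H:%M:%S.%f')

    def record(source, notification):
        """source is 'compute' or 'network'; keep only the newest sample."""
        key = (notification['payload']['instance_id'], source)
        ts = _ts(notification)
        current = _latest.get(key)
        if current is None or ts > current[0]:
            _latest[key] = (ts, notification['payload'])

    def combined_view(instance_id):
        """Best-effort merged view; may briefly lag on one side."""
        compute = _latest.get((instance_id, 'compute'), (None, {}))[1]
        network = _latest.get((instance_id, 'network'), (None, {}))[1]
        merged = dict(compute)
        merged.update(network)
        return merged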

Another possibility we're exploring is having a read-only mirror of the production database for this sort of stuff. That could be the "best practice" in these tight situations. But that's a story for another time :)

So, we need to revisit the notification format wrt versioning, structure, payload size, content and overhead. Getting the data out and doing something with it is easily doable via a worker/consumer or a proprietary notifier (and with no impact on nova core).
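
As a strawman for that revisit: a small, independently-versioned envelope plus optional, self-describing blocks (in the spirit of the MP3/JPG analogy above). Every field name below is illustrative, not a proposal carved in stone:

    # strawman notification layout; all field names here are illustrative
    notification = {
        'envelope_version': '2.0',       # versioned separately from nova RPC
        'message_id': 'some-uuid',
        'event_type': 'compute.instance.exists',
        'publisher_id': 'compute.host1',
        'timestamp': '2012-11-15 21:04:34.000000',

        # mandatory, minimal payload: enough to meter/bill without extra
        # DB round-trips at emit time
        'payload': {
            'instance_id': 'some-instance-uuid',
            'tenant_id': 'some-tenant-id',
            'instance_flavor': 'm1.small',
        },

        # optional blocks, present only when the producer already has the
        # data handy (e.g. network info cached by the Quantum driver)
        'optional': {
            'network': {'fixed_ips': ['10.0.0.2']},
            'image_meta': {'base_image_ref': 'some-image-uuid'},
        },
    }

Consumers that don't care about an optional block skip it; producers that don't have the data cheaply available omit it, which keeps the base payload small.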

Next we need to be very clear on what is instrumentation and what is monitoring/usage/billing/lifecycle.

>I'm not sure if I've really addressed all your concerns there, please
>shout if not.

It's a good start :) Let's keep it going.

(And I agree, we're not too far off)

-S


