[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward

Sandy Walsh sandy.walsh at RACKSPACE.COM
Thu Nov 15 23:58:37 UTC 2012


From: Doug Hellmann [doug.hellmann at dreamhost.com]
Sent: Thursday, November 15, 2012 5:51 PM

On Thu, Nov 15, 2012 at 4:04 PM, Sandy Walsh <sandy.walsh at rackspace.com> wrote:

So this should be a Ceilometer notifier that lives in the Ceilometer code base and is wired up through the nova.conf --notification_driver setting by whoever deploys it (rough config sketch after the list below). This implies there are two ways to get notifications out of Nova:
1. via the Rabbit Notifier with an external Worker/Consumer (preferred for monitoring/usage/billing)
2. via a specific Notifier (like https://github.com/openstack/nova/blob/master/nova/openstack/common/notifier/log_notifier.py)
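
Something like this in nova.conf (rough sketch only: the rabbit notifier path mirrors the log_notifier linked above, and the ceilometer driver name is just a placeholder for whatever the ceilometer tree would ship, not a confirmed module path):

    [DEFAULT]
    # 1. emit notifications onto the bus for an external worker/consumer
    notification_driver = nova.openstack.common.notifier.rabbit_notifier
    notification_topics = notifications
    # 2. or point at a deployment-specific notifier living outside nova core
    #notification_driver = ceilometer.compute.nova_notifier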

> We're talking specifically about values like disk I/O, CPU stats, etc. That data isn't generated as part of
>  a notification, that's why we're having to poll for it. What we're looking for is a solution that doesn't involve
>  ceilometer importing part of nova's code *in an unsupported way*, as it does now. Some of the options
>  presented involve new network-based communication between the existing ceilometer agent and the
>  compute agent (RPC or REST, in different directions). None of those is really optimal, because we don't
>  want to burden the compute agent with lots of calls asking for stats, either for metering or for monitoring. I
>  think the option the ceilometer team is favoring at the moment is making the hypervisor library in nova a
>  public API, so we can use it without fear of the API changing in unannounced ways. That would let us keep
>  the hypervisor polling in a separate daemon from the hypervisor management. There are details to work out
>  about how such a separate library would be implemented.

I don't know how some shops would feel about putting an API server on their compute nodes.

I'd use the same approach we use everywhere else in OpenStack, make the data collection portion of the hypervisor a plug-in. Each plug-in in the chain can add a new data section to the dictionary sent up for transmission. Out of the box we would send the basic stuff that is sent today. Other deployments might add some ceilometer/hypervisor specific modules to gather other things.

--collection_drivers=nova.virt.xen.Collector,ceilometer.xen.FunkyStuff
or, if you're a libvirt shop:
--collection_drivers=nova.virt.kvm.Collector,ceilometer.kvm.FunkyStuff,mylib.Grabbit
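
A very rough Python sketch of that chain (BaseCollector, the collect() signature and the collectors themselves are all made up here to illustrate the idea, not an existing Nova interface):

    class BaseCollector(object):
        """One link in the --collection_drivers chain."""

        def collect(self, instance, data):
            """Add this collector's section to `data` and return it."""
            raise NotImplementedError


    class XenCpuCollector(BaseCollector):
        def collect(self, instance, data):
            # a real driver would ask the hypervisor; stubbed for illustration
            data['cpu'] = {'instance': instance, 'util_pct': 0.0}
            return data


    def gather(instance, collectors):
        """Run the configured chain to build the dict we send up."""
        data = {}
        for collector in collectors:  # instantiated from --collection_drivers
            data = collector.collect(instance, data)
        return data

Out of the box the chain would only contain the collector that produces today's basic payload; deployments bolt on whatever extra sections they care about.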

>So with nova-compute losing its direct database access, then 2-3 DB
>accesses per notification is not going to be a runner - all the
>information we're talking about extracting here will I think have to
>be available from the hypervisor, possibly mixed in with some cached
>data retrieved by ceilo from the nova-api (e.g. on every polling cycle
>we wouldn't want to go back to the nova-api to figure out the instance
>flavor name, if that's not directly exposed by nova-compute but is
>needed for metering purposes).

The big DB requirement per notification today is network information, which is already being cached in the Quantum driver.

> Quantum's knowledge about network stats does not differentiate between internal and external traffic. It's
> still useful for monitoring, but metering stats have to be collected somewhere else.

Wouldn't the approach described above work in this case too? The caching currently used is for the IP address allocation (which nearly all notifications include, rightly or wrongly). I'd need to talk to the guys about how bandwidth usage is collected today. Stay tuned.

If we can separate the network and the compute notifications I think we're ok. Likewise with storage. The downside is we wouldn't be getting these notifications as atomic updates and that can lead to race conditions. But, so long as the time stamps are accurate within reason (NTP), we should be ok there. If we're dropping events we've got bigger issues to deal with.

Another possibility we're exploring is having a read-only mirror of the production database for this sort of stuff. That could be the "best practice" in these tight situations. But that's a story for another time :)

So, we need to revisit the notification format wrt versioning, structure, payload size, content and overhead. Getting the data out and doing something with it is easily doable via a worker/consumer or a proprietary notifier (and with no impact on nova core).
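
For reference, the external worker side really is small. A minimal kombu consumer might look like the sketch below (the 'nova' topic exchange and 'notifications.info' routing key are assumptions based on default settings; adjust for your deployment):

    from kombu import Connection, Exchange, Queue

    nova_x = Exchange('nova', type='topic', durable=False)
    notif_q = Queue('usage_worker', exchange=nova_x,
                    routing_key='notifications.info')

    def on_notification(body, message):
        # body is the notification dict: event_type, payload, timestamp, ...
        print body.get('event_type'), body.get('timestamp')
        message.ack()

    with Connection('amqp://guest:guest@localhost//') as conn:
        with conn.Consumer(notif_q, callbacks=[on_notification]):
            while True:
                conn.drain_events()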

> I'm not sure why any of that is necessary to solve this problem?

Well, you're talking about a different problem now ... so no, it's not necessary for that. But still needed overall imho :)

I was specifically talking about lifecycle notifications, where atomic snapshots of state are desired. Regardless, separating notifications for network, storage and compute would generally be a good thing, I think.

-S


