Open Stack

Wed Nov 14 17:28:41 UTC 2012

Folks,

TL;DR: soliciting feedback on the best (most stable/supportable)
approach for ceilometer to interact with nova going forward.

Currently ceilo both consumes notifications from nova (instance
lifecycle events & the like) and also periodically polls libvirt to
extract more detailed info. This latter mechanism uses internal nova
classes, so we want to move towards a model that is more stable and
supportable into the future.

We are also currently limited to libvirt, so it would make sense to
move towards a more hypervisor-agnostic position, or at least to
provide wider support.
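Here is a minimal sketch of what a hypervisor-agnostic position could look like. It assumes a driver-neutral "inspector" interface that the ceilo compute agent would poll, instead of reaching into libvirt or internal nova classes directly. All names here (HypervisorInspector, CPUStats, FakeInspector, poll_cpu) are hypothetical and are not part of nova or ceilometer:

```python
# Hypothetical sketch: none of these names come from nova or ceilometer.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CPUStats:
    """Cumulative CPU time consumed by an instance, in nanoseconds."""
    cpu_time_ns: int
    number_of_cpus: int


class HypervisorInspector(ABC):
    """Driver-neutral interface the ceilo compute agent would poll."""

    @abstractmethod
    def inspect_cpus(self, instance_name):
        """Return CPUStats for the named instance."""


class FakeInspector(HypervisorInspector):
    """Stand-in driver returning canned data; a libvirt or Xen driver
    would implement the same method against its own hypervisor API."""

    def __init__(self, data):
        # data maps instance name -> (cpu_time_ns, number_of_cpus)
        self._data = data

    def inspect_cpus(self, instance_name):
        cpu_time_ns, ncpus = self._data[instance_name]
        return CPUStats(cpu_time_ns=cpu_time_ns, number_of_cpus=ncpus)


def poll_cpu(inspector, instances):
    """Collect CPU samples without knowing which hypervisor is underneath."""
    return {name: inspector.inspect_cpus(name) for name in instances}
```

With this approach, adding support for another hypervisor means adding another inspector subclass. The polling code stays unchanged.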

Now, there are at least 4 different approaches that could be followed,
each with its own advantages and disadvantages, so I wanted to call
these out to solicit some feedback and guidance from the nova domain
experts ...

1. Extend the existing os-server-diagnostics API extension to expose
   any additional stats that ceilo needs.

   +  would allow the ceilo compute agent to be scaled independently
      of the nova-compute node (i.e. no need for a 1:1 correspondence)
   -  the diagnostics returned are currently hypervisor-specific
   -  the additional nova-api-->nova-compute RPC call would add lag
      and impact timeliness for metrics gathering
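Here is a rough sketch of consuming option #1 from the ceilo side. It assumes the v2 per-server diagnostics resource at /v2/{tenant_id}/servers/{server_id}/diagnostics, with Keystone token auth passed in X-Auth-Token. The request is only built, not sent; passing it to urllib.request.urlopen would issue it. The key_map given to normalize() is hypothetical. It illustrates the per-hypervisor translation that hypervisor-specific output would force on ceilo:

```python
import urllib.request


def diagnostics_url(endpoint, tenant_id, server_id):
    """Compose the per-server diagnostics URL (assumed v2 layout)."""
    base = endpoint.rstrip("/")
    return f"{base}/v2/{tenant_id}/servers/{server_id}/diagnostics"


def build_diagnostics_request(endpoint, tenant_id, server_id, token):
    """Build (but do not send) an authenticated GET for the diagnostics."""
    return urllib.request.Request(
        diagnostics_url(endpoint, tenant_id, server_id),
        headers={"X-Auth-Token": token, "Accept": "application/json"},
        method="GET",
    )


def normalize(raw, key_map):
    """Map driver-specific keys onto a common vocabulary.

    key_map is a hypothetical {native_key: common_key} mapping; keys in
    the raw diagnostics that it does not mention are dropped.
    """
    return {common: raw[native]
            for native, common in key_map.items() if native in raw}
```

Each hypervisor driver would need its own key_map, which is the hypervisor-specific drawback noted above.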

2. Call the nova get_diagnostics RPC directly (as per the experimental
   patch proposed by Yunhong Jiang https://review.openstack.org/15952),
   or use a new RPC message specifically designed for this purpose.

   +/- as for #1, but also removes the lag involved in an additional
       hop between nova services
   -   calling RPC directly would expose ceilo to a much less stable
       (i.e. rapidly rev'd) API than would be the case for #1
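The versioning concern with #2 can be illustrated with a minimal sketch. It assumes the usual major.minor RPC compatibility rule: a server at X.Y accepts calls pinned to X.Z for any Z <= Y, but nothing from a different major version. The envelope layout and version numbers below are hypothetical and do not describe nova's actual compute RPC API:

```python
import json


def parse_version(version):
    """Split an RPC version string like "2.3" into (major, minor) ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(server_version, requested_version):
    """True if a server implementing server_version can handle a call
    pinned to requested_version: same major, requested minor <= server minor."""
    s_major, s_minor = parse_version(server_version)
    r_major, r_minor = parse_version(requested_version)
    return s_major == r_major and r_minor <= s_minor


def make_rpc_call(method, version, **kwargs):
    """Build a hypothetical RPC envelope; the real nova wire format may differ."""
    return json.dumps({"method": method, "version": version, "args": kwargs})
```

A ceilo agent pinned to an older minor version would keep working across minor bumps on the nova side. Any major bump would break it, though, which is the stability risk relative to the more slowly versioned REST API.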

3. Have nova itself emit metering messages directly onto the ceilo
   message bus, encompassing both lifecycle events and usage stats,
   to be picked up and persisted by the ceilo collector or other agent.

   - leaks ceilo concerns into nova
   - requires message bus usage, probably inappropriate for time-
     sensitive measurements feeding into near-realtime metrics.
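Here is a sketch of the kind of metering message #3 implies. It includes a consumer-side staleness check to show why bus latency matters for near-realtime metrics. The field names and the max_age threshold are assumptions for illustration only:

```python
import json
from datetime import datetime, timedelta, timezone


def make_metering_message(name, counter_type, volume, resource_id, timestamp):
    """Serialize a hypothetical metering sample as nova might emit it."""
    return json.dumps({
        "name": name,                # e.g. "cpu"
        "type": counter_type,        # e.g. "cumulative" or "gauge"
        "volume": volume,
        "resource_id": resource_id,
        "timestamp": timestamp.isoformat(),  # timezone-aware ISO 8601
    })


def is_stale(message_json, now, max_age):
    """True if the sample is older than max_age when the consumer sees it."""
    msg = json.loads(message_json)
    sampled_at = datetime.fromisoformat(msg["timestamp"])
    return now - sampled_at > max_age
```

Any queueing delay on the bus eats directly into the max_age budget. That budget is where the concern about time-sensitive measurements applies.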

4. Invert control and have the nova compute service itself call into a
   ceilo-provided API that abstracts the conduit used for publication
   (could be via the message bus, or UDP, or a direct call to a CW API)

   - a loaded nova compute service may fall behind in this periodic
     task, especially if the reporting frequency is configured high
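Here is a minimal sketch of the inverted-control shape in #4. It assumes a hypothetical ceilo-provided Publisher interface that nova would call from its periodic task. The conduit sits behind that interface: UDP here, or an in-memory stand-in for testing. The missed_intervals() helper, also hypothetical, shows how a loaded compute service falling behind could be detected:

```python
import json
import socket
from abc import ABC, abstractmethod


class Publisher(ABC):
    """Hypothetical ceilo-provided API; nova sees only publish()."""

    @abstractmethod
    def publish(self, sample):
        """Send one sample (a dict) over the configured conduit."""


class UdpPublisher(Publisher):
    """Fire-and-forget conduit: one JSON datagram per sample."""

    def __init__(self, host, port):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def publish(self, sample):
        self._sock.sendto(json.dumps(sample).encode("utf-8"), self._addr)


class InMemoryPublisher(Publisher):
    """Test stand-in that just records what was published."""

    def __init__(self):
        self.published = []

    def publish(self, sample):
        self.published.append(sample)


def report(publisher, samples):
    """What nova's periodic task would call; returns the count published."""
    for sample in samples:
        publisher.publish(sample)
    return len(samples)


def missed_intervals(last_run, now, interval):
    """Scheduled reporting slots skipped since last_run (seconds),
    not counting the run happening now."""
    return max(0, int((now - last_run) // interval) - 1)
```

Keeping the conduit behind publish() is what would let ceilo swap the message bus for UDP (or something else) without touching nova.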

So the question is: how do the nova domain experts see these options
sizing up?

Personally, I like option #2, aside from a lingering concern about how
rapidly the RPC version is rev'd (the more sedate pace of API versioning
would be easier to consume). It would also be good to have some
statement on whether the RPC API is intended to be externally callable.

Thoughts/feedback most welcome ...

Thanks,
Eoghan

[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward