[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward

Eoghan Glynn eglynn at redhat.com
Wed Nov 14 21:50:16 UTC 2012



> On 11/14/2012 12:28 PM, Eoghan Glynn wrote:
> > 1. Extend the existing os-server-diagnostics API extension to expose
> >    any additional stats that ceilo needs.
> > 
> >    +  would allow the ceilo compute agent to be scaled independently
> >       of the nova-compute node (i.e. no need for a 1:1 correspondence)
> >    -  the diagnostics returned are currently hypervisor-specific
> >    -  the additional nova-api-->nova-compute RPC call would add lag
> >       and impact timeliness for metrics gathering
> > 
> > 
> > 2. Call the nova get_diagnostics RPC directly (as per the experimental
> >    patch proposed by Yunhong Jiang https://review.openstack.org/15952),
> >    or use a new RPC message specifically designed for this purpose.
> > 
> >    +/- as for #1, but also removes the lag involved in an additional
> >        hop between nova services
> >    -   calling RPC directly would expose ceilo to a much less stable
> >        (i.e. rapidly rev'd) API than would be the case for #1
> 
> > So the question is how the nova domain experts see these options
> > sizing up?
> > 
> > Personally I'm liking option #2, aside from a lingering concern about
> > how rapidly RPC versioning is rev'd (which suggests the more sedate
> > pace of API versioning would be easier to consume). Also some statement
> > on whether RPC is envisaged as being externally-callable would be good.
> 
> I'm in favor of option #1.  This is primarily because I have been
> considering all rpc APIs to be private internal nova APIs.

Right, that was pretty much my concern about RPC being
externally-callable. If RPC is considered a private API, then we really
haven't advanced much by switching away from a direct dependency on the
libvirt driver.

Fair enough, RPC is versioned, which gives us some basic insulation
from unexpected changes. But if it's not part of the external contract
exposed by nova, then we're still some way from the goal of using a
stable, supported API.
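
To make the "basic insulation" point concrete, here is a rough sketch of
the kind of check that RPC versioning buys a caller. It assumes the usual
major.minor convention (a major bump breaks compatibility, a minor bump
only adds to the interface); the function names are purely illustrative
and not lifted from the nova tree.

```python
def parse_version(version):
    """Split a 'major.minor' string into a pair of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def version_is_compatible(implemented, requested):
    """Return True if a service at `implemented` can serve `requested`.

    Assumed convention: majors must match exactly, and the service must
    be at least as new as the minor version the caller asks for.
    """
    imp_major, imp_minor = parse_version(implemented)
    req_major, req_minor = parse_version(requested)
    if imp_major != req_major:
        return False
    return req_minor <= imp_minor
```

A check like this shields the caller from minor revisions, but not from
a major bump, which is exactly the exposure an external consumer of a
private API would be left with.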
 
> The benefit of #2 over #1 appears to be a performance concern.  How
> sensitive to timing are these measurements?  Also, if they are very
> sensitive, it seems you'd be facing the same risk of problems due to
> delays using rpc directly, because the queues can certainly get backed
> up, so #2 doesn't help much.

So the concern would be that in extremis we have to be able to take
measurements on a degraded (possibly overloaded) system without making
things worse by imposing even more load or relying on multiple layers
being up and responsive.

But your point about the queues being swamped is well-made, so making
an end-run around the nova API layer may not even help that much.

So here's a random half-formed thought: suppose the nova-compute service
exposed a public REST API directly for purposes such as these, so that
diagnostics could be retrieved directly from the nova-compute nodes
without either involving nova-api or using the internal RPC mechanism?
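
Purely to illustrate the shape of what I mean, something along these
lines might do. This is a stdlib-only sketch, not a proposal for the
actual implementation: the /diagnostics/<instance_id> URL layout, the
port, and the fake_get_diagnostics() stand-in are all made up for the
example, and real diagnostics would still be hypervisor-specific.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_get_diagnostics(instance_id):
    """Hypothetical stand-in for asking the compute driver for stats."""
    known = {"instance-0001": {"cpu0_time": 17300000000, "memory": 524288}}
    return known.get(instance_id)


def route(path, get_diagnostics=fake_get_diagnostics):
    """Map a request path to (status, payload); URL layout is assumed."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2 or parts[0] != "diagnostics":
        return 404, {"error": "unknown resource"}
    stats = get_diagnostics(parts[1])
    if stats is None:
        return 404, {"error": "no such instance"}
    return 200, stats


class DiagnosticsHandler(BaseHTTPRequestHandler):
    """Serve GET /diagnostics/<instance_id> as JSON."""

    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Run the sketch server until interrupted (port is arbitrary)."""
    HTTPServer((host, port), DiagnosticsHandler).serve_forever()
```

Calling serve() and then fetching /diagnostics/instance-0001 from the
compute node would return the stats as JSON, with neither nova-api nor
the message queue in the path.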

Cheers,
Eoghan


