[openstack-dev] [nova][ceilometer] model for ceilo/nova interaction going forward

Angus Salkeld asalkeld at redhat.com
Sun Nov 18 22:32:31 UTC 2012


On 16/11/12 16:54 +0000, Sandy Walsh wrote:
>
>________________________________________
>From: Eoghan Glynn [eglynn at redhat.com]
>Sent: Friday, November 16, 2012 12:31 PM
>>> I would *never* assume anything in user space. I think monitoring of
>>> the users instances is out of scope for all things OpenStack. The
>>> users might deploy whatever tools they like for checking i/o, disk,
>>> network, etc.
>>
>>That's an interesting point of view, that I hadn't considered before.
>>
>>I would see AWS CloudWatch as a user-oriented monitoring service,
>>and would have assumed that there is scope for something similar
>>to be part of openstack.
>>
>>When you say its out of scope, do you mean that it shouldn't be
>>something addressed by the IaaS fabric, and should instead be
>>something that users bolt on top by running agents *within* their
>>instances?
>>
>>(as opposed to some piece of the openstack infrastructure doing
>> this monitoring from *outside* the instance)
>>
>>I recall the Heat team ran into some complications on this whole
>>within-versus-without question, as the openstack RBAC mechanisms
>>aren't yet flexible enough to easily grant in-instance agents
>>credentials giving limited roles such as the ability to report
>>metrics to a CW-like metricstore.
>
>Well, now we're getting to the crux of the matter and one that I brought up in the IRC meeting yesterday. My concern is that ceilometer is becoming a kitchen sink for this stuff. Two summits ago the mandate was clear: "There is no billing, we need billing, so let's build this ..." (I remember because I suggested looking at Yagi/StackTach/Tach back then). This was also the impetus for the "integration" proposal, as I saw the scope widening. I'd rather have ceilometer become a set of low-level tools that can be used to get data out of and away from the core OpenStack services, and a basis for other tools to build upon (Heat, CloudWatch/Synaps, StackTach, etc.). Tools of this sort are vitally important to a successful OpenStack deployment, but it should be mix-and-match or "your mileage may vary".

I don't think that approach helps users. If there is no clear official solution,
it's up to the deployer to somehow find and evaluate all of these projects, and
that doesn't make sense to me. I think we should be making it super easy for
people to deploy OpenStack. Of course this doesn't mean it would be the only
solution, just an obvious one.

I understand your concern (re: putting too much in one project), but in this
case I am not sure it's so bad, as the complexity of each piece is not great.
Even if we put billing and monitoring/alarming into the same project, it would
not be a complex project.

>
>I think ceilometer should be a smaller, more tightly focused collection of utilities, vs. trying to be all things to all people.

I think a bigger project pulls in a bigger community and makes it more successful
in the long run.

>If a project like Heat runs into problems with something like the RBAC mechanism or the polling interval from Compute, it would be the Ceilometer team's job to broker a solution with Core and expose that solution to everyone.
>
>The Rackspace Linux/Windows agents for Xen are open sourced:
>https://github.com/rackspace/openstack-guest-agents-windows-xenserver
>https://github.com/rackspace/openstack-guest-agents-unix

It is not easy to get people to install the guest agents you want; it is far
easier to provide an API for them to use, along with an example implementation
that uses that API. For some reason guest agents seem to be a sensitive issue.

This is part of the API that CloudWatch provides for posting stats back to the
monitoring service.
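For concreteness, here is a minimal sketch of what such a call looks like on the
wire. It builds the flattened query parameters for a single CloudWatch
PutMetricData request in the AWS Query style; a real agent would also sign the
request and POST it to the monitoring endpoint (signing and endpoint details are
omitted, and the namespace and dimension values below are made up for
illustration).

```python
# Sketch: build the query parameters for a CloudWatch-style PutMetricData
# call, the API an in-instance agent would use to push a metric sample.
# Request signing and the service endpoint are intentionally left out.
from urllib.parse import urlencode

def put_metric_data_params(namespace, metric_name, value, unit="Count",
                           dimensions=None):
    """Return the flattened Query-API parameters for one metric datum."""
    params = {
        "Action": "PutMetricData",
        "Version": "2010-08-01",
        "Namespace": namespace,
        "MetricData.member.1.MetricName": metric_name,
        "MetricData.member.1.Value": str(float(value)),
        "MetricData.member.1.Unit": unit,
    }
    # Dimensions let the agent tag the sample, e.g. with its instance id.
    for i, (name, val) in enumerate((dimensions or {}).items(), start=1):
        params["MetricData.member.1.Dimensions.member.%d.Name" % i] = name
        params["MetricData.member.1.Dimensions.member.%d.Value" % i] = val
    return params

# Hypothetical usage: namespace and instance id are illustrative only.
params = put_metric_data_params("Instance/Stats", "CPUUtilization", 37.5,
                                unit="Percent",
                                dimensions={"InstanceId": "i-12345678"})
body = urlencode(sorted(params.items()))
```

The point is that the whole contract is a plain HTTP POST, so any guest-side
tool the user prefers can implement it without a mandated agent.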


-Angus

>
>That might be a starting point for an agent-level api that can feed into that eco-system? But again, I think it's a separate problem ... perhaps even a separate project?
>
>>> For the Deployer of openstack, yes they will want to check if the
>>> Services are running hot, falling behind queued i/o or the public
>>> openstack api load balancer is spitting out 503's. My
>>> differentiation of instrumentation and monitoring in these cases are
>>> related to the sample rate and size of the payload. 503's, i/o, etc
>>> I would view as instrumentation. Sampled frequently and with little
>>> payload. This is the classic statsd/graphite data. Shown in a big
>>> graph on the wall of the Network Operations Center. Monitoring would
>>> be larger, slower, chunkier data for Capacity Planning, etc.
>>> Lifecycle state falls into the Monitoring camp as well. "Are things
>>> progressing as expected? Or will they be a problem down the road."
>>
>>OK, so this cloud-operator-oriented view is also crucial, but I'd
>>be leery about making this the entire focus of our efforts (i.e. by
>>assuming that users will sort themselves out with their own monitoring
>>solution).
>
>Yeah, it's a tricky line to cross ... stepping into user-space. There are a raft of 3rd party companies I'm sure would want to have a say in how this happens. So many OpenStack startups are centered around this very problem.
>
>>Cheers,
>>Eoghan
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


