[openstack-dev] [TripleO] Undercloud Ceilometer

Clint Byrum clint at fewbar.com
Fri Oct 4 16:08:32 UTC 2013


Excerpts from Ladislav Smola's message of 2013-10-04 08:28:22 -0700:
> Hello,
> 
> just a few words about role of Ceilometer in the Undercloud and the work 
> in progress.
> 
> Why we need Ceilometer in Undercloud:
> ---------------------------------------------------
> 
> In Tuskar-UI, we will display a number of statistics that will show
> Undercloud metrics.
> Later also a number of alerts and notifications that will come from
> Ceilometer.
> 
> But I do suspect that Heat will use Ceilometer Alarms, similar to the
> way it uses them for auto-scaling in the Overcloud. Can anybody
> confirm?

I have not heard of anyone wanting to "auto scale" baremetal for the
purpose of scaling out OpenStack itself. There is certainly a use case
for it when we run out of compute resources and happen to have spare
hardware around. But unlike on a cloud where you have several
applications all contending for the same hardware, in the undercloud we
have only one application, so it seems less likely that auto-scaling
will be needed. We definitely need "scaling", but I suspect it will not
be extremely elastic.

What will be needed, however, are metrics for the rolling updates feature
we plan to add to Heat. We want to make sure that a rolling update does
not adversely affect the service level of the running cloud. If we're
early in the process with our canary-based deploy and suddenly CPU load is
shooting up on all of the completed nodes, something, perhaps Ceilometer,
should be able to send a signal to Heat, and trigger a rollback.
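
Something like this, as a rough Python sketch, is the shape of check I
have in mind. The metric source and the Heat rollback hook URLs below
are hypothetical placeholders; the real wiring would presumably go
through Ceilometer alarms and whatever signal the rolling-update
feature ends up exposing:

    # Hypothetical sketch: watch CPU load on the nodes a canary deploy has
    # already updated and ask Heat to roll back if it spikes everywhere.
    import json
    import urllib2

    METRICS_URL = "http://ceilometer.example:8777/cpu"   # placeholder
    ROLLBACK_URL = "http://heat.example/rollback-hook"   # placeholder

    def cpu_util(node):
        # Stand-in for a real Ceilometer statistics query.
        return json.load(urllib2.urlopen("%s?node=%s" % (METRICS_URL, node)))

    def check_canary(updated_nodes, baseline, factor=1.5):
        # If every already-updated node is well above its pre-update
        # baseline, blame the new image and signal Heat to roll back.
        if all(cpu_util(n) > baseline * factor for n in updated_nodes):
            urllib2.urlopen(ROLLBACK_URL, json.dumps({"action": "rollback"}))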

> 
> What is planned in near future
> ---------------------------------------
> 
> The Hardware Agent capable of obtaining statistics:
> https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
> It uses an SNMP inspector to obtain the stats. I have tested that with
> the devtest TripleO setup and it works.
> 
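
The polling side of that is nothing exotic, for anyone who has not
looked at it yet. A minimal sketch of pulling one value from a node's
snmpd (shelling out to net-snmp's snmpget here rather than using the
SNMP library the inspector actually uses; host and community string
are placeholders):

    # Sketch: read the 1-minute load average from a node's snmpd.
    import subprocess

    LOAD_AVG_OID = ".1.3.6.1.4.1.2021.10.1.3.1"  # UCD-SNMP-MIB::laLoad.1

    def poll_load(host, community="public"):
        out = subprocess.check_output(
            ["snmpget", "-v2c", "-c", community, "-Oqv", host, LOAD_AVG_OID])
        return float(out)
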
> The planned architecture is to have one Hardware Agent (it will be
> merged into the central agent code) placed on the Control Node (or
> basically anywhere). That agent will poll SNMP daemons running on
> hardware in the Undercloud (baremetals, network devices). Any
> objections, or reasons why this is a bad idea?
> 
> We will have to create a Ceilometer image element; an snmpd element is
> already there, but we should test it. Does anybody volunteer for this
> task? The hard part will be getting the configuration right (firewall,
> keystone, snmpd.conf) so it's all set up in a clean and secure way.
> That would require a seasoned sysadmin to at least review the thing.
> Any volunteers here? :-)
> 
> Work on the IPMI inspector for the Hardware Agent has just started:
> https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
> It seems it should query the Ironic API, which would provide the data
> samples. Any objections? Any volunteers for implementing this on the
> Ironic side?
> 
> devananda and lifeless had the greatest concern about the scalability
> of a central agent. Ceilometer is not doing any scaling there right
> now, but horizontal scaling of the central agent is planned for the
> future. So this is a very important task for us for larger
> deployments. Any feedback about scaling? Or about changing the
> architecture for better scalability?
> 

I share their concerns. For < 100 nodes it is no big deal. But centralized
monitoring has a higher cost than distributed monitoring. I'd rather see
agents on the machines themselves do a bit more than respond to polling
so that load is distributed as much as possible and non-essential
network chatter is reduced.
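
To illustrate what I mean by "a bit more": a node-local agent can
aggregate its own readings and push one small summary per interval,
instead of answering a central poller meter by meter. A rough sketch,
with a made-up collector endpoint and payload shape:

    # Sketch of a push-style node agent: sample locally, send one small
    # summary per interval. The collector URL is a placeholder.
    import json
    import os
    import time
    import urllib2

    COLLECTOR_URL = "http://undercloud-collector.example/samples"  # placeholder

    def run(interval=60):
        while True:
            payload = json.dumps({"host": os.uname()[1],
                                  "timestamp": time.time(),
                                  "load_1m": os.getloadavg()[0]})
            urllib2.urlopen(COLLECTOR_URL, payload)
            time.sleep(interval)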

I'm extremely interested in the novel approach that Assimilation
Monitoring [1] is taking to this problem: each node monitors itself and
two of its immediate neighbors on a switch, and some nodes monitor an
additional node on a different switch. Failures are reported to an API
server, which uses graph database queries to determine at what level
the failure occurred (single node, cascading, or network level).
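
As a toy illustration of the topology (not their implementation):
neighbor assignment per switch can be as simple as a ring, with a few
cross-switch links layered on top, so almost no central state is
needed:

    # Toy sketch: each node watches the next two nodes on its own switch,
    # and the first node on each switch also watches the first node on the
    # next switch. Not Assimilation Monitoring's actual algorithm.
    def assign_watchers(switches):
        # switches maps switch name -> list of node names on that switch.
        watch = {}
        for nodes in switches.values():
            count = len(nodes)
            for i, node in enumerate(nodes):
                peers = [nodes[(i + k) % count] for k in (1, 2)]
                watch[node] = [p for p in peers if p != node]
        names = sorted(switches)
        for a, b in zip(names, names[1:] + names[:1]):
            if a != b and switches[a] and switches[b]:
                watch[switches[a][0]].append(switches[b][0])
        return watch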

If Ceilometer could incorporate that type of lightweight, high-scale
monitoring ethos, rather than implementing something we know does not
scale to the level OpenStack needs, I'd feel a lot better about pushing
it out as part of the standard deployment.

[1] http://assimmon.org/


