[openstack-dev] [TripleO] Undercloud Ceilometer

Ladislav Smola lsmola at redhat.com
Mon Oct 7 08:54:59 UTC 2013


Hello Clint,

thank you for your feedback.

On 10/04/2013 06:08 PM, Clint Byrum wrote:
> Excerpts from Ladislav Smola's message of 2013-10-04 08:28:22 -0700:
>> Hello,
>>
>> just a few words about the role of Ceilometer in the Undercloud and the
>> work in progress.
>>
>> Why we need Ceilometer in Undercloud:
>> ---------------------------------------------------
>>
>> In Tuskar-UI, we will display a number of statistics that show
>> Undercloud metrics.
>> Later we will also display alerts and notifications that come from
>> Ceilometer.
>>
>> But I do suspect that Heat will use Ceilometer Alarms, in a similar
>> way to how it uses them for
>> auto-scaling in the Overcloud. Can anybody confirm?
> I have not heard of anyone wanting to "auto scale" baremetal for the
> purpose of scaling out OpenStack itself. There is certainly a use case
> for it when we run out of compute resources and happen to have spare
> hardware around. But unlike on a cloud where you have several
> applications all contending for the same hardware, in the undercloud we
> have only one application, so it seems less likely that auto-scaling
> will be needed. We definitely need "scaling", but I suspect it will not
> be extremely elastic.

Yeah, that's probably true. What I had in mind was something like
suspending hardware that is not used at the time and e.g. has no
VMs running on it, to save energy, and starting it again when
we run out of compute resources, as you say.

> What will be needed, however, is metrics for the rolling updates feature
> we plan to add to Heat. We want to make sure that a rolling update does
> not adversely affect the service level of the running cloud. If we're
> early in the process with our canary-based deploy and suddenly CPU load is
> shooting up on all of the completed nodes, something, perhaps Ceilometer,
> should be able to send a signal to Heat, and trigger a rollback.

That is how Alarms should work now: you just define the Alarm
inside the Heat template. Check the example:
https://github.com/openstack/heat-templates/blob/master/cfn/F17/AutoScalingCeilometer.yaml
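For illustration, here is a minimal sketch of such an alarm resource,
modelled on the linked template. The property names follow that example;
the resource name and the referenced WebServerScaleUpPolicy are just
placeholders for whatever action Heat should trigger (a scaling policy,
or a rollback action for rolling updates):

# Sketch of a Ceilometer alarm wired into a CFN-style Heat template,
# following the structure of AutoScalingCeilometer.yaml (illustrative only).
CPUAlarmHigh:
  Type: OS::Ceilometer::Alarm
  Properties:
    description: Scale up if average CPU > 50% for 1 minute
    meter_name: cpu_util
    statistic: avg
    period: '60'
    evaluation_periods: '1'
    threshold: '50'
    comparison_operator: gt
    alarm_actions:
      # Placeholder: the webhook Heat should invoke when the alarm fires.
      - {"Fn::GetAtt": [WebServerScaleUpPolicy, AlarmUrl]}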

>> What is planned in near future
>> ---------------------------------------
>>
>> The Hardware Agent capable of obtaining statistics:
>> https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
>> It uses an SNMP inspector to obtain the stats. I have tested that with
>> the Devtest TripleO setup
>> and it works.
>>
>> The planned architecture is to have one Hardware Agent (it will be merged
>> into the central agent code)
>> placed on the Control Node (or basically anywhere). That agent will poll
>> SNMP daemons running on
>> hardware in the Undercloud (baremetals, network devices). Any objections,
>> or reasons why this is a bad idea?
>>
>> We will have to create a Ceilometer image element; an snmpd element is
>> already there, but we should
>> test it. Does anybody volunteer for this task? The hard part will be
>> getting the configuration right
>> (firewall, keystone, snmpd.conf) so that it is all set up in a clean and
>> secure way. That would
>> require a seasoned sysadmin to at least review the thing. Any
>> volunteers here? :-)
>>
>> The IPMI inspector for the Hardware agent has just started:
>> https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
>> It seems it should query the Ironic API, which would provide the data
>> samples. Any objections?
>> Any volunteers for implementing this on the Ironic side?
>>
>> devananda and lifeless had the greatest concern about the scalability of
>> a central agent. Ceilometer
>> is not doing any scaling of it right now, but horizontal scaling of the
>> central agent is planned
>> for the future. So this is a very important task for us for larger
>> deployments. Any feedback about
>> scaling? Or about changing the architecture for better scalability?
>>
> I share their concerns. For < 100 nodes it is no big deal. But centralized
> monitoring has a higher cost than distributed monitoring. I'd rather see
> agents on the machines themselves do a bit more than respond to polling
> so that load is distributed as much as possible and non-essential
> network chatter is reduced.

Right now, for the central agent, it should be a matter of configuration.
You can set up one central agent that fetches all baremetals from nova. Or
you can bake the central agent into each baremetal image and set it to poll
only localhost. Or, as one of the distributed architectures planned as a
configuration option, you can have a node (a Management Leaf node) that
manages a bunch of hardware, with the central agent baked into it.

What the agent then does is process the data, pack it into a message,
and send it to the OpenStack message bus (which should be heavily
scalable), where it is picked up by a Collector (which should be able to
run many workers) and saved to the database.
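To make that a bit more concrete, here is a rough sketch of what a hardware
polling pipeline could look like. The snmp:// resources list follows the
approach discussed in the monitoring-physical-devices blueprint, so the
option names and exact format may well differ in the final implementation,
and the addresses are made up:

# pipeline.yaml sketch (illustrative only)
-
  name: hardware_pipeline
  interval: 600                  # poll every 10 minutes
  meters:
    - "hardware.*"               # CPU/memory/disk/network stats via SNMP
  resources:
    - snmp://192.0.2.11          # central agent polling several baremetals...
    - snmp://192.0.2.12
    # - snmp://localhost         # ...or baked into each node, polling only itself
  transformers:
  publishers:
    - rpc://                     # samples go onto the message bus, where the
                                 # collector picks them up and stores them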

>
> I'm extremely interested in the novel approach that Assimilation
> Monitoring [1] is taking to this problem, which is to have each node
> monitor itself and two of its immediate neighbors on a switch and
> some nodes monitor an additional node on a different switch. Failures
> are reported to an API server which uses graph database queries to
> determine at what level the failure occurred (single node, cascading,
> or network level).
>
> If Ceilometer could incorporate that type of light-weight high-scale
> monitoring ethos, rather than implementing something we know does not
> scale well at the level of scale OpenStack needs to be, I'd feel a lot
> better about pushing it out as part of the standard deployment.
>
> [1] http://assimmon.org/

That does seem interesting, but it looks like a very long-term plan, as
it will be non-trivial to implement.
I guess the first step would be to get some graph database into
Ceilometer. I am also not sure about the firewall setup in the network,
because right now the hardware nodes are not allowed to talk to each
other, at least I think. It would be great to discuss this with the
TripleO guys. This seems like a nice monitoring option for very large
deployments.


Ladislav