<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Hello Clint,<br>

      <br>

      thank you for your feedback.<br>

      <br>

      On 10/04/2013 06:08 PM, Clint Byrum wrote:<br>

    </div>

    <blockquote cite="mid:1380901722-sup-2733@clint-HP" type="cite">

      <pre wrap="">Excerpts from Ladislav Smola's message of 2013-10-04 08:28:22 -0700:

</pre>

      <blockquote type="cite">

        <pre wrap="">Hello,

just a few words about the role of Ceilometer in the Undercloud and the
work in progress.

Why we need Ceilometer in the Undercloud:
---------------------------------------------------
In Tuskar-UI, we will display a number of statistics showing Undercloud
metrics. Later, there will also be a number of alerts and notifications
coming from Ceilometer.

But I suspect that Heat will use the Ceilometer Alarms in the same way
it uses them for auto-scaling in the Overcloud. Can anybody confirm?

</pre>

      </blockquote>

      <pre wrap="">
I have not heard of anyone wanting to "auto scale" baremetal for the
purpose of scaling out OpenStack itself. There is certainly a use case
for it when we run out of compute resources and happen to have spare
hardware around. But unlike on a cloud where you have several
applications all contending for the same hardware, in the undercloud we
have only one application, so it seems less likely that auto-scaling
will be needed. We definitely need "scaling", but I suspect it will not
be extremely elastic.

</pre>

    </blockquote>

    <br>

    Yeah, that's probably true. What I had in mind was something like<br>
    suspending hardware that is not in use at the moment, e.g. that has no<br>
    VMs running on it, to save energy, and starting it again when<br>
    we run out of compute resources, as you say.<br>
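    The suspend/resume policy above could look roughly like the following. This is a minimal Python sketch. It assumes a hypothetical inventory in which every node reports its power state and its number of running VMs, and in which every node offers the same number of VM slots. `Node` and `plan_power_actions` are illustrative names only, not Nova, Ironic or Ceilometer APIs:<br>

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical inventory record, not an Ironic or Nova data type.
    name: str
    running_vms: int
    powered_on: bool


def plan_power_actions(nodes, pending_vms, slots_per_node, spare_nodes=1):
    """Return (to_suspend, to_resume) lists of node names.

    pending_vms: VMs waiting to be scheduled.
    slots_per_node: VM capacity of one node (assumed uniform).
    spare_nodes: idle powered-on nodes kept awake as headroom.
    """
    powered = [n for n in nodes if n.powered_on]
    free_slots = sum(slots_per_node - n.running_vms for n in powered)

    if pending_vms > free_slots:
        # Out of compute resources: resume just enough suspended nodes.
        missing = pending_vms - free_slots
        nodes_needed = -(-missing // slots_per_node)  # ceiling division
        asleep = [n.name for n in nodes if not n.powered_on]
        return [], asleep[:nodes_needed]

    # Enough capacity: suspend idle nodes beyond the spare headroom,
    # as long as the remaining free slots still cover the pending VMs.
    to_suspend = []
    idle = [n.name for n in powered if n.running_vms == 0]
    for name in idle[spare_nodes:]:
        if free_slots - slots_per_node < pending_vms:
            break
        free_slots -= slots_per_node
        to_suspend.append(name)
    return to_suspend, []
```

    The `spare_nodes` headroom is a design choice. Keeping one idle node awake avoids a boot delay for the next request, at the cost of some energy.<br>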

    <br>

    <blockquote cite="mid:1380901722-sup-2733@clint-HP" type="cite">

      <pre wrap="">
What will be needed, however, is metrics for the rolling updates feature
we plan to add to Heat. We want to make sure that a rolling update does
not adversely affect the service level of the running cloud. If we're
early in the process with our canary-based deploy and suddenly CPU load is
shooting up on all of the completed nodes, something, perhaps Ceilometer,
should be able to send a signal to Heat, and trigger a rollback.

</pre>

    </blockquote>

    <br>

    That is how Alarms should already work: you just define the Alarm<br>
    inside the Heat template. See this example:<br>


    <a

href="https://github.com/openstack/heat-templates/blob/master/cfn/F17/AutoScalingCeilometer.yaml">https://github.com/openstack/heat-templates/blob/master/cfn/F17/AutoScalingCeilometer.yaml</a><br>

    <br>
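    Clint's rollback case maps onto a threshold alarm. A statistic of a meter (here, the average CPU utilisation) is computed per period and compared with a threshold over several consecutive periods. Below is a minimal Python sketch of that evaluation logic only. The function names, the use of a plain average and the rollback callback are illustrative assumptions, not Ceilometer's alarm evaluator or a Heat API. In a real template, the alarm action would typically be a Heat-generated URL rather than a Python callback:<br>

```python
from statistics import mean


def alarm_state(samples_per_period, threshold, evaluation_periods):
    """Evaluate a simplified threshold alarm on CPU utilisation.

    samples_per_period: one list of CPU samples (percent) per period,
    oldest period first. Returns "alarm" if the average exceeded the
    threshold in each of the last `evaluation_periods` periods, "ok"
    otherwise, or "insufficient data" if too few periods exist yet.
    """
    if len(samples_per_period) < evaluation_periods:
        return "insufficient data"
    recent = samples_per_period[-evaluation_periods:]
    # An empty period counts as not breaching the threshold.
    if all(samples and mean(samples) > threshold for samples in recent):
        return "alarm"
    return "ok"


def signal_if_alarm(state, signal_rollback):
    """Call `signal_rollback` (a stand-in for signalling Heat) on alarm."""
    if state == "alarm":
        signal_rollback()
```

    Requiring several consecutive breaching periods keeps a single CPU spike on one canary node from rolling back the whole update.<br>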

    <blockquote cite="mid:1380901722-sup-2733@clint-HP" type="cite">

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">
What is planned in the near future
---------------------------------------
The Hardware Agent, capable of obtaining statistics:
<a class="moz-txt-link-freetext" href="https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices">https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices</a>
It uses an SNMP inspector to obtain the stats. I have tested it with the
devtest TripleO setup, and it works.

The planned architecture is to have one Hardware Agent (which will be
merged into the central agent code) placed on the Control Node (or
basically anywhere). That agent will poll SNMP daemons placed on the
hardware in the Undercloud (baremetals, network devices). Are there any
objections, or reasons why this is a bad idea?

We will have to create a Ceilometer image element. The snmpd element is
already there, but we should test it. Does anybody volunteer for this
task? The hard part will be getting the configuration right (firewall,
Keystone, snmpd.conf) so that everything is set up in a clean and secure
way. That would require a seasoned sysadmin to at least observe the
process. Any volunteers here? :-)

The IPMI inspector for the Hardware Agent has just been started:
<a class="moz-txt-link-freetext" href="https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices">https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices</a>
It seems it should query the Ironic API, which would provide the data
samples. Any objections? Any volunteers for implementing this on the
Ironic side?

devananda and lifeless had the greatest concern about the scalability of
the central agent. Ceilometer does not do any scaling right now, but
horizontal scaling of the central agent is planned for the future. So
this is a very important task for us, for larger deployments. Any
feedback about scaling? Or about changing the architecture for better
scalability?

</pre>

      </blockquote>

      <pre wrap="">
I share their concerns. For &lt; 100 nodes it is no big deal. But centralized
monitoring has a higher cost than distributed monitoring. I'd rather see
agents on the machines themselves do a bit more than respond to polling
so that load is distributed as much as possible and non-essential
network chatter is reduced.</pre>

    </blockquote>

    <br>

    Right now, for the central agent, it should be a matter of<br>
    configuration. You can set up one central agent that fetches all<br>
    baremetals from Nova. Or you can bake the central agent into each<br>
    baremetal and set it to poll only localhost. Or, in one of the<br>
    distributed architectures planned as a configuration option, there is<br>
    a node (a Management Leaf node) that manages a bunch of hardware, so<br>
    the central agent could be baked into it.<br>
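    The three placements above differ only in which agent polls which hosts. The following is a minimal Python sketch of that mapping, assuming a plain list of host names as input. The mode strings, agent names and both functions are hypothetical, not actual Ceilometer configuration options:<br>

```python
def assign_targets(mode, baremetals, leaf_groups=None):
    """Map each (hypothetical) agent name to the list of hosts it polls.

    mode:
      "central"  - one central agent polls every baremetal node
      "per-node" - an agent baked into each node polls only localhost
      "leaf"     - one agent per Management Leaf node polls its group
    baremetals: host names of the baremetal nodes (e.g. as listed by Nova).
    leaf_groups: dict mapping a leaf node name to the hosts it manages.
    """
    if mode == "central":
        return {"central-agent": list(baremetals)}
    if mode == "per-node":
        return {f"agent@{host}": ["localhost"] for host in baremetals}
    if mode == "leaf":
        if not leaf_groups:
            raise ValueError("leaf mode needs leaf_groups")
        return {f"agent@{leaf}": list(hosts) for leaf, hosts in leaf_groups.items()}
    raise ValueError(f"unknown mode: {mode!r}")


def max_polls_per_agent(assignment):
    """Largest number of hosts any single agent has to poll per interval."""
    return max((len(hosts) for hosts in assignment.values()), default=0)
```

    For example, with 300 nodes, `max_polls_per_agent` is 300 in "central" mode but 1 in "per-node" mode. The busiest agent's work per interval therefore grows linearly with the node count in the centralized setup, which is the scalability concern raised above.<br>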

    <br>

    The agent then processes the data, packs it into a message, and sends<br>
    it to the OpenStack message bus (which should be highly scalable),<br>
    where it is picked up by a Collector (which should be able to run many<br>
    workers) and saved to the database.<br>
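    The flow just described (agent, message bus, collector workers, database) can be sketched in Python under deliberately simplified assumptions. A `queue.Queue` stands in for the OpenStack message bus, a list stands in for the database, and threads stand in for collector workers. None of this is Ceilometer or oslo code, and `run_pipeline` is a hypothetical name:<br>

```python
import json
import queue
import threading


def run_pipeline(samples, num_workers=4):
    """Publish samples through a stand-in bus and collect them into a list.

    samples: iterable of (host, meter, volume) tuples produced by an agent.
    Returns the list of stored sample dicts (order is not guaranteed).
    """
    bus = queue.Queue()        # stand-in for the OpenStack message bus
    database = []              # stand-in for the metering database
    db_lock = threading.Lock()

    def publish(host, meter, volume):
        # Agent side: pack one processed sample into a message.
        bus.put(json.dumps({"host": host, "meter": meter, "volume": volume}))

    def collector_worker():
        # Collector side: consume messages until a None sentinel arrives.
        while True:
            message = bus.get()
            if message is None:
                break
            sample = json.loads(message)
            with db_lock:
                database.append(sample)

    workers = [threading.Thread(target=collector_worker) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for host, meter, volume in samples:
        publish(host, meter, volume)
    for _ in workers:
        bus.put(None)  # one sentinel per worker so every worker stops
    for worker in workers:
        worker.join()
    return database
```

    Because all collector workers consume from the same bus, storage throughput can presumably be raised by adding workers, as long as the bus and the database keep up.<br>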

    <br>

    <blockquote cite="mid:1380901722-sup-2733@clint-HP" type="cite">

      <pre wrap="">
I'm extremely interested in the novel approach that Assimilation
Monitoring [1] is taking to this problem, which is to have each node
monitor itself and two of its immediate neighbors on a switch and
some nodes monitor an additional node on a different switch. Failures
are reported to an API server which uses graph database queries to
determine at what level the failure occurred (single node, cascading,
or network level).

If Ceilometer could incorporate that type of light-weight high-scale
monitoring ethos, rather than implementing something we know does not
scale well at the level of scale OpenStack needs to be, I'd feel a lot
better about pushing it out as part of the standard deployment.

[1] <a class="moz-txt-link-freetext" href="http://assimmon.org/">http://assimmon.org/</a>

</pre>

    </blockquote>

    <br>

    That does seem interesting, but it seems like a very long-term plan,<br>
    as it will be non-trivial to implement.<br>
    I guess the first step would be to get some graph database into<br>
    Ceilometer. I am also not sure about the firewall setup in the network,<br>
    because right now, as far as I know, the hardware nodes are not allowed<br>
    to talk to each other. It would be great to discuss this with the<br>
    TripleO guys. This seems like a nice monitoring option for very large<br>
    deployments.<br>
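    Clint's description of Assimilation Monitoring can be illustrated with a simple ring assignment. In that scheme, each node watches two immediate neighbors on its switch, and some nodes also watch a node on another switch. This is a hypothetical Python sketch of the idea only, not Assimilation Monitoring's actual algorithm. The function name and the choice of the first node on each switch as the cross-switch link are assumptions:<br>

```python
def monitoring_assignments(switches):
    """Return a dict mapping each node to the set of peers it monitors.

    switches: dict mapping a switch name to the ordered node names on it.
    Within a switch, nodes form a ring and each node watches its two ring
    neighbours. The first node on each switch additionally watches the
    first node on the next switch, linking the switches into a ring.
    """
    watches = {}
    for node_list in switches.values():
        n = len(node_list)
        for i, node in enumerate(node_list):
            neighbours = set()
            if n > 1:
                neighbours.add(node_list[(i - 1) % n])
                neighbours.add(node_list[(i + 1) % n])
            watches[node] = neighbours

    names = list(switches)
    if len(names) > 1:
        for k, switch in enumerate(names):
            nxt = names[(k + 1) % len(names)]
            if switches[switch] and switches[nxt]:
                watches[switches[switch][0]].add(switches[nxt][0])
    return watches
```

    On a switch with three or more nodes, every node is watched by two peers. A single node failure would therefore be reported by two witnesses, while many simultaneous reports from one switch would rather point to a switch-level failure. Telling those cases apart is presumably where the graph database queries come in.<br>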

    <br>

    <blockquote cite="mid:1380901722-sup-2733@clint-HP" type="cite">

      <pre wrap="">
_______________________________________________
OpenStack-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a>
<a class="moz-txt-link-freetext" href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a>

</pre>

    </blockquote>

    <br>

    Ladislav<br>

  </body>

</html>