[openstack-dev] [Ceilometer] [TripleO] adding process/service monitoring

Clint Byrum clint at fewbar.com
Tue Jan 28 14:38:42 UTC 2014


Excerpts from Richard Su's message of 2014-01-27 17:59:34 -0800:
> Hi,
> 
> I have been looking into how to add process/service monitoring to
> tripleo. Here I want to be able to detect when an openstack dependent
> component that is deployed on an instance has failed. And when a failure
> has occurred I want to be notified and eventually see it in Tuskar.
> 
> Ceilometer doesn't handle this particular use case today. So I have been
> doing some research, and there are many options out there that provide
> process checks: nagios, sensu, zabbix, and monit. I am a bit wary of
> pulling one of these options into tripleo. There are increased
> operational and maintenance costs when pulling in each of them. And
> physical device monitoring is currently in the works for Ceilometer,
> lessening the need for some of the other abilities that another
> monitoring tool would provide.
> 
> For the particular use case of monitoring processes/services, at a high
> level, I am considering writing a simple daemon to perform the check.
> Checks and failures are written out as messages to the notification bus.
> Interested parties like Tuskar or Ceilometer can subscribe to these
> messages.
> 
> In general does this sound like a reasonable approach?

Writing a new one, no. But using notifications in OpenStack: yes!

I suggest finding the simplest one possible and teaching it to send
OpenStack notifications.
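A minimal sketch of "teach an existing checker to send notifications", assuming a Nagios-style check plugin (exit code 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, with the status on the first line of output). The envelope fields follow the usual OpenStack notification shape (message_id, publisher_id, event_type, priority, payload, timestamp). The event type, publisher name, and exit-code-to-priority mapping are hypothetical. Actually putting the dict on the bus (for example through oslo.messaging) is deliberately left out.

```python
import datetime
import socket
import subprocess
import sys
import uuid

# Conventional Nagios plugin exit codes mapped to notification priorities.
# This mapping is an assumption of the sketch, not an OpenStack standard.
PRIORITY_BY_EXIT_CODE = {0: "INFO", 1: "WARN", 2: "ERROR", 3: "ERROR"}


def run_check(cmd: list[str], timeout: float = 10.0) -> tuple[int, str]:
    """Run a Nagios-style check; return (exit code, first line of output)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 3, "check timed out"  # report as UNKNOWN
    except OSError as exc:
        return 3, f"could not run check: {exc}"  # missing or non-executable
    lines = proc.stdout.splitlines()
    return proc.returncode, lines[0] if lines else ""


def to_notification(service: str, code: int, output: str) -> dict:
    """Wrap a check result in an OpenStack-style notification envelope."""
    return {
        "message_id": str(uuid.uuid4()),
        "publisher_id": f"process-monitor.{socket.gethostname()}",  # hypothetical
        "event_type": "process.check",  # hypothetical event type
        "priority": PRIORITY_BY_EXIT_CODE.get(code, "ERROR"),
        "payload": {"service": service, "exit_code": code, "output": output},
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Stand-in check that always reports CRITICAL, for demonstration only.
    demo = [sys.executable, "-c",
            "print('CRITICAL - nova-api not running'); raise SystemExit(2)"]
    print(to_notification("nova-api", *run_check(demo)))
```

Keeping the check itself external means any existing plugin can be reused unchanged; only the small translation layer is new.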

> 
> There is also the question of how to configure or figure out which
> processes we are interested in monitoring. I need to do more research
> here but I'm considering either looking at the elements listed by
> diskimage-builder or by looking at the orc post-configure.d scripts to
> find services that are restarted.
> 

There are basically two things you need to look for: things listening,
and things connected to rabbitmq/qpid.

So one crazy way to find things to monitor is to look at netstat or ss
and just monitor processes doing one of those things. I believe the
Assimilation Monitoring project's nanoprobe daemon already has the
listening part done:

http://techthoughts.typepad.com/managing_computers/2012/10/zero-configuration-discovery-and-server-monitoring-in-the-assimilation-monitoring-project.html

Also, you may want to add two orc scripts in post-configure.d:

00-disruption-coming-stop-process-monitor
99-all-clear-start-process-monitor
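The script names suggest simply stopping the monitor before a reconfiguration and starting it again afterwards. A variant that keeps the daemon running but suppresses reports during the planned disruption is sketched below, assuming a hypothetical flag file that the 00 script would create and the 99 script would remove (for example with `touch` and `rm -f`). The path and helper names are illustrative only.

```python
import os
from pathlib import Path

# Hypothetical flag file: 00-disruption-coming-stop-process-monitor would
# create it and 99-all-clear-start-process-monitor would remove it.
PAUSE_FLAG = "/var/run/process-monitor.paused"


def pause(flag: str = PAUSE_FLAG) -> None:
    """Mark the start of a planned disruption."""
    Path(flag).touch()


def resume(flag: str = PAUSE_FLAG) -> None:
    """Mark the all-clear; a missing flag is not an error."""
    Path(flag).unlink(missing_ok=True)


def should_report(check_failed: bool, flag: str = PAUSE_FLAG) -> bool:
    """Report a failure only when no planned disruption is in progress."""
    return check_failed and not os.path.exists(flag)
```

Either way, the effect is the same: services restarted by os-refresh-config do not generate false alarms.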

Anyway, as Robert says, just keep it modular so that orgs that already
have a rich set of tools for this will be able to replace it.


