[openstack-dev] [Ceilometer] [TripleO] adding process/service monitoring
Clint Byrum
clint at fewbar.com
Tue Jan 28 14:38:42 UTC 2014
Excerpts from Richard Su's message of 2014-01-27 17:59:34 -0800:
> Hi,
>
> I have been looking into how to add process/service monitoring to
> tripleo. Here I want to be able to detect when an openstack dependent
> component that is deployed on an instance has failed. And when a failure
> has occurred I want to be notified and eventually see it in Tuskar.
>
> Ceilometer doesn't handle this particular use case today. So I have been
> doing some research and there are many options out there that provide
> process checks: nagios, sensu, zabbix, and monit. I am a bit wary of
> pulling one of these options into tripleo. There are some increased
> operational and maintenance costs when pulling in each of them. And
> physical device monitoring is currently in the works for Ceilometer,
> lessening the need for some of the other abilities that another
> monitoring tool would provide.
>
> For the particular use case of monitoring processes/services, at a high
> level, I am considering writing a simple daemon to perform the check.
> Checks and failures are written out as messages to the notification bus.
> Interested parties like Tuskar or Ceilometer can subscribe to these
> messages.
>
> In general does this sound like a reasonable approach?
Writing a new daemon: no. Using OpenStack notifications: yes!
I suggest finding the simplest existing tool and teaching it to send
OpenStack notifications.
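
As a rough illustration of what "teaching it to send OpenStack
notifications" could mean, here is a minimal sketch that only builds the
message for one check result. The envelope keys follow the usual OpenStack
notification layout as I understand it; the event types, the
"process-monitor" publisher name, and the payload fields are invented for
this example, and actually publishing to the bus (for instance through
oslo.messaging) is left out.

```python
import json
import socket
import uuid
from datetime import datetime, timezone


def build_notification(service, healthy, host=None):
    """Build a notification-style envelope for one process check result."""
    host = host or socket.gethostname()
    return {
        "message_id": str(uuid.uuid4()),
        # Hypothetical publisher id; not an existing OpenStack service name.
        "publisher_id": "process-monitor.%s" % host,
        # Hypothetical event types, made up for this sketch.
        "event_type": "process.check.ok" if healthy else "process.check.failed",
        "priority": "INFO" if healthy else "ERROR",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hypothetical payload fields.
        "payload": {"service": service, "host": host, "healthy": healthy},
    }


if __name__ == "__main__":
    print(json.dumps(build_notification("nova-compute", False), indent=2))
```

A subscriber such as Tuskar or Ceilometer would then only need to filter on
the event type to pick out failures.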
>
> There is also the question of how to configure or figure out which
> processes we are interested in monitoring. I need to do more research
> here but I'm considering either looking at the elements listed by
> diskimage-builder or looking at the orc post-configure.d scripts to
> find services that are restarted.
>
There are basically two things you need to look for: processes listening
on a socket, and processes connected to rabbitmq/qpid.
So one crazy way to find things to monitor is to look at netstat or ss
output and just monitor the processes doing one of those things. I believe
the Assimilation Monitoring Project's nanoprobe daemon already has the
listening part done:
http://techthoughts.typepad.com/managing_computers/2012/10/zero-configuration-discovery-and-server-monitoring-in-the-assimilation-monitoring-project.html
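
To make the netstat/ss idea concrete, here is a minimal sketch that picks
candidates out of text shaped like `ss -tanp` output. It assumes the column
layout shown in SAMPLE (state first, peer address fifth, process info in a
users:((...)) field) and that the message broker is on the default AMQP port
5672; the sample hosts, pids and the two labels are made up.

```python
import re

# Matches users:(("sshd",pid=1234,fd=3)) and the older users:(("sshd",1234,3)).
PROC_RE = re.compile(r'\(\("([^"]+)",(?:pid=)?(\d+),')
AMQP_PORT = "5672"  # assumption: rabbitmq/qpid on the default AMQP port

# Made-up sample in the shape of `ss -tanp` output.
SAMPLE = """\
State   Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN  0      128    0.0.0.0:8774        0.0.0.0:*       users:(("nova-api",pid=2101,fd=6))
ESTAB   0      0      10.0.0.5:48712      10.0.0.2:5672   users:(("nova-compute",pid=2250,fd=9))
ESTAB   0      0      10.0.0.5:22         10.0.0.9:51000  users:(("sshd",pid=900,fd=3))
"""


def find_candidates(ss_output):
    """Return sorted (name, pid, reason) tuples for processes worth monitoring."""
    found = set()
    for line in ss_output.splitlines():
        fields = line.split()
        match = PROC_RE.search(line)
        if len(fields) < 5 or match is None:
            continue  # header line, or a socket with no process info
        state, peer = fields[0], fields[4]
        name, pid = match.group(1), int(match.group(2))
        if state == "LISTEN":
            found.add((name, pid, "listening"))
        elif state == "ESTAB" and peer.rsplit(":", 1)[-1] == AMQP_PORT:
            found.add((name, pid, "amqp-client"))
    return sorted(found)


if __name__ == "__main__":
    for candidate in find_candidates(SAMPLE):
        print(candidate)
```

On a real host the input would come from running ss with enough privileges
to see process names, and things like sshd would still need to be filtered
out of the listening set.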
Also, you may want to add two orc scripts in post-configure.d, one to stop
the monitor before services get restarted and one to start it again after:
00-disruption-coming-stop-process-monitor
99-all-clear-start-process-monitor
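
One possible shape for that pair, as a sketch only: a single Python script
installed under both names that decides what to do from its own filename
prefix. The "process-monitor" service name and the use of the `service`
command are assumptions, not something that exists in tripleo today.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys

MONITOR_SERVICE = "process-monitor"  # hypothetical service name


def action_for(script_name):
    """Map the script's numeric prefix to an action, or None if unrecognised."""
    prefix = os.path.basename(script_name).split("-", 1)[0]
    return {"00": "stop", "99": "start"}.get(prefix)


def main(argv):
    action = action_for(argv[0])
    if action is None:
        return 0  # installed under an unexpected name: do nothing
    # Assumption: the monitor is managed through the `service` command.
    return subprocess.call(["service", MONITOR_SERVICE, action])


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because post-configure.d scripts run in name order, the 00 copy quiets the
monitor before anything is restarted and the 99 copy brings it back last.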
Anyway, as Robert says, just keep it modular so that orgs that already
have a rich set of tools for this will be able to replace it.