[openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

Simon Pasquier spasquier at mirantis.com
Wed May 13 13:51:56 UTC 2015


On Wed, May 13, 2015 at 3:27 PM, David Kranz <dkranz at redhat.com> wrote:

>  On 05/13/2015 09:06 AM, Simon Pasquier wrote:
>
>   Hello,
>
> Like many others who commented before, I don't quite understand how unique
> the CloudPulse use cases are.
>
> For operators, I got the feeling that existing solutions fit well:
> - Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway
> for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ,
> databases and more) and diagnostic purposes. Adding OpenStack service
> checks is fairly easy if you already have the toolchain.
>
> Is it really so easy? Rabbitmq has an "aliveness" test that is easy to
> hook into. I don't know exactly what it does, other than what the doc says,
> but I should not have to. If I want my standard monitoring system to call
> into a cloud and ask "is nova healthy?", "is glance healthy?", etc., are
> there such calls?
>

Regarding the RabbitMQ aliveness test, it has its own limits (more on that
later, I've got an "interesting" RabbitMQ outage that I'm going to discuss
in a new thread) and it doesn't replicate exactly what the clients (e.g.
OpenStack services) are doing.
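
As a rough illustration of how such a check usually gets hooked into a
monitoring system, here is a minimal Python sketch that maps the JSON body
returned by the management plugin's /api/aliveness-test/<vhost> endpoint to
Nagios-style exit codes. The function names, the default URL and the guest
credentials are assumptions made for the example; this is not taken from an
existing plugin.

```python
import base64
import json
import urllib.request

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def interpret_aliveness(body):
    """Map the JSON body of an aliveness-test response to (exit_code, message)."""
    try:
        data = json.loads(body)
    except ValueError:
        return UNKNOWN, "UNKNOWN: response is not valid JSON"
    if isinstance(data, dict) and data.get("status") == "ok":
        return OK, "OK: aliveness test passed"
    return CRITICAL, "CRITICAL: unexpected response %r" % (data,)


def check_rabbitmq(base_url="http://localhost:15672", vhost="%2F",
                   user="guest", password="guest", timeout=5):
    """Query the management API (hypothetical defaults) and interpret the result."""
    url = "%s/api/aliveness-test/%s" % (base_url, vhost)
    request = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return interpret_aliveness(response.read().decode("utf-8"))
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return CRITICAL, "CRITICAL: %s" % exc
```

A wrapper script would print the message and exit with the returned code. A
green result only says that the broker answered this one probe, so it still
doesn't tell you whether the OpenStack services themselves can talk to it.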

Regarding the service checks, there are already plenty of scripts that
exist for Nagios, Collectd and so on. Some of them are listed in the Wiki
[1].


> There are various sets of calls associated with nagios, zabbix, etc. but
> those seem like "after-market" parts for a car. Seems to me the services
> themselves would know best how to check if they are healthy, particularly
> as that could change from version to version. Has there been discussion of
> adding a health-check (admin) api in each service? Lacking that, is there
> documentation from any OpenStack projects about "how to check the health of
> nova"? When I saw this thread start, that is what I thought it was going to
> be about.
>

Starting with Kilo, you can configure your OpenStack API services with
the healthcheck middleware [2]. This was inspired by what Swift has been
doing for some time now [3]. IIUC the default healthcheck is minimalist and
doesn't check that dependent services (like RabbitMQ, database) are healthy
but the framework is extensible and more healthchecks can be added.
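
To give an idea of the pattern (this is not the oslo.middleware code, just a
standalone sketch of the same idea), a WSGI middleware can answer a health
check path itself and pass everything else to the wrapped application. The
class name, the /healthcheck path and the pluggable "checks" argument are
assumptions for the example.

```python
class HealthcheckSketch(object):
    """Answer requests for <path> directly; delegate everything else to the app."""

    def __init__(self, app, path="/healthcheck", checks=()):
        self.app = app
        self.path = path
        # Each check is a callable returning (healthy, reason). This is a
        # hypothetical hook for extra probes such as a database or message bus.
        self.checks = list(checks)

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") != self.path:
            return self.app(environ, start_response)
        failures = []
        for check in self.checks:
            healthy, reason = check()
            if not healthy:
                failures.append(reason)
        if failures:
            status, body = "503 Service Unavailable", "\n".join(failures)
        else:
            status, body = "200 OK", "OK"
        payload = body.encode("utf-8")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(payload)))])
        return [payload]
```

With no extra checks this only proves that the API process is up and
answering, which is roughly the "minimalist" behaviour I described above. A
monitoring system can then poll that path and alert on anything but a 200.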


>
>  -David
>
>
BR,
Simon

[1] https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending
[2] http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck
[3] http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html


>
>    - OpenStack projects like Rally or Tempest can generate synthetic
> loads and run end-to-end tests. Integrating them with a monitoring system
> isn't terribly difficult either.
>
> As far as Monitoring-as-a-service is concerned, do you have plans to
> integrate/leverage Ceilometer?
>
>  BR,
>  Simon
>
> On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) <
> vpandari at cisco.com> wrote:
>
>>   Hello,
>>
>>    I'm pleased to announce the development of a new project called
>> CloudPulse.  CloudPulse provides OpenStack
>>  health-checking services to operators, tenants, and applications.
>> This project will begin as
>>  a StackForge project based upon an empty cookiecutter[1] repo.  The
>> repos to work in are:
>>  Server:   https://github.com/stackforge/cloudpulse
>>  Client:     https://github.com/stackforge/python-cloudpulseclient
>>
>>  Please join us via IRC on #openstack-cloudpulse on freenode.
>>
>>  I am holding a doodle poll to select times for our first meeting the
>> week after summit.  This doodle poll will close May 24th and meeting times
>> will be announced on the mailing list at that time.  At our first IRC
>> meeting,
>>  we will draft additional core team members, so if you're interested in
>> joining a fresh new development effort, please attend our first meeting.
>>  Please take a moment if you're interested in CloudPulse to fill out the
>> doodle poll here:
>>
>>  https://doodle.com/kcpvzy8kfrxe6rvb
>>
>>  The initial core team is composed of
>>  Ajay Kalambur,
>>  Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod
>> Pandarinathan.
>>  I expect more members to join during our initial meeting.
>>
>>   A little bit about CloudPulse:
>>   Cloud operators need notification of OpenStack failures before a
>> customer reports the failure. Cloud operators can then take timely
>> corrective actions with minimal disruption to applications.  Many cloud
>> applications, including
>>  those I am interested in (NFV) have very stringent service level
>> agreements.  Loss of service can trigger contractual
>>  costs associated with the service.  Application high availability
>> requires an operational OpenStack Cloud, and the reality
>>  is that occasionally OpenStack clouds fail in some mysterious ways.
>> This project intends to identify when those failures
>>  occur so corrective actions may be taken by operators, tenants, and the
>> applications themselves.
>>
>>  OpenStack is considered healthy when OpenStack API services respond
>> appropriately.  Further, OpenStack is
>>  healthy when network traffic can be sent between the tenant networks
>> and can access the Internet.  Finally, OpenStack
>>  is healthy when all infrastructure cluster elements are in an
>> operational state.
>>
>>  For information about blueprints check out:
>>   https://blueprints.launchpad.net/cloudpulse
>>  https://blueprints.launchpad.net/python-cloudpulseclient
>>
>>  For more details, check out our Wiki:
>>  https://wiki.openstack.org/wiki/Cloudpulse
>>
>>  Please join the CloudPulse team in designing and implementing a
>> world-class Carrier Grade system for checking
>>  the health of OpenStack clouds.  We look forward to seeing you on IRC
>> on #openstack-cloudpulse.
>>
>>  Regards,
>>  Vinod Pandarinathan
>>  [1] https://github.com/openstack-dev/cookiecutter
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>