[openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
David Kranz
dkranz at redhat.com
Wed May 13 13:27:05 UTC 2015
On 05/13/2015 09:06 AM, Simon Pasquier wrote:
> Hello,
>
> Like many others who commented before, I don't quite understand what
> is unique about the CloudPulse use cases.
>
> For operators, I got the feeling that existing solutions fit well:
> - Traditional monitoring tools (Nagios, Zabbix, ...) are necessary
> anyway for infrastructure monitoring (CPU, RAM, disks, operating
> system, RabbitMQ, databases and more) and diagnostic purposes. Adding
> OpenStack service checks is fairly easy if you already have the toolchain.
Is it really so easy? RabbitMQ has an "aliveness" test that is easy to
hook into. I don't know exactly what it does, other than what the doc
says, but I should not have to. If I want my standard monitoring system
to call into a cloud and ask "is nova healthy?", "is glance healthy?",
etc., are there such calls?
There are various sets of calls associated with Nagios, Zabbix, etc., but
those seem like "after-market" parts for a car. Seems to me the services
themselves would know best how to check if they are healthy,
particularly as that could change from version to version. Has there been
discussion of adding a health-check (admin) API in each service? Lacking
that, is there documentation from any OpenStack projects about "how to
check the health of nova"? When I saw this thread start, that is what I
thought it was going to be about.
-David
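
To make the question above concrete, here is a minimal sketch of the kind
of probe a standard monitoring system can run today when no dedicated
health-check API exists. It only asks whether each service's API root
answers with a non-error HTTP status, which is much weaker than the
service reporting on its own health. The names http_status and
check_services are hypothetical, and the hostname and ports in the usage
note are assumptions (conventional defaults), not taken from any
deployment discussed in this thread.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a status urllib treats as an error
        # (for example 300, 401 or 503). Report the code as-is.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def check_services(
    endpoints: Dict[str, str],
    probe: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True if its endpoint answered below 400.

    This is only a shallow "does the API respond" check (an assumption of
    this sketch), not a statement about the service's internal health.
    """
    return {name: 200 <= probe(url) < 400 for name, url in endpoints.items()}
```

A monitoring plugin might call it as
check_services({"nova": "http://controller.example.com:8774/",
"glance": "http://controller.example.com:9292/"}) and alert on any False
value. The probe argument is injectable so the logic can be exercised
without a running cloud.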
> - OpenStack projects like Rally or Tempest can generate synthetic
> loads and run end-to-end tests. Integrating them with a monitoring
> system isn't terribly difficult either.
>
> As far as Monitoring-as-a-service is concerned, do you have plans to
> integrate/leverage Ceilometer?
>
> BR,
> Simon
>
> On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari)
> <vpandari at cisco.com> wrote:
>
> Hello,
>
> I'm pleased to announce the development of a new project called
> CloudPulse. CloudPulse provides OpenStack health-checking services
> to operators, tenants, and applications. This project will begin as
> a StackForge project based upon an empty cookiecutter[1] repo.
> The repos to work in are:
> Server: https://github.com/stackforge/cloudpulse
> Client: https://github.com/stackforge/python-cloudpulseclient
>
> Please join us via IRC on #openstack-cloudpulse on freenode.
>
> I am holding a Doodle poll to select times for our first meeting
> the week after summit. This Doodle poll will close May 24th and
> meeting times will be announced on the mailing list at that time.
> At our first IRC meeting, we will draft additional core team
> members, so if you're interested in joining a fresh new development
> effort, please attend our first meeting.
> If you're interested in CloudPulse, please take a moment to fill out
> the Doodle poll here:
>
> https://doodle.com/kcpvzy8kfrxe6rvb
>
> The initial core team is composed of Ajay Kalambur, Behzad Dastur,
> Ian Wells, Pradeep Chandrasekhar, Steven Dake, and Vinod Pandarinathan.
> I expect more members to join during our initial meeting.
>
> A little bit about CloudPulse:
> Cloud operators need notification of OpenStack failures before a
> customer reports the failure. Cloud operators can then take timely
> corrective actions with minimal disruption to applications. Many
> cloud applications, including those I am interested in (NFV), have
> very stringent service level agreements. Loss of service can trigger
> contractual costs associated with the service. Application high
> availability requires an operational OpenStack cloud, and the reality
> is that occasionally OpenStack clouds fail in some mysterious
> ways. This project intends to identify when those failures
> occur so corrective actions may be taken by operators, tenants,
> and the applications themselves.
>
> OpenStack is considered healthy when OpenStack API services
> respond appropriately. Further, OpenStack is healthy when network
> traffic can be sent between the tenant networks and can reach the
> Internet. Finally, OpenStack is healthy when all infrastructure
> cluster elements are in an operational state.
>
> For information about blueprints check out:
> https://blueprints.launchpad.net/cloudpulse
> https://blueprints.launchpad.net/python-cloudpulseclient
>
> For more details, check out our Wiki:
> https://wiki.openstack.org/wiki/Cloudpulse
>
> Please join the CloudPulse team in designing and implementing a
> world-class, carrier-grade system for checking
> the health of OpenStack clouds. We look forward to seeing you on
> IRC on #openstack-cloudpulse.
>
> Regards,
> Vinod Pandarinathan
> [1] https://github.com/openstack-dev/cookiecutter
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev