[Openstack-operators] Healthcheck URLs for services

Andy Botting andy at andybotting.com
Thu Apr 28 03:13:10 UTC 2016


We're running our services clustered behind an F5 loadbalancer in
production, and haproxy in our testing environment. This setup works quite
well for us, but I'm not happy with how we test the health of our
endpoints.

We're currently calling basic URLs like / or /v2, etc. Some services
return a 200, others return different codes, like 401. Our healthcheck
simply checks for whichever HTTP status code each service normally returns.
This works OK and does catch basic service failures.
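A minimal Python sketch of a status-code check like this, assuming each endpoint has one status code it normally answers with. The service URLs are hypothetical placeholders, and the timeout is an added assumption so that a hung endpoint counts as down rather than blocking the check.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional

# Hypothetical endpoint map: URL -> the status code that endpoint normally
# answers with. Placeholders only; substitute your own services.
EXPECTED_STATUS: Dict[str, int] = {
    "http://service-a.example:8080/": 200,
    "http://service-b.example:8080/v2": 401,
}


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """GET url and return its HTTP status code, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx (and unhandled 3xx) responses raise HTTPError but still
        # carry the status code the server sent, e.g. 401.
        return err.code
    except OSError:
        # URLError, refused connections, timeouts and DNS failures are all
        # OSError subclasses: treat them as "no answer".
        return None


def is_healthy(status: Optional[int], expected: int) -> bool:
    """The endpoint passes only if it answered with its usual status code."""
    return status is not None and status == expected


def check_all(expected: Dict[str, int], timeout: float = 5.0) -> Dict[str, bool]:
    """Run the status-code check against every endpoint in the map."""
    return {url: is_healthy(fetch_status(url, timeout), code)
            for url, code in expected.items()}
```

For example, check_all(EXPECTED_STATUS, timeout=2.0) returns a dict mapping each URL to True or False. A check like this only sees what the base URL shows; it cannot tell when / answers normally but real API calls fail.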

Our test environment runs on flaky hardware and often fails in strange
ways. Sometimes the port is open and the basic URLs work, but real API
calls fail or time out, so our checks miss these failures.

In a previous role, the developers added a URL (e.g. /healthcheck) to
each web application that checked things like whether the DB connection
was OK and memcached was accessible, and returned a 200 if everything
passed. This worked out really well for operations. I haven't seen
anything like this for OpenStack.
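A minimal sketch of such a /healthcheck endpoint, using only Python's standard library. The dependency names, hosts and ports are hypothetical placeholders, and the TCP-connect checks are a stand-in: an open port alone doesn't prove the backend works, so a real check would more likely run a trivial query through the service's own database and memcached clients. Returning 503 on failure is an assumed design choice, so a loadbalancer can key off the status code alone.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a check that passes if a TCP connection to host:port succeeds."""
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return check


# Hypothetical dependency checks; hosts and ports are placeholders.
CHECKS: Dict[str, Callable[[], bool]] = {
    "database": tcp_check("db.example", 3306),
    "memcached": tcp_check("memcached.example", 11211),
}


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every check; 200 if all pass, 503 if any fail."""
    results = {name: check() for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    return status, results


class HealthcheckHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthcheck":
            self.send_error(404)
            return
        status, results = run_checks(CHECKS)
        # Report per-dependency results so operators can see what failed.
        body = json.dumps(results).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, HTTPServer(("127.0.0.1", 8080), HealthcheckHandler).serve_forever() (with HTTPServer imported from http.server) serves the endpoint; the address and port are arbitrary. The checks run one after another on each request, so the loadbalancer's own check timeout should exceed the sum of the per-check timeouts.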

I'm wondering how everyone else healthchecks their clustered services,
and whether they think adding a dedicated healthcheck URL would be
beneficial?

We do use scripts in Nagios, similar to the ones in osops-tools-monitoring,
which help with more complex testing, but I'm thinking of something more
lightweight, specifically for use on loadbalancers.

cheers,
Andy