[Openstack-operators] Healthcheck URLs for services

Joshua Harlow harlowja at fastmail.com
Sat Apr 30 01:52:06 UTC 2016


This can help you more easily view what the healthcheck middleware can 
show (especially in detailed mode); for example, it can dump thread 
stacks, which is useful for debugging stuck servers (similar in concept 
to Apache's mod_status).

https://review.openstack.org/#/c/311482/

Run the code from the above review like:

$ python oslo_middleware/healthcheck/ -p 8000

Then open a browser to http://127.0.0.1:8000/ (or other port).
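The same round-trip can be done from Python. This is just a sketch: the one-line WSGI app below is a stand-in for the real middleware so the example is self-contained, and it binds port 0 (a free port) rather than the 8000 used above.

```python
import threading
from urllib.request import urlopen
from wsgiref.simple_server import make_server

def healthcheck_app(environ, start_response):
    # Stand-in for the healthcheck endpoint: answer every request with "OK".
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'OK']

# Port 0 picks any free port; the standalone runner above uses -p 8000 instead.
server = make_server('127.0.0.1', 0, healthcheck_app)
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = urlopen('http://127.0.0.1:%d/' % server.server_port)
status, body = resp.getcode(), resp.read().decode()
server.shutdown()
```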

-Josh

Joshua Harlow wrote:
> Yup, I'm the one who extended that healthcheck middleware.
>
> If you need to do anything special with it, let me know and I can help
> make that possible (or at least point out what would need to change to
> do that).
>
> Simon Pasquier wrote:
>> Hi,
>>
>> On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting <andy at andybotting.com
>> <mailto:andy at andybotting.com>> wrote:
>>
>> We're running our services clustered behind an F5 load balancer in
>> production, and haproxy in our testing environment. This setup works
>> quite well for us, but I'm not that happy with how we test the health
>> of our endpoints.
>>
>> We're currently calling basic URLs like / or /v2, etc.; some
>> services return a 200, others return codes like 401. Our
>> healthcheck test simply checks the returned HTTP code against the
>> expected one. This works OK and does catch basic service failure.
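A check like the one described boils down to comparing the returned status against a per-service allow-list. A minimal sketch (the service URLs and accepted codes here are illustrative, not a real deployment):

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Illustrative: which HTTP codes count as "healthy" for each endpoint.
HEALTHY_CODES = {
    'http://keystone:5000/v3': {200, 300},
    'http://nova:8774/v2.1': {200, 401},
}

def is_healthy(url, ok_codes, timeout=5):
    """Return True if a GET on url yields one of ok_codes within timeout."""
    try:
        return urlopen(url, timeout=timeout).getcode() in ok_codes
    except HTTPError as exc:
        # 4xx/5xx raise, but the code may still be in the allow-list (e.g. 401).
        return exc.code in ok_codes
    except OSError:
        # Connection refused, timeout, DNS failure: definitely unhealthy.
        return False
```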
>>
>> Our test environment runs on flaky hardware and often fails in
>> strange ways: sometimes the port is open and basic URLs work, but
>> real API calls fail and time out, so our checks fall down here.
>>
>> In a previous role I had, the developers added a URL (e.g.
>> /healthcheck) to each web application which tested things like
>> whether the DB connection was OK, memcached was accessible, etc.,
>> and returned a 200. This worked out really well for operations. I
>> haven't seen anything like this for OpenStack.
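Such an endpoint can be sketched as a list of named check callables, returning 200 only when all pass. The checks below are placeholders for real DB/memcached/RabbitMQ probes:

```python
def run_healthcheck(checks):
    """Run each named check; return (http_status, report_lines)."""
    results = {name: check() for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    report = ['%s: %s' % (name, 'OK' if ok else 'FAIL')
              for name, ok in sorted(results.items())]
    return status, report

# Placeholder checks; real ones would ping the DB, memcached, RabbitMQ, etc.
checks = {
    'database': lambda: True,
    'memcached': lambda: True,
}
status, report = run_healthcheck(checks)
```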
>>
>>
>> There's a healthcheck oslo.middleware plugin [1] available. You could
>> configure the service pipeline to include it, although it won't
>> exercise the DB connection, RabbitMQ connection, and so on. But it
>> would help if you want to kick a service instance out of the
>> load-balancer without stopping the service completely [2].
>>
>> [1]
>> http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html
>>
>> [2]
>> http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file
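Wiring the plugin from [1] with the disable-by-file backend from [2] looks roughly like this in a service's api-paste.ini (the file path and pipeline contents are illustrative and vary per service):

```ini
[filter:healthcheck]
paste.filter_factory = oslo_middleware:Healthcheck.factory
backends = disable_by_file
# Touch this file to make /healthcheck return 503, draining the instance
# from the load-balancer without stopping the service; path is illustrative.
disable_by_file_path = /etc/nova/healthcheck_disable

[pipeline:public_api]
# Put healthcheck first so it answers even if later middleware misbehaves.
pipeline = healthcheck authtoken service_app
```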
>>
>>
>> I'm wondering how everyone else does healthchecking of their
>> clustered services, and whether they think adding a dedicated
>> healthcheck URL would be beneficial?
>>
>>
>> From what I can tell, people are doing the same thing as you: check
>> that a well-known location ('/', '/v2' or the like) returns the
>> expected code and hope that it will work for real user requests too.
>>
>> Simon
>>
>>
>> We do use scripts similar to the ones in osops-tools-monitoring with
>> Nagios, which help with more complex testing, but I'm thinking of
>> something more lightweight specifically for use on load balancers.
>>
>> cheers,
>> Andy
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> <mailto:OpenStack-operators at lists.openstack.org>
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>


