[Openstack-operators] Should Healthcheck URLs for services be enabled by default?

Joshua Harlow harlowja at fastmail.com
Tue May 24 06:23:14 UTC 2016


That's a good question, and I'm not really sure of the historical reasons
why they are not enabled by default; maybe someone with more historical
wisdom will chime in.

I know that I put up https://review.openstack.org/#/c/12759/ many years
ago (the commentary there may be useful for historical investigation)...

Andy Botting wrote:
> Thanks to Simon, Josh and Kris, who replied to my last email about the
> healthcheck middleware; the healthchecks are now working well for us.
>
> I'm sure there are plenty of operators, like us, who didn't know this
> existed.
>
> Is there any reason why they're not enabled by default?
>
> cheers,
> Andy
>
> On 30 April 2016 at 11:52, Joshua Harlow <harlowja at fastmail.com> wrote:
>
>     This can help you more easily view what the healthcheck middleware
>     can show (especially in detailed mode); it can show thread stacks and
>     similar details, which are useful for debugging stuck servers
>     (similar in concept to Apache mod_status).
>
>     https://review.openstack.org/#/c/311482/
>
>     To run the code from the above review:
>
>     $ python oslo_middleware/healthcheck/ -p 8000
>
>     Then open a browser to http://127.0.0.1:8000/ (or other port).
>
>     -Josh
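As a rough sketch of querying the endpoint from a script rather than a browser, the helper below uses only the Python standard library. It assumes the middleware is mounted at `/healthcheck` (oslo.middleware's default path; check your paste pipeline) and that it answers 200 when healthy and an error status such as 503 otherwise. Asking for JSON only returns the richer output if detailed mode is enabled on the server. The `fetch_healthcheck` name and the URL are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_healthcheck(base_url, path="/healthcheck", want_json=False, timeout=5.0):
    """Return (status_code, body_text) from a healthcheck endpoint.

    Hypothetical helper: assumes 200 means healthy and an HTTP error
    status (e.g. 503) means unhealthy or disabled.
    """
    headers = {"Accept": "application/json" if want_json else "text/plain"}
    request = urllib.request.Request(base_url.rstrip("/") + path, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTP error statuses (such as 503) raise HTTPError, which still
        # carries the response body.
        return err.code, err.read().decode("utf-8", "replace")
```

For example, `fetch_healthcheck("http://127.0.0.1:8000", want_json=True)` returns the status and whatever body the server chose for a JSON request.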
>
>
>     Joshua Harlow wrote:
>
>         Yup, I made that healthcheck middleware more advanced.
>
>         If you need to do anything special with it, let me know and I can
>         help make that possible (or at least explain what would need to
>         change to do that).
>
>         Simon Pasquier wrote:
>
>             Hi,
>
>             On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting
>             <andy at andybotting.com> wrote:
>
>             We're running our services clustered behind an F5 load balancer
>             in production, and haproxy in our testing environment. This
>             setup works quite well for us, but I'm not that happy with how
>             we test the health of our endpoints.
>
>             We're currently calling basic URLs like / or /v2, and some
>             services return a 200 while others return codes such as 401.
>             Our healthcheck simply checks that each URL returns whatever
>             HTTP code it normally returns. This works OK and does catch
>             basic service failures.
>
>             Our test environment runs on flaky hardware and often fails in
>             strange ways: sometimes the port is open and basic URLs work,
>             but real API calls fail or time out, so our checks fall down
>             here.
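A minimal sketch of the expected-status check described above, with an explicit timeout so that a port which accepts connections but never answers counts as a failure. The `probe` function and its parameters are illustrative assumptions, not part of any OpenStack or load-balancer tool.

```python
import urllib.error
import urllib.request


def probe(url, expected_codes=(200,), timeout=3.0):
    """Return True if url answers with one of expected_codes within timeout.

    Hypothetical load-balancer style check: any other status, a connection
    failure, or a timeout is treated as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 401 from an unauthenticated API root
    except OSError:
        # URLError, socket timeouts and connection errors all subclass OSError.
        return False
    return status in expected_codes
```

For a service whose root URL normally answers 401, pass `expected_codes=(401,)`. Note that this still only proves the HTTP layer responds; it does not exercise a real API call.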
>
>             In a previous role, the developers added a URL (e.g.
>             /healthcheck) to each web application which tested things like
>             whether the DB connection was OK and memcached was accessible,
>             and returned a 200 if everything passed. This worked out really
>             well for operations. I haven't seen anything like this for
>             OpenStack.
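The pattern described here can be sketched as a tiny WSGI application that runs a set of dependency checks and answers 200 only if all of them pass. This is a hypothetical illustration, not an OpenStack component; the check callables are placeholders for things like opening a DB connection or pinging memcached.

```python
import json


def make_healthcheck_app(checks):
    """Build a WSGI app that runs named checks and reports 200 or 503.

    `checks` maps a name to a zero-argument callable that returns a truthy
    value when the dependency is healthy; a falsy return or an exception
    marks it unhealthy. Hypothetical sketch only.
    """
    def app(environ, start_response):
        results = {}
        for name, check in checks.items():
            try:
                results[name] = bool(check())
            except Exception:  # any error raised by a check means unhealthy
                results[name] = False
        healthy = all(results.values())
        body = json.dumps({"healthy": healthy, "checks": results}).encode("utf-8")
        status = "200 OK" if healthy else "503 Service Unavailable"
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` serves the app; the per-check results in the body tell an operator which dependency failed.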
>
>
>             There's a healthcheck oslo.middleware plugin [1] available, so
>             you could configure the service pipeline to include it, except
>             that it won't exercise the DB connection, RabbitMQ connection,
>             and so on. But it would help if you want to kick a service
>             instance out of the load balancer without stopping the service
>             completely [2].
>
>             [1]
>             http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html
>
>             [2]
>             http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file
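To illustrate the disable-by-file idea in [2], here is a simplified stand-in written as WSGI middleware: while a flag file exists, the health URL answers 503 so the load balancer drains the node, but the wrapped service keeps serving. This is not oslo.middleware's code (there the behaviour comes from its `disable_by_file` backend); the class name, flag path and response bodies are assumptions for illustration.

```python
import os


class DisableByFileHealthcheck:
    """Hypothetical WSGI middleware: answer `path` with 503 while flag_file exists."""

    def __init__(self, app, flag_file, path="/healthcheck"):
        self.app = app
        self.flag_file = flag_file
        self.path = path

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") != self.path:
            # Normal API traffic passes straight through to the service.
            return self.app(environ, start_response)
        if os.path.exists(self.flag_file):
            status, body = "503 Service Unavailable", b"DISABLED BY FILE"
        else:
            status, body = "200 OK", b"OK"
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]
```

Draining a node is then just creating the flag file (e.g. with `touch` on a hypothetical path such as `/var/run/myservice/healthcheck_disable`) and removing it to put the node back in rotation.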
>
>
>             I'm wondering how everyone else does healthchecking of their
>             clustered services, and whether or not they think adding a
>             dedicated healthcheck URL would be beneficial?
>
>
>             From what I can tell, people are doing the same thing as you:
>             check that a well-known location ('/', '/v2' or similar)
>             returns the expected code and hope that it will work for real
>             user requests too.
>
>             Simon
>
>
>             We do use scripts similar to the ones in osops-tools-monitoring
>             with Nagios, which help with more complex testing, but I'm
>             thinking of something more lightweight, specifically for
>             setting up on load balancers.
>
>             cheers,
>             Andy
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


