[Openstack-operators] Should Healthcheck URLs for services be enabled by default?
harlowja at fastmail.com
Tue May 24 06:23:14 UTC 2016
That's a good question and I'm not really sure of the historical reasons
why they're not; maybe someone with more historical wisdom will chime in.
I know that I put up https://review.openstack.org/#/c/12759/ many years
ago (the commentary there may be useful for historical investigation)...
Andy Botting wrote:
> Thanks to Simon, Josh and Kris who replied to my last email about the
> healthcheck middleware - these are now working well for us.
> I'm sure there are plenty of operators, like us, who didn't know this
> existed. Is there any reason why they're not enabled by default?
> On 30 April 2016 at 11:52, Joshua Harlow <harlowja at fastmail.com> wrote:
> This can help u more easily view what the healthcheck middleware can
> also show (especially in detailed mode); it can show thread stacks
> and similar information, which can be useful for debugging stuck
> servers (similar in concept to apache mod_status).
> Run the above review like:
> $ python oslo_middleware/healthcheck/ -p 8000
> Then open a browser to http://127.0.0.1:8000/ (or other port).
> Joshua Harlow wrote:
> Yup, that healthcheck middleware was made more advanced by me.
> If u need to do anything special with it, let me know and I can help
> make that possible (or at least explain what might need to change
> to do that).
> Simon Pasquier wrote:
> On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting
> <andy at andybotting.com> wrote:
> We're running our services clustered behind an F5 loadbalancer in
> production, and haproxy in our testing environment. This setup works
> quite well for us, but I'm not that happy with testing the health of
> our endpoints.
> We're currently calling basic URLs like / or /v2, and some
> services return a 200 while others return other codes like 401. Our
> healthcheck test simply checks against whatever HTTP code is
> returned. This works OK and does catch basic service failure.
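The check described above can be sketched in Python. The acceptable status codes here (including 401 from auth-protected roots) are illustrative assumptions, not what Andy's setup actually uses:

```python
# Sketch of a load-balancer-style check: hit a well-known URL and treat
# any "expected" HTTP status as evidence the service is at least listening.
import urllib.error
import urllib.request


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # A 401/404 still proves the service answered the request.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


def is_healthy(status, expected=(200, 300, 401)):
    """None (unreachable) or an unexpected status counts as unhealthy."""
    return status in expected
```

Usage would be something like `is_healthy(fetch_status('http://127.0.0.1:8774/'))` for a hypothetical compute endpoint; as the thread notes, this only proves the port answers, not that real API calls succeed.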
> Our test environment is on flaky hardware and often fails in odd
> ways: sometimes the port is open and basic URLs work, but
> real API calls fail and time out, so our checks fall
> down here.
> In a previous role I had, the developers added a URL (e.g.
> /healthcheck) to each web application which checked that
> things like the db connection were OK, memcached was
> accessible, etc.,
> and returned a 200. This worked out really great for
> operations. I
> haven't seen anything like this for OpenStack.
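A minimal sketch of that kind of endpoint, as a plain WSGI app with stubbed-in dependency checks (the check names and wiring here are hypothetical, not an OpenStack API):

```python
# Each check callable returns True when its dependency (db, memcached,
# ...) responds; the endpoint returns 200 only if every check passes.
def make_healthcheck_app(checks):
    """Build a WSGI app that reports the health of the given checks."""
    def app(environ, start_response):
        failures = [name for name, check in checks.items() if not check()]
        if failures:
            body = ("FAIL: " + ", ".join(sorted(failures))).encode()
            status = "503 Service Unavailable"
        else:
            body = b"OK"
            status = "200 OK"
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app


# Example wiring with stubbed checks; real ones might run "SELECT 1"
# against the db or set/get a sentinel key in memcached.
app = make_healthcheck_app({
    "database": lambda: True,
    "memcached": lambda: True,
})
```

The load balancer then polls /healthcheck and drains any instance that returns 503.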
> There's a healthcheck oslo.middleware plugin available,
> so you could
> possibly configure the service pipeline to include it, except it won't
> exercise the db connection, RabbitMQ connection, and so on.
> But it would
> help if you want to kick a service instance out of the pool
> without stopping the service completely.
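For reference, enabling the middleware typically looks something like the following in a service's api-paste.ini (the file path and service name here are illustrative; details vary per service and release):

```ini
[filter:healthcheck]
use = egg:oslo.middleware#healthcheck
backends = disable_by_file
# touch this file to make /healthcheck return 503 so the load
# balancer drains the instance without stopping the service
disable_by_file_path = /etc/nova/healthcheck_disable
```

With that filter added at the front of the service's paste pipeline, GET /healthcheck returns 200 normally and 503 once the disable file exists.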
> I'm wondering how everyone else does healthchecking of their
> clustered services, and whether or not they think adding a
> healthcheck URL would be beneficial?
> From what I can tell, people are doing the same thing as
> you do: check
> that a well-known location ('/', '/v2' or similar) returns the expected
> code, and hope that it will work for real user requests too.
> We do use scripts similar to the ones in
> osops-tools-monitoring in
> Nagios, which help with more complex testing, but I'm thinking of
> something more lightweight specifically for setting up on the load
> balancer.