<div dir="ltr">Thanks to Simon, Josh and Kris who replied to my last email about the healthcheck middleware. The healthchecks are now working well for us.<div><br></div><div>I'm sure there are plenty of operators, like us, who didn't know this existed.</div><div><br></div><div>Is there any reason why they're not enabled by default?</div><div><br></div><div>cheers,</div><div>Andy</div><div><div class="gmail_extra"><br><div class="gmail_quote">On 30 April 2016 at 11:52, Joshua Harlow <span dir="ltr">&lt;<a href="mailto:harlowja@fastmail.com" target="_blank">harlowja@fastmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The review below makes it easier to see what the healthcheck middleware can show (especially in detailed mode); it can show thread stacks and similar details, which can be useful for debugging stuck servers (similar in concept to Apache mod_status).<br>

<br>

<a href="https://review.openstack.org/#/c/311482/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/311482/</a><br>

<br>

Run the above review like:<br>

<br>

$ python oslo_middleware/healthcheck/ -p 8000<br>

<br>

Then open a browser to <a href="http://127.0.0.1:8000/" rel="noreferrer" target="_blank">http://127.0.0.1:8000/</a> (or other port).<br>

<br>

-Josh<div class="HOEnZb"><div class="h5"><br>

<br>

Joshua Harlow wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Yup, I made that healthcheck middleware more advanced.<br>

<br>

If you need to do anything special with it, let me know and I can help<br>

make that possible (or at least explain what might need to be changed to do<br>

that).<br>

<br>

Simon Pasquier wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

<br>

On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting &lt;<a href="mailto:andy@andybotting.com" target="_blank">andy@andybotting.com</a>&gt; wrote:<br>

<br>

We're running our services clustered behind an F5 loadbalancer in<br>

production, and haproxy in our testing environment. This setup works<br>

quite well for us, but I'm not that happy with testing the health of<br>

our endpoints.<br>

<br>

We're currently calling basic URLs like / or /v2, and some<br>

services return a 200 while others return codes like 401. Our<br>

healthcheck test simply checks for whatever HTTP code each service<br>

normally returns. This works OK and does catch basic service failure.<br>

<br>

Our test environment is on flaky hardware and often fails in strange<br>

ways. Sometimes the port is open and basic URLs work, but<br>

real API calls fail and time out, so our checks fall<br>

down here.<br>

<br>

In a previous role I had, the developers added a URL (e.g.<br>

/healthcheck) to each web application which checked that<br>

the db connection was OK, memcached was accessible, etc.,<br>

and returned a 200. This worked out really well for operations. I<br>

haven't seen anything like this for OpenStack.<br>
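<br>

The kind of /healthcheck URL described above could be sketched roughly as follows. This is only an illustration using the Python standard library: the names make_healthcheck_app and call_app and the check callables are hypothetical, and it is not how any OpenStack service implements health checking.<br>

```python
import json
from wsgiref.util import setup_testing_defaults


def make_healthcheck_app(checks):
    """Build a WSGI app that runs every named check on each request.

    `checks` maps a name to a zero-argument callable that raises on failure
    (hypothetical examples: a db ping, a memcached get).
    """
    def app(environ, start_response):
        results = {}
        for name, check in checks.items():
            try:
                check()
                results[name] = "OK"
            except Exception as exc:  # any failure marks the service unhealthy
                results[name] = "FAILED: %s" % exc
        healthy = all(value == "OK" for value in results.values())
        status = "200 OK" if healthy else "503 Service Unavailable"
        body = json.dumps(results, sort_keys=True).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app


def call_app(app, path="/healthcheck"):
    """Invoke a WSGI app in-process and return (status, body text)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

A load balancer would then only need to look for a 200 on that one URL, instead of a different expected code for each service.<br>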

<br>

<br>

There's a healthcheck oslo.middleware plugin [1] available. You could<br>

configure the service pipeline to include it, although it won't<br>

exercise the db connection, RabbitMQ connection, and so on. But it would<br>

help if you want to kick a service instance out of the load-balancer<br>

without stopping the service completely [2].<br>
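<br>

To illustrate the disable-by-file idea, here is a minimal standard-library sketch. It is not the oslo.middleware code: the class name DisableByFile, the probe helper, and the stand-in service app are all made up for the example, and the real plugin's options should be taken from the documentation linked at [2].<br>

```python
import os
from wsgiref.util import setup_testing_defaults


class DisableByFile(object):
    """WSGI middleware that answers `path` itself; other requests pass through."""

    def __init__(self, app, disable_file, path="/healthcheck"):
        self.app = app
        self.disable_file = disable_file
        self.path = path

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") != self.path:
            return self.app(environ, start_response)
        # The presence of the file means "take me out of the pool".
        if os.path.exists(self.disable_file):
            status, body = "503 Service Unavailable", b"DISABLED BY FILE"
        else:
            status, body = "200 OK", b"OK"
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]


def service(environ, start_response):
    """Stand-in for the real API application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"api response"]


def probe(app, path):
    """Call a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body
```

With something like this in the pipeline, creating the file makes the load balancer's check fail while requests already in flight keep being served, and removing the file puts the instance back in rotation.<br>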

<br>

[1]<br>

<a href="http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html" rel="noreferrer" target="_blank">http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html</a><br>

<br>

[2]<br>

<a href="http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file" rel="noreferrer" target="_blank">http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file</a><br>

<br>

<br>

I'm wondering how everyone else does healthchecking of their<br>

clustered services, and whether or not they think adding a dedicated<br>

healthcheck URL would be beneficial?<br>

<br>

<br>

From what I can tell, people are doing the same thing as you do: check<br>

that a well-known location ('/', '/v2' or similar) returns the expected<br>

code and hope that it will work for real user requests too.<br>

<br>

Simon<br>

<br>

<br>

We do use scripts similar to the ones in osops-tools-monitoring with<br>

Nagios, which help with more complex testing, but I'm thinking of<br>

something more lightweight specifically for setting up on loadbalancers.<br>

<br>

cheers,<br>

Andy<br></blockquote></blockquote></div></div></blockquote></div></div></div></div>