<div dir="ltr">Thanks to Simon, Josh and Kris who replied to my last email about the healthcheck middleware - these are now working well for us.<div><br></div><div>I'm sure there are plenty of operators, like us, who didn't know this existed.</div><div><br></div><div>Is there any reason why they're not enabled by default?</div><div><br></div><div>cheers,</div><div>Andy</div><div><div class="gmail_extra"><br><div class="gmail_quote">On 30 April 2016 at 11:52, Joshua Harlow <span dir="ltr"><<a href="mailto:harlowja@fastmail.com" target="_blank">harlowja@fastmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This can help u more easily view what the healthcheck middleware can also show (especially in detailed mode); it can show thread stacks and such which can be useful for debugging stuck servers and such (similar in concept to Apache mod_status).<br>
<br>
<a href="https://review.openstack.org/#/c/311482/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/311482/</a><br>
<br>
Run the above review like:<br>
<br>
$ python oslo_middleware/healthcheck/ -p 8000<br>
<br>
Then open a browser to <a href="http://127.0.0.1:8000/" rel="noreferrer" target="_blank">http://127.0.0.1:8000/</a> (or other port).<br>
<br>
-Josh<div class="HOEnZb"><div class="h5"><br>
<br>
Joshua Harlow wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Yup, that healthcheck middleware was made more advanced by me.<br>
<br>
If u need to do anything special with it, let me know and I can help<br>
make that possible (or at least explain what might need to be changed to do<br>
that).<br>
<br>
Simon Pasquier wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting &lt;<a href="mailto:andy@andybotting.com" target="_blank">andy@andybotting.com</a>&gt; wrote:<br>
<br>
We're running our services clustered behind an F5 loadbalancer in<br>
production, and haproxy in our testing environment. This setup works<br>
quite well for us, but I'm not that happy with testing the health of<br>
our endpoints.<br>
<br>
We're currently calling basic URLs like / or /v2 etc and some<br>
services return a 200, some return other codes like 401. Our<br>
healthcheck test simply checks against whatever the http code<br>
returns. This works OK and does catch basic service failure.<br>
<br>
Our test environment is on flaky hardware and often fails in strange<br>
ways. Sometimes the port is open and basic URLs work, but<br>
real API calls fail and time out, so our checks fall<br>
down here.<br>
<br>
In a previous role I had, the developers added a URL (e.g.<br>
/healthcheck) to each web application which checked that<br>
the db connection was OK, memcached was accessible, etc.,<br>
and returned a 200. This worked out really well for operations. I<br>
haven't seen anything like this for OpenStack.<br>
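<br>
As a rough illustration of the kind of /healthcheck URL described above, here is a minimal sketch of a WSGI app that runs a set of dependency checks and returns a 200 only if all of them pass. The names run_checks and healthcheck_app, and the check callables, are hypothetical; this is not how any OpenStack service implements it.<br>
<pre>
```python
import json


def run_checks(checks):
    """Run each named check callable; a check fails if it raises.

    `checks` is a hypothetical mapping of name -> zero-argument callable
    (e.g. one that pings the database, another that pings memcached).
    """
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = "failed: %s" % exc
    healthy = all(value == "ok" for value in results.values())
    return healthy, results


def healthcheck_app(checks):
    """Build a WSGI app that answers 200 if every check passes, else 503."""
    def app(environ, start_response):
        healthy, results = run_checks(checks)
        status = "200 OK" if healthy else "503 Service Unavailable"
        body = json.dumps(results).encode("utf-8")
        start_response(status, [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```
</pre>
A loadbalancer monitor would then only need to match on the 200, and the JSON body tells an operator which dependency failed.<br>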
<br>
<br>
There's a healthcheck oslo.middleware plugin [1] available. You could<br>
configure the service pipeline to include it, although it won't<br>
exercise the db connection, RabbitMQ connection, and so on. But it would<br>
help if you want to kick a service instance out of the load-balancer<br>
without stopping the service completely [2].<br>
<br>
[1]<br>
<a href="http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html" rel="noreferrer" target="_blank">http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html</a><br>
<br>
[2]<br>
<a href="http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file" rel="noreferrer" target="_blank">http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file</a><br>
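<br>
To make the disable-by-file idea in [2] concrete, here is a minimal sketch of the concept using only the Python standard library. It is not the oslo.middleware implementation; the class name HealthcheckSketch, the default path, the default file location, the response bodies and the call helper are all assumptions made for illustration.<br>
<pre>
```python
import os
from wsgiref.util import setup_testing_defaults


class HealthcheckSketch:
    """Hypothetical WSGI middleware illustrating a disable-by-file check."""

    def __init__(self, app, path="/healthcheck",
                 disable_file="/tmp/healthcheck_disable"):
        self.app = app
        self.path = path
        self.disable_file = disable_file

    def __call__(self, environ, start_response):
        # Anything other than the healthcheck path goes to the wrapped app.
        if environ.get("PATH_INFO") != self.path:
            return self.app(environ, start_response)
        # The presence of the file marks this instance as out of rotation.
        if os.path.exists(self.disable_file):
            status, body = "503 Service Unavailable", b"DISABLED BY FILE"
        else:
            status, body = "200 OK", b"OK"
        start_response(status, [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


def call(app, path):
    """Helper for trying the middleware without starting a server."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```
</pre>
Creating the file makes the check return a 503, so the loadbalancer drops the instance while the service itself keeps running; removing the file puts it back in rotation.<br>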
<br>
<br>
I'm wondering how everyone else does healthchecking of their<br>
clustered services, and whether or not they think adding a dedicated<br>
healthcheck URL would be beneficial?<br>
<br>
<br>
From what I can tell, people are doing the same thing as you do: check<br>
that a well-known location ('/', '/v2' or similar) returns the expected<br>
code and hope that it will work for real user requests too.<br>
<br>
Simon<br>
<br>
<br>
We do use scripts similar to ones in the osops-tools-monitoring in<br>
Nagios which help with more complex testing, but I'm thinking of<br>
something more lightweight specifically for setting up on loadbalancers.<br>
<br>
cheers,<br>
Andy<br></blockquote></blockquote></div></div></blockquote></div></div></div></div>