[self-healing-sig] best practices for haproxy health checking

Ben Nemec openstack at nemebean.com
Fri Jan 11 17:31:34 UTC 2019

On 1/11/19 11:11 AM, Dirk Müller wrote:
> Hi,
> Does anyone have good pointers for health checks to be used by
> the front-end API haproxy load balancer?
> In one case that I am looking at right now, the entry haproxy
> load balancer was unable to detect that a particular backend was
> not responding to API requests, so the backend flapped up and
> down repeatedly, causing intermittent spurious 503 errors.
> The backend was able to accept connections and answer basic HTTP
> GET requests (e.g. / or even /v3 as the path), but when it got a
> "real" query it hung. The reason, as it turned out, was that the
> memcached caching backend configured on that machine had locked
> up (due to some other bug).
> I wonder if there is a better way to check whether a backend is
> "working", and what the best practices around this are. One
> thought I had was to do the backend check against a separate
> healthcheck-specific port running a custom daemon that performs
> more sophisticated checks, e.g. detecting node-wide failures
> (memcache, database, or rabbitmq being unavailable), so that the
> node accepts no API traffic until those are resolved.

A very similar thing has been proposed: 

It also came up as a possible community goal for Train: 

But to my knowledge no one has stepped forward to drive the work. It 
seems to be something people generally agree we need, but nobody has 
time to do. :-(

> Any pointers to read upon / best practices appreciated.
> Thanks,
> Dirk
