[self-healing-sig] best practices for haproxy health checking
openstack at nemebean.com
Fri Jan 11 17:31:34 UTC 2019
On 1/11/19 11:11 AM, Dirk Müller wrote:
> Does anyone have a good pointer to good health checks for the frontend
> API haproxy load balancer?
> In one case I am looking at right now, the entry haproxy load balancer
> was not able to detect that a particular backend was not responding to
> API requests, so it flipped up and down repeatedly, causing intermittent
> spurious 503 errors.
> The backend was able to respond to connections and to basic HTTP GET
> requests (e.g. / or even /v3 as the path), but when it got a "real"
> query it hung. The reason, as it turned out, was that the configured
> caching backend (memcached) on that machine had locked up (due to some
> other bug).
> I wonder if there is a better way to check whether a backend is
> "working", and what the best practices around this are. One thought I
> had was to do the backend check via a separate, health-check-specific
> port that runs a custom daemon performing more sophisticated checks,
> such as for system-wide errors (memcached, database, or RabbitMQ being
> unavailable on that node), and reporting the node as down so that it
> accepts no API traffic until the problem is resolved.
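The health-check daemon idea in the quoted message could look roughly like the following. This is a minimal, hypothetical Python sketch, not an existing OpenStack or HAProxy component: a small HTTP server on a separate port returns 200 only when every dependency probe passes and 503 otherwise. The memcached probe sends the text-protocol `version` command and requires a `VERSION` reply within a timeout, so a memcached that still accepts TCP connections but has locked up (the failure described above) is reported as down. The port 8999, the `CHECKS` registry, and all function names are assumptions for illustration; database and RabbitMQ probes would be added to `CHECKS` in the same shape.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict

Check = Callable[[], bool]


def memcached_ok(host: str = "127.0.0.1", port: int = 11211,
                 timeout: float = 2.0) -> bool:
    """Probe memcached with its text-protocol `version` command.

    A server that accepts the TCP connection but never answers hits the
    socket timeout and is reported as failed, unlike a bare connect check.
    """
    try:
        # create_connection applies `timeout` to connect, send and recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"version\r\n")
            reply = sock.recv(64)
        return reply.startswith(b"VERSION")
    except OSError:  # refused, unreachable, or timed out
        return False


# Hypothetical registry of dependency probes (name -> zero-argument check).
# Database and RabbitMQ probes would be registered here in the same shape.
CHECKS: Dict[str, Check] = {"memcached": memcached_ok}


def run_checks(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every probe; a probe that raises counts as a failure."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def status_for(results: Dict[str, bool]) -> int:
    """200 only if every dependency is healthy; httpchk treats 5xx as failed."""
    return 200 if all(results.values()) else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        results = run_checks(CHECKS)
        body = "".join(
            f"{name}: {'ok' if ok else 'FAIL'}\n" for name, ok in results.items()
        ).encode()
        self.send_response(status_for(results))
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8999) -> None:
    """Run the health-check daemon on a separate (hypothetical) port."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

To try it, one would call `serve()` on each API node and point HAProxy's check at that port, for example with `option httpchk GET /` in the backend and `check port 8999` on each `server` line (plain `option httpchk` sends OPTIONS, which this handler does not implement). Because each probe carries its own timeout, a hung dependency turns into an explicit 503 instead of a hang; keeping the total probe time below HAProxy's `timeout check` means HAProxy sees that 503 rather than timing out on its own.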
A very similar thing has been proposed:
It also came up as a possible community goal for Train:
But to my knowledge no one has stepped forward to drive the work. It
seems to be something people generally agree we need, but nobody has
time to do. :-(
> Any pointers to read up on / best practices appreciated.
More information about the openstack-discuss