Open Stack

Fri Jan 11 17:11:03 UTC 2019

Hi,

Does anyone have a good pointer for good healthchecks to be used by
the frontend api haproxy loadbalancer?

in one case that I am looking at right now, the entry haproxy
loadbalancer was not able
to detect a particular backend being not responding to api requests,
so it flipped up and down repeatedly, causing intermittend spurious
503 errors.

The backend was able to respond to connections and to basic HTTP GET
requests (e.g. / or even /v3 as path), but when it got a "real" query
it hung. the reason for that was, as it turned out,
the configured caching backend memcached on that machine being locked
up (due to some other bug).

I wonder if there is a better way to check if a backend is "working"
and what the best practices around this are. A potential thought I had
was to do the backend check via some other healthcheck specific port
that runs a custom daemon that does more sophisticated checks like
checking for system wide errors (like memcache, database, rabbitmq)
being unavailable on that node, and hence not accepting any api
traffic until that is being resolved.

Any pointers to read upon / best practices appreciated.

Thanks,
Dirk

Open Stack

[self-healing-sig] best practices for haproxy health checking

OpenStack

Community

Documentation

Branding & Legal