As Mohammed said, you can do exactly the same thing in haproxy by setting the server in the backend to drain, which achieves the same result from the opposite direction. That is, “set server <backend>/<server> state drain” over the haproxy admin socket.

I really welcome Sean’s proposal for a real healthcheck framework that would actually tell you when something is not working, instead of having to dig through logs to find, for example, RabbitMQ connection issues, which really is a pain.

I wouldn’t want a “real” healthcheck that does all these things exposed on the public API, though. I think Sean’s proposal is correct and does not break backward compatibility, since the oslo.middleware healthcheck will still be there.

Best regards
Tobias
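For illustration, the drain command quoted above could be sent to the admin socket with a few lines of Python. This is only a sketch: the helper names are made up for this example, and it assumes the haproxy stats socket is configured with admin level and is reachable as a Unix socket by the calling user.

```python
import socket


def build_drain_command(backend: str, server: str) -> str:
    """Return the admin-socket command that puts one server into drain state."""
    # Same command as quoted in the message, terminated by a newline.
    return f"set server {backend}/{server} state drain\n"


def send_haproxy_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send a single command to the haproxy admin socket and return the reply.

    Assumption: the socket is in its default non-interactive mode, where
    haproxy closes the connection after answering one command, so reading
    until end-of-stream collects the whole reply.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

A call might look like `send_haproxy_command("/run/haproxy/admin.sock", build_drain_command("nova_api", "api-1"))`, where the socket path, backend name and server name are placeholders for whatever the local haproxy configuration uses.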
On 18 Nov 2021, at 16:50, Thomas Goirand <zigo@debian.org> wrote:
On 11/18/21 2:03 AM, Mohammed Naser wrote:
On Wed, Nov 17, 2021 at 5:52 PM Thomas Goirand <zigo@debian.org> wrote:
On 11/17/21 10:54 PM, Dan Smith wrote:
I don't think we rely on /healthcheck -- there's nothing healthy about an API endpoint blindly returning a 200 OK.
You might as well just hit / and accept 300 as a code and that's exactly the same behaviour. I support what Sean is bringing up here and I don't think it makes sense to have a noop /healthcheck that always gives a 200 OK...seems a bit useless imho
Yup, totally agree. Our previous concerns over a healthcheck that checked all of nova returning too much info to be useful (for something trying to figure out if an individual worker is healthy) apply in reverse to one that returns too little to be useful.
I agree: what Sean is working on strikes the right balance, and we should focus on that.
--Dan
That's not the only thing it does. It can also be disabled, which is useful for maintenance: one can gracefully remove an API node from the pool this way, which one cannot do with the root URL.
I feel like this should be handled by whatever layer needs to drain requests for maintenance; otherwise it might just be the same as turning off the service, no?
It's not the same.
If you just turn off the service, there may well be some requests sent to the API before it is seen as down. The idea here is to declare the API as down, so that haproxy can remove it from the pool *before* the service is really turned off.
That's what the oslo.middleware disable file helps with, and it is something the root URL cannot do.
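To make the difference concrete, here is a rough sketch of how a load-balancer-style probe could treat /healthcheck. It assumes the healthcheck middleware is configured with its disable-by-file backend, which, as far as I understand it, answers 503 once the disable file exists and 200 otherwise. The function names are invented for this example.

```python
import urllib.error
import urllib.request


def probe_healthcheck(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status of a GET on url, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return 0


def should_stay_in_pool(status: int) -> bool:
    """Keep the node in the pool only when the probe got a 2xx answer."""
    # 503 (disabled for maintenance) and 0 (unreachable) both remove it,
    # but 503 arrives while the service is still able to answer requests.
    return 200 <= status < 300
```

With this in place the maintenance sequence would be: create the disable file, wait until the probe has seen the 503 and the node has left the pool, and only then stop the service.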
Cheers,
Thomas Goirand (zigo)