Re: [nova][all] Adding /healthcheck support in Nova, and better healthcheck in every projects

19 Nov 2021

As Mohammed said, you can actually do exactly the same in haproxy by setting the server
in the backend to drain, which achieves the same result, just the other way around.

That is, “set server <backend>/<server> state drain” over the haproxy admin socket.
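As an illustration of that command, here is a minimal Python sketch that sends it over the admin socket. It assumes the socket is exposed as a Unix socket with admin level (for example via a `stats socket` line in haproxy.cfg); the socket path, backend and server names in the usage note are hypothetical. In non-interactive mode HAProxy answers a single command and then closes the connection, so the sketch simply reads until EOF.

```python
import socket

# States accepted by HAProxy's "set server ... state" runtime command.
VALID_STATES = {"ready", "drain", "maint"}


def server_state_command(backend: str, server: str, state: str) -> str:
    """Build a 'set server <backend>/<server> state <state>' command line."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    return f"set server {backend}/{server} state {state}"


def haproxy_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send one command to the HAProxy admin socket and return its reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `haproxy_command("/run/haproxy/admin.sock", server_state_command("nova_api", "controller1", "drain"))` drains the node, and the same call with state `ready` puts it back into rotation once maintenance is over.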

I really welcome Sean’s proposal for a real healthcheck framework that would
actually tell you that something is not working, instead of having to dig through
the logs for, say, RabbitMQ connection issues; that really is a pain.

I wouldn’t want a “real” healthcheck that does all these things to be exposed
on the public API, though, and I think Sean’s proposal is correct and does not break
backward compatibility, since the oslo.middleware healthcheck will still be there.

Best regards
Tobias
...
On 18 Nov 2021, at 16:50, Thomas Goirand <zigo@debian.org> wrote:
On 11/18/21 2:03 AM, Mohammed Naser wrote:
...
On Wed, Nov 17, 2021 at 5:52 PM Thomas Goirand <zigo@debian.org> wrote:
On 11/17/21 10:54 PM, Dan Smith wrote:
...
...
I don't think we rely on /healthcheck -- there's nothing healthy about
an API endpoint blindly returning a 200 OK.
You might as well just hit / and accept 300 as a code, and that's
exactly the same behaviour. I support what Sean is bringing up here
and I don't think it makes sense to have a no-op /healthcheck that
always gives a 200 OK... seems a bit useless IMHO.
Yup, totally agree. Our previous concerns over a healthcheck that
checked all of nova returning too much info to be useful (for something
trying to figure out if an individual worker is healthy) apply in
reverse to one that returns too little to be useful.
I agree that what Sean is working on is the right balance and that we
should focus on that.
--Dan
That's not the only thing it does. It can also be disabled,
which is useful for maintenance: one can gracefully take an API node
out of the pool this way before removing it, which one cannot do with the root URL.
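To make the mechanism concrete, here is a minimal WSGI sketch of a healthcheck that reports failure while a marker file exists. It illustrates the idea only and is not oslo.middleware's code: the function name, the file path and the response bodies are hypothetical, and the real middleware (its `disable_by_file` backend) should be configured as described in the oslo.middleware documentation.

```python
import os
from typing import Callable, Iterable

# Hypothetical marker file; a real deployment would take this from configuration.
DEFAULT_DISABLE_FILE = "/var/run/nova/healthcheck_disable"


def make_healthcheck_app(disable_file: str = DEFAULT_DISABLE_FILE) -> Callable:
    """Return a WSGI app that fails its healthcheck while disable_file exists."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if os.path.exists(disable_file):
            # Operator asked for maintenance: tell the load balancer to stop
            # sending new requests, while the API process itself keeps running.
            status, body = "503 Service Unavailable", b"DISABLED"
        else:
            status, body = "200 OK", b"OK"
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app
```

With haproxy probing this URL (for example with `option httpchk GET /healthcheck`), touching the file takes the node out of rotation without stopping the service, and deleting it puts the node back.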
I feel like this should be handled by whatever layer needs to drain
requests for maintenance; otherwise it might as well be the same as
turning off the service, no?
It's not the same.
If you just turn off the service, there may well be some requests
sent to the API before it is seen as down. The idea here is to
declare the API as down, so that haproxy can remove it from the pool
*before* the service is really turned off.
That's what the oslo.middleware disable file makes possible, which the root
URL cannot do.
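The ordering described here can be sketched as a small maintenance helper. This is a hypothetical illustration: the function names, the marker-file path and the stop command are assumptions, and the drain delay should be at least long enough for haproxy to fail the required number of checks (roughly `fall` × `inter` from the haproxy server line) plus time for in-flight requests to finish.

```python
import pathlib
import subprocess
import time
from typing import Sequence


def drain_delay(fall: int, inter_seconds: float, margin_seconds: float = 5.0) -> float:
    """Rough lower bound: `fall` consecutive failed checks spaced `inter` apart, plus a margin."""
    return fall * inter_seconds + margin_seconds


def graceful_stop(disable_file: str, drain_seconds: float,
                  stop_cmd: Sequence[str]) -> None:
    """Take an API node out of the haproxy pool first, then stop the service."""
    # 1. Create the marker file so /healthcheck starts failing.
    pathlib.Path(disable_file).touch()
    # 2. Give haproxy time to mark the server DOWN and let in-flight
    #    requests complete; no new requests should be routed here after this.
    time.sleep(drain_seconds)
    # 3. Only now stop the API service itself.
    subprocess.run(list(stop_cmd), check=True)
```

For example, `graceful_stop("/var/run/nova/healthcheck_disable", drain_delay(3, 2.0), ["systemctl", "stop", "nova-api"])`, where the path and unit name are placeholders for whatever the deployment actually uses.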
Cheers,
Thomas Goirand (zigo)

Tobias Urdin