[nova][all] Adding /healthcheck support in Nova, and better healthcheck in every project

Tobias Urdin tobias.urdin at binero.com
Fri Nov 19 15:35:55 UTC 2021


As Mohammed said, you can do exactly the same thing in haproxy by setting the server
in the backend to drain. The result is the same, just driven from the opposite side:
the load balancer stops sending traffic instead of the API node reporting itself as down.

That is “set server <backend>/<server> state drain” over the haproxy admin socket.
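
As a rough sketch of what that looks like when scripted, the snippet below sends
the command over a UNIX admin socket from Python. The socket path, backend name
and server name are placeholders I made up for illustration, and it assumes the
socket is configured with admin level in haproxy.cfg; adjust to your deployment.

```python
import socket


def build_drain_command(backend: str, server: str) -> str:
    """Build the admin-socket command that puts one server into drain."""
    # The command must end with a newline for haproxy to process it.
    return f"set server {backend}/{server} state drain\n"


def send_admin_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send one command to the haproxy admin socket and return the reply.

    In its default non-interactive mode haproxy answers a single command
    and then closes the connection, so we read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


if __name__ == "__main__":
    # Hypothetical names: replace with your own socket path, backend and server.
    cmd = build_drain_command("nova_api", "controller1")
    print(send_admin_command("/var/run/haproxy.sock", cmd))
```

Setting the state back to “ready” the same way puts the server back into rotation.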

I really welcome Sean’s proposal for a real healthcheck framework that would
actually tell you when something is not working, instead of having to dig through
logs to find, for example, RabbitMQ connection issues. That really is a pain.

I wouldn’t want a “real” healthcheck that does all these things to be exposed
on the public API, though. I think Sean’s proposal is correct and does not break
backward compatibility, since the oslo.middleware healthcheck will still be there.
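
For anyone not familiar with the disable file Thomas mentions in the quoted
message, the idea can be approximated with a tiny WSGI app like the one below.
This is only an illustration of the behaviour, not the oslo.middleware code,
and the function names and file path are made up for the example.

```python
import os
from wsgiref.util import setup_testing_defaults


def make_healthcheck_app(disable_file_path):
    """Return a WSGI app answering 503 while the disable file exists."""

    def app(environ, start_response):
        if os.path.exists(disable_file_path):
            # Operator touched the file: tell the load balancer we are down.
            status, body = "503 Service Unavailable", b"DISABLED BY FILE"
        else:
            status, body = "200 OK", b"OK"
        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    return app


def probe(app):
    """Call the app once in-process and return (status line, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    # Hypothetical path, chosen only for the example.
    print(probe(make_healthcheck_app("/tmp/nova-api-disabled")))
```

Touching the file makes the check fail so the load balancer drains the node
while the API itself keeps serving in-flight requests; removing the file
brings it back.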

Best regards
Tobias


> On 18 Nov 2021, at 16:50, Thomas Goirand <zigo at debian.org> wrote:
> 
> On 11/18/21 2:03 AM, Mohammed Naser wrote:
>> 
>> 
>> On Wed, Nov 17, 2021 at 5:52 PM Thomas Goirand <zigo at debian.org> wrote:
>> 
>>    On 11/17/21 10:54 PM, Dan Smith wrote:
>>>> I don't think we rely on /healthcheck -- there's nothing healthy about
>>>> an API endpoint blindly returning a 200 OK.
>>>> 
>>>> You might as well just hit / and accept 300 as a code and that's
>>>> exactly the same behaviour.  I support what Sean is bringing up here
>>>> and I don't think it makes sense to have a noop /healthcheck that
>>>> always gives a 200 OK...seems a bit useless imho
>>> 
>>> Yup, totally agree. Our previous concerns over a healthcheck that
>>> checked all of nova returning too much info to be useful (for something
>>> trying to figure out if an individual worker is healthy) apply in
>>> reverse to one that returns too little to be useful.
>>> 
>>> I agree, what Sean is working on is the right balance and that we should
>>> focus on that.
>>> 
>>> --Dan
>>> 
>> 
>>    That's not the only thing it does. It can also be disabled, which is
>>    useful for maintenance: one can gracefully take an API node out of
>>    rotation this way, which one cannot do with the root.
>> 
>> 
>> I feel like this should be handled by whatever layer that needs to drain
>> requests for maintenance, otherwise also it might just be the same as
>> turning off the service, no?
> 
> It's not the same.
> 
> If you just turn off the service, there may well be some requests
> attempted against the API before it's seen as down. The idea here is to
> declare the API as down, so that haproxy can remove it from the pool
> *before* the service is really turned off.
> 
> That's what the oslo.middleware disable file helps doing, which the root
> url cannot do.
> 
> Cheers,
> 
> Thomas Goirand (zigo)
> 


