[tc][all] Wallaby Cycle Community Goals

Thomas Goirand zigo at debian.org
Thu Oct 1 07:03:25 UTC 2020


On 10/1/20 12:13 AM, Sean Mooney wrote:
> Can't you just hit the root of the service? That is unauthenticated for
> microversion discovery, so haproxy could simply use / for an HTTP check
> if it's just being used to test whether the REST API is running.

How would I tell the difference, in my logs, between a client hitting /
for microversion discovery and haproxy doing a healthcheck?
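
For reference, pointing haproxy at / would look roughly like the sketch
below (the backend name, addresses, ports and check interval are all
made up). Every probe then shows up in the API access log as an ordinary
GET /, exactly like a real client doing version discovery:

    backend nova_api
        # Probe the unauthenticated version-discovery document at /
        option httpchk GET /
        server nova-api-1 192.0.2.11:8774 check inter 2s
        server nova-api-2 192.0.2.12:8774 check inter 2s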

>> I believe "version": "2.79" is the microversion of the Nova API, which
>> therefore, exposes what version of Nova (here: Train). Am I correct?
> No, you are not.
> It does not expose the package information; it tells you the maximum microversion the API supports, but
> that is a different thing. We don't always bump the microversion in a release;
> Ussuri and Victoria share the same maximum microversion:
> https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#maximum-in-ussuri-and-victoria
> 
> The microversion also won't change on a stable branch, no matter what bugs exist or have been patched.

Right, but this still reveals which version of OpenStack is installed.
Maybe Nova hasn't bumped it, but one could check multiple services and
narrow down which OpenStack release is running. That is still a problem,
and at least a bigger one than anything you've described with the
/healthcheck URL.
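
To make the concern concrete, here is a small sketch of what anyone on
the network can do with that unauthenticated document (the address is
made up; the release mapping comes from the microversion history page
linked above):

    import requests

    # An unauthenticated GET on the API root returns the version document,
    # including the maximum supported microversion ("version").
    resp = requests.get("http://203.0.113.10:8774/", timeout=5)
    for ver in resp.json().get("versions", []):
        # e.g. "v2.1 2.79": 2.79 is the Train maximum, 2.87 the
        # Ussuri/Victoria maximum, and so on.
        print(ver["id"], ver.get("version"))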

>> I believe we also must leave this, because clients must be able to
>> discover the microversion of the API, right?
> Yes; without this, no client can determine what API version is supported by a specific cloud.
> This is intended to be a public endpoint with no auth for that reason.

That part I don't understand. Couldn't this be an authenticated thing?

> If the generic oslo ping RPC was added we could use that, but I think dansmith had a simpler proposal: cache it
> based on whether we were able to connect during normal operation, and just have the API check look at the in-memory value.
> I.e. if the last attempt to read from the DB failed to connect, we would set a global variable, e.g. DB_ACCESSIBLE=False,
> and the next time it succeeded we would set it back to True. The health check would just read the global, so there should be
> little to no overhead vs. what oslo does.
> 
> This would basically cache the last known state, and the health check is just doing the equivalent of
> return DB_ACCESSIBLE and RPC_ACCESSIBLE

That's a good idea, but my patch has been sitting rejected for the
last 5 months. It could have been in Victoria already...
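
For the record, here is a minimal sketch of how I understand that
proposal (the names are taken from your description above; none of this
is actual Nova code):

    import threading

    # Last-known connectivity state, updated as a side effect of normal
    # operation rather than by probing from the healthcheck itself.
    _state = {"db": True, "rpc": True}
    _lock = threading.Lock()

    def record_db_result(ok: bool) -> None:
        # Called after every DB access attempt during normal request handling.
        with _lock:
            _state["db"] = ok

    def record_rpc_result(ok: bool) -> None:
        # Called after every RPC attempt during normal request handling.
        with _lock:
            _state["rpc"] = ok

    def healthcheck_ok() -> bool:
        # The healthcheck only reads the cached values, so it adds little
        # to no overhead compared to what the oslo middleware does today.
        with _lock:
            return _state["db"] and _state["rpc"]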

>> What is *not* useful, though, is delaying such a trivial patch for more
>> than 6 months, just in the hope that in a distant future we may have
>> something better.
> 
> But as you yourself pointed out, almost every service has a public / endpoint that is used for microversion discovery,
> so not implementing /healthcheck in Nova does not block you from using / as the healthcheck URL, and you can enable the
> oslo endpoint if you choose to, by enabling the middleware in your local deployment.

As I wrote above: that's *not* a good idea.

>> Sure, take your time, get something implemented that does a nice
>> healthcheck with DB access and RabbitMQ connectivity checks. But that
>> should in no way get in the way of having a configuration which works
>> for everyone by default.
> There is nothing stopping install tools from providing that experience by default today.

Indeed, and I'm doing it. However, it's very annoying to rebase such a
patch on every single release of OpenStack, just like every other
patch that the Debian packages are carrying.
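
For what it's worth, the change that patch carries amounts to roughly
the following api-paste.ini excerpt (a sketch based on the oslo.middleware
healthcheck documentation; the exact section names in nova's api-paste.ini
may differ):

    [app:healthcheck]
    paste.app_factory = oslo_middleware:Healthcheck.app_factory
    backends = disable_by_file
    disable_by_file_path = /etc/nova/healthcheck_disable

    # ... plus mapping /healthcheck to that app in the existing urlmap
    # composite section.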

> At least as long as Nova supports configurable middleware, they can enable, or even enable by default, the /healthcheck
> endpoint without requiring a Nova code change. I have looked at enough customer bugs to know that network partitions
> are common in real environments, where someone trying to use the /healthcheck endpoint to know whether Nova is healthy would
> be severely disappointed when it says it's healthy and they can't boot any VMs because RabbitMQ is not reachable.

We discussed this already, and we all agree that the name of the URL was
a bad choice. It could have been called /reverse-proxycheck instead. But
it's too late now, and changing that name would break users, unfortunately.
If we want to rename it, then we must go through a deprecation cycle.

However, I don't really care if some are disappointed because they can't
read the doc or the code, and just guess wrong because of the URL name.
We need this thing for HA setups, and it is easy to get installed, so why
not do it? The URL name mistake cannot be a blocker.

> For use cases outside of haproxy failover, a bad health check is arguably worse than no healthcheck.

I never wrote that I don't want a better health check. Just that the one
we have is already useful, and that it should be on by default.

> I'm not unsympathetic to your request, but with what oslo does by default we would basically have to document that this
> should not be used to monitor the health of the Nova service.

What it does is already documented. If you want to add more to the
documentation, please do so.

> We have already got several bug reports about the status of VMs not matching reality when connectivity to
> the cell is down. E.g. when we can't connect to the cell database, if a VM is stopped, say via a power-off over SSH,
> its state will not be reflected in a nova show.

This has nothing to do with what we're currently discussing, which is
having a URL to wire in haproxy checks.

> If we were willing to add a big warning and clearly call out that this just says the API is accessible but not
> necessarily functional, then I would be more OK with what oslo provides; but it does not tell you anything about the
> health of Nova or whether any other API request will actually work.

https://review.opendev.org/755433

> I would suggest adding this to the Nova PTG etherpad if you want to move this forward in Nova in particular.

I would suggest not discussing the matter too much, and actually doing
something about it. :)

It has already been discussed for way too long.

Cheers,

Thomas Goirand (zigo)


