Hi Sean, Thanks for your reply and sharing your concerns. I still don't agree with you, and here's why. On 9/30/20 12:29 PM, Sean Mooney wrote:
I'm still available to attempt the /healthcheck thingy, I kind of succeed in all major project but ... nova. Unfortunately, it was decided in the project that we should put this on hold until the /healthcheck can implement more check than just to know if the API is alive. 5 months forward, I believe my original patch [1] should have been approved first as a first approach. Nova team: any reaction? i actually dont think the healthcheck enpoint is userful in it current from for any porject
On Thu, 2020-09-24 at 22:39 +0200, Thomas Goirand wrote: that has distibuted process like nova, neutron or cinder.
With my operator's hat on, I assure you that it is very useful to wire-up haproxy. This is btw exactly what it was designed for. The way haproxy works, is that it actually has to perform an http connection to check if the web service is alive. Without a specific URL, we can't filter-out that /healthcheck URL from the saved logs in Elastic search, which is very annoying. Not doing a real HTTP check means one falls back to a TCP check, which means that your logs are polluted with so many "client disconnected unexpectedly" (that's not the actual message, but that's what it is doing) since haproxy otherwise does a TCP connection, then closes it before what a normal HTTP query would do. I've been told to use other URLs, which isn't the way to go. I very much think the /healthcheck URL was well designed, and should be used.
that was not the only concern raised either as the default content of the detail responce wich include package infomation was considerd a possible security vulnerablity so with out agreeing on what kindo fo info can be retruned, its format and wether this would be a admin only endpoint or a public endpoint tenant can check potentially without auth i dont think we should be procedding with this as a goal.
Are you aware that one could simply use "/" of most projects without auth, and get the version of the project? Example with Nova: curl -s https://api.example.com/compute/ | jq . { "versions": [ { "id": "v2.0", "status": "SUPPORTED", "version": "", "min_version": "", "updated": "2011-01-21T11:33:21Z", "links": [ { "rel": "self", "href": "https://api.example.com/v2/" } ] }, { "id": "v2.1", "status": "CURRENT", "version": "2.79", "min_version": "2.1", "updated": "2013-07-23T11:33:21Z", "links": [ { "rel": "self", "href": "https://clint1-api.cloud.infomaniak.ch/v2.1/" } ] } ] } I believe "version": "2.79" is the microversion of the Nova API, which therefore, exposes what version of Nova (here: Train). Am I correct? I believe we also must leave this, because clients must be able to discover the micro-version of the API, right? Then, if I'm right, isn't this a much more annoying problem than having a /healtcheck URL which could anyway be filtered by an HAProxy in front? (note: I don't think the above is really a problem, since the micro-version doesn't tell if Nova has been patched for security or not...)
https://github.com/openstack/oslo.middleware/blob/master/oslo_middleware/hea... ^ is concerning from a security point of view.
This needs to be explicitly enabled in the api-paste.ini, and is otherwise not displayed. Here's what I get in my Train deployment with the query suggested in the example you gave: { "detailed": false, "reasons": [] } The code in here: https://github.com/openstack/oslo.middleware/blob/master/oslo_middleware/hea... shows I'm right. That's also well documented here: https://docs.openstack.org/oslo.middleware/latest/reference/healthcheck_plug... See where it says this: # set to True to enable detailed output, False is the default detailed = False in the api-paste.ini example. So here, the point you are making isn't IMO valid.
nova support configurable middelway still through the api-paste.ini file so untill we have a useful health check
As we discussed earlier, what is *not* useful, is any healthcheck that would do more than what oslo.middleware does right now without caching things, because that /healthcheck URL is typically called multiple times per seconds (in our deployment: at least 6 times per seconds), so it needs to reply fast. So I hope that the super-nice-healthcheck thingy wont fry my server CPUs ... :) What is *not* useful as well, is delaying such a trivial patch for more than 6 months, just in the hope that in a distant future, we may have something better. Sure, take your time, get something implemented that does a nice healtcheck with db access and rabbitmq connectivity checks. But that should in no way get in the path of having a configuration which works for everyone by default. Cheers, Thomas Goirand (zigo)