Hi Sean,
Thanks for your post.
On 11/23/20 2:32 PM, Sean Mooney wrote:
nova needs to enforce it as we use the absence or presence of the db creds to know if common code is running in the compute agent or in the controller services
I'm a bit shocked by what I've read in this thread about the guessing that Nova is doing.
The way the Nova team describes it, this really looks like a hack to hide Nova's internals, and what the team is asking for is more band-aid on top of it. Things have probably been done this way to make life easier for users, but at this point it feels like we shouldn't attempt to hide facts anymore and should try to make everything explicit, which is the opposite of what you describe.
Why don't we go the other way around, and get things like a superconductor=true configuration directive, for example?
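(For illustration only: a minimal sketch, assuming oslo.config, of what such an explicit directive could look like. The "superconductor" option name and its placement in the [conductor] group are hypothetical, not an existing Nova option.)

    from oslo_config import cfg

    # Hypothetical option: declare the role explicitly instead of inferring
    # it from whether database credentials happen to be present.
    conductor_opts = [
        cfg.BoolOpt('superconductor',
                    default=False,
                    help='Run this conductor as a top-level (super) '
                         'conductor with direct access to the API database.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(conductor_opts, group='conductor')

    def is_super_conductor():
        # The role is now an explicit statement by the operator, not a
        # guess based on [api_database]/connection being set.
        return CONF.conductor.superconductor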
On Mon, 2020-11-23 at 18:34 +0100, Thomas Goirand wrote:

this would break upgrades for everyone. one of our upgrade constraints is that you should never have to update your config as part of an upgrade. where config changes are needed we always give you at least 1 cycle to update your configs as part of a separate maintenance window. it would also force all deployment tools and packages to change, as we would have new, potentially incompatible options in the config files. specifically, packagers would have to ship different nova config files for super conductors and cell conductors for that use case.
On 11/23/20 2:32 PM, Sean Mooney wrote:
it is a bug to have the db creds in the set of configs passed to nova-compute, and it has been for years.
In such a case, detect that it's the nova-compute agent that's running, detect that it has access to db creds, and either:
- display a big warning (I would prefer that)
- display an error and quit (maybe bad in the case of an all-in-one setup)
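(A minimal sketch of such a startup check, purely illustrative and not existing Nova code; it assumes nova's CONF object with its real [database] and [api_database] groups, plus a hypothetical fail_hard toggle for the warn-vs-quit choice.)

    import logging
    import sys

    LOG = logging.getLogger(__name__)

    def check_compute_has_no_db_creds(conf, fail_hard=False):
        """Warn, or refuse to start, if nova-compute was given DB creds."""
        has_creds = bool(conf.database.connection or
                         conf.api_database.connection)
        if not has_creds:
            return
        msg = ('Database credentials were found in the configuration of '
               'nova-compute; the compute agent must never access the '
               'database directly. Remove the [database] and [api_database] '
               'sections from its config file(s).')
        if fail_hard:
            LOG.error(msg)
            sys.exit(1)
        LOG.warning(msg)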
This is orthogonal to the fact that Nova code is doing a hack (which should be fixed) to check which daemon is currently running in what mode.
it is not a hack; checking the binary name would be a hack, as that is common code which should be independent of what name you give the file on disk.

this is where it would have failed from rocky on
https://github.com/openstack/nova/blob/dc93e3b510f53d5b2198c8edd22528f0c8996...
if you used upgrade level auto but set the api db creds in the config of the compute agent
https://github.com/openstack/nova/blob/dc93e3b510f53d5b2198c8edd22528f0c8996...

if the api db options are set we get the min service version across all cells via a direct api call

    service_version = service_obj.get_minimum_version_all_cells(
        context.get_admin_context(), ['nova-compute'])

and if they are not we do an indirect call via the cell conductor to get the min compute service version in the local cell. note that it is a remotable method
https://github.com/openstack/nova/blob/dc93e3b510f53d5b2198c8edd22528f0c8996...
so when invoked from the compute agent it is converted to an rpc to the conductor to execute the lookup. get_minimum_version_all_cells is not remotable, since calling it from a compute node would result in an upcall to iterate over all the top-level cells
https://github.com/openstack/nova/blob/dc93e3b510f53d5b2198c8edd22528f0c8996...

it was part of moving to cells v2
https://github.com/openstack/nova/commit/d779b33d72d6ef41651ce7b93fe982f121b...
resolving https://bugs.launchpad.net/nova/+bug/1807044

the reason for the new failure is that we now unconditionally check the min compute service version using the same logic that we used before.
https://github.com/openstack/nova/blob/dc93e3b510f53d5b2198c8edd22528f0c8996...

    if CONF.api_database.connection is not None:
        scope = 'system'
        current_service_version = service.get_minimum_version_all_cells(
            ctxt, ['nova-compute'])
    else:
        scope = 'cell'
        # We in a cell so target our query to the current cell only
        current_service_version = service.Service.get_minimum_version(
            ctxt, 'nova-compute')

again, this uses the presence of the db configuration as a sentinel for whether this code is allowed to query the db directly. the patch that is under review to convert this to a warning just catches the exception for the blocked direct db access, logs the warning, then executes the else branch
https://review.opendev.org/c/openstack/nova/+/762175/1/nova/utils.py#1073

it is not a hack to rely on something that we have depended on for 2 years, and that we have stated for many more: namely, that the compute agent may not have the db options set.
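(roughly, that warning fallback could look like the sketch below; this is only an approximation of the patch under review, reusing the service, ctxt and LOG names from the snippet above, and assuming nova's existing DBNotAllowed exception for blocked direct db access.)

    from nova import exception

    try:
        current_service_version = service.get_minimum_version_all_cells(
            ctxt, ['nova-compute'])
    except exception.DBNotAllowed:
        # Direct db access is blocked in the compute agent, so db creds
        # ended up in nova-compute's config by mistake: log the warning
        # and fall back to the cell-local, conductor-backed lookup.
        LOG.warning('Database credentials found in the nova-compute '
                    'configuration; ignoring them and querying the minimum '
                    'service version through the conductor instead.')
        current_service_version = service.Service.get_minimum_version(
            ctxt, 'nova-compute')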
On 11/23/20 2:32 PM, Sean Mooney wrote:
we could make this just an ERROR log without the hard fail, but that would still not change the fact that there is a bug in the packages or deployment tools that should be fixed.
Probably. But how things are deployed shouldn't be the upstream authors' business. IMO, in the case of an all-in-one setup, nova-compute should continue to work and just ignore the db params, and at worst display a huge warning in the logs.
well, there i disagree. the warning/error message is the bare minimum we should do; at most we should allow it to continue executing, and optimally it should stop the service since we know the config is invalid.
In light of this thread, my opinion has now shifted to *not* having special files for the db credentials, to give Nova a chance to tell users what to do if nova-compute detects a mistake.
If we push the creds into /etc/nova/nova-db.conf, it won't be loaded by nova-compute, and nova-compute won't be able to warn the user that the file shouldn't be there on a compute node.
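(To make that concrete, a hedged sketch using oslo.config's ability to load several config files; the file names match the proposal above, everything else is illustrative.)

    from oslo_config import cfg

    def load_controller_config():
        # A controller service (api / conductor / scheduler) loads both
        # files, so it sees the [database]/[api_database] credentials.
        conf = cfg.ConfigOpts()
        conf(args=[], project='nova',
             default_config_files=['/etc/nova/nova.conf',
                                   '/etc/nova/nova-db.conf'])
        return conf

    def load_compute_config():
        # nova-compute loads only the main file: the credentials are never
        # visible to it, so it also has no way to warn that nova-db.conf
        # is sitting on the host by mistake.
        conf = cfg.ConfigOpts()
        conf(args=[], project='nova',
             default_config_files=['/etc/nova/nova.conf'])
        return conf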
that is true, we can't warn about a bug in the deployment tool or in your packages, but that is not something nova should have to do. we can warn when you invoke us with incorrect input, but we should not have to check for other mistakes you have made.
Checking only for the file's existence would be wrong (because it could have empty values and just be there... just because it's there! :) ).
Hoping that sharing my view is constructive and adds value to the thread, Cheers,
it is, and just because i don't agree with your viewpoint does not mean i don't want to hear you express it, as it challenges me and others to constantly re-evaluate our position based on the different perspectives shared :)
Thomas Goirand (zigo)