On Thu, Nov 19, 2020 at 13:07, Balázs Gibizer <balazs.gibizer@est.tech> wrote:
On Wed, Nov 18, 2020 at 11:24, Dan Smith <dms@danplanet.com> wrote:
That leads me to the first action item I think we need:
ACTION 1: Make the wsgi app able to use something other than nova.conf
All of our other services support a --config-file flag, and I don't see why we wouldn't allow this if it's necessary for deployments. In the pure API arrangement, it shouldn't really be necessary to change this, but clearly you may need to have multiple API workers with different configs, as is the case with metadata-per-cell, or even potentially some admin-focused private API endpoint. If it makes it easier for deployment tools to start the API workers with a specific config file, we should let them do that. You can already work around that hard-coded filename by setting OS_NOVA_CONFIG_DIR to something other than /etc/nova, but we shouldn't require them to create directories just to have separate configs.
I will look into this change and propose patch a patch.
The next item is related:
ACTION 2: We need to document a minimal viable config for each service
Ollie points out that, obviously, handing the deployer a config reference with 1.21 bajillion config options does not clearly indicate which services need which thing, and especially, which things are _not_allowed_ to be set for a service (such as db credentials on the compute). We can't feasibly audit every config option we have, but it would be very helpful to have a reference that shows what a completely minimal configuration for each service looks like. We could do that by tagging services per config options (which I suggested earlier, and would still be good) but I think maybe a standalone document would be easier for people to follow.
I've opened a doc bug to track the minimal config documentation work: https://bugs.launchpad.net/nova/+bug/1904179 But I would like to get help from others to start proposing this doc.
Dan started the documentation work here https://review.opendev.org/#/c/763412/
Now, on to the question about the hard-fail patch for compute:
https://review.opendev.org/#/c/762176/
The Nova devs have wanted people to _not_ have db credentials in the config file that nova-compute can see for a "Very Long Time." We have even made some assumptions in the code that rely on these credentials being not set on the compute, which is at least partially why we're having this conversation (again) now.
Despite the fact that nobody *should* be setting these credentials in their config file, we know that at in least some circumstances, they are. TripleO is setting them (always I think?) because puppet-nova does that. OSA has said that they're setting them somewhat incidentally when they do all-in-one deployments. The latter is unlikely to affect any real deployments, but the former definitely _will_, and anyone else that isn't present in this discussion may be including credentials unnecessarily, which we will break when we merge that patch. Thus, I think that, even though it feels long overdue for devs, the most prudent and cautious approach will be to:
ACTION 3: Not merge that hard fail until X
We have the warning, we have the reno. We expect that the deployment tools will be working to eliminate the credentials for the compute config, but merging that will make it an emergency for them. We've waited years at this point, we can wait one more cycle for safety. As Ollie pointed out, we've not been super clear about messaging this, despite it being a well-worn assumption amongst the developers for a long time.
I'm OK not to merge the breaking patch in W.
I've pushed a patch for the X cycle with a big Workflow -1 https://review.opendev.org/#/c/763559/
To aid in smoking out any of the jobs or configs that deployment tools may still have which will break when we merge that patch, we should also consider:
ACTION 4: A workaround flag to opt-in to the strict checking
Basically, this would be a one or two-cycle workaround, which when enabled, would opt-in to the (above) behavior of causing compute to fail on startup if db credentials are present. This would allow the deployment tool maintainers, as well as people responsible for CI jobs to smoke test the new behavior early, before we merge it and cause an emergency for them. We can set this as on by default in our devstack jobs to signal that it is good with the new behavior, and also encourage the other deployment projects to set it as well once they're generating clean configs for their nova-computes. Once we merge the patch to actually fail all the time, we can remove this workaround config, according to the original intent of the workarounds group, that those flags are short-lived.
We could do this by altering gibi's patch to only fail if the workaround is enabled, and then follow up in X by removing the workaround check.
I will make the suggested changes in my patch.
I've updated the patch with the workaround config option: https://review.opendev.org/#/c/762176 Cheers, gibi