Re: [nova][tripleo][rpm-packaging][kolla][puppet][debian][osa] Nova enforces that no DB credentials are allowed for the nova-compute service

20 Nov 2020


      On Thu, Nov 19, 2020 at 13:07, Balázs Gibizer 
<balazs.gibizer@est.tech> wrote:
...
On Wed, Nov 18, 2020 at 11:24, Dan Smith <dms@danplanet.com> wrote:
...
That leads me to the first action item I think we need:
ACTION 1: Make the wsgi app able to use something other than 
nova.conf
All of our other services support a --config-file flag, and I don't 
see
why we wouldn't allow this if it's necessary for deployments. In the
pure API arrangement, it shouldn't really be necessary to change 
this,
but clearly you may need to have multiple API workers with different
configs, as is the case with metadata-per-cell, or even potentially 
some
admin-focused private API endpoint. If it makes it easier for 
deployment
tools to start the API workers with a specific config file, we should
let them do that. You can already work around that hard-coded 
filename
by setting OS_NOVA_CONFIG_DIR to something other than /etc/nova, but 
we
shouldn't require them to create directories just to have separate
configs.
I will look into this change and propose patch a patch.
...
The next item is related:
ACTION 2: We need to document a minimal viable config for each 
service
Ollie points out that, obviously, handing the deployer a config
reference with 1.21 bajillion config options does not clearly 
indicate
which services need which thing, and especially, which things are
_not_allowed_ to be set for a service (such as db credentials on the
compute). We can't feasibly audit every config option we have, but it
would be very helpful to have a reference that shows what a 
completely
minimal configuration for each service looks like. We could do that 
by
tagging services per config options (which I suggested earlier, and
would still be good) but I think maybe a standalone document would be
easier for people to follow.
I've opened a doc bug to track the minimal config documentation work: 
https://bugs.launchpad.net/nova/+bug/1904179
But I would like to get help from others to start proposing this doc.
Dan started the documentation work here 
https://review.opendev.org/#/c/763412/
...
...
Now, on to the question about the hard-fail patch for compute:
https://review.opendev.org/#/c/762176/
The Nova devs have wanted people to _not_ have db credentials in the
config file that nova-compute can see for a "Very Long Time." We have
even made some assumptions in the code that rely on these credentials
being not set on the compute, which is at least partially why we're
having this conversation (again) now.
Despite the fact that nobody *should* be setting these credentials in
their config file, we know that at in least some circumstances, they
are. TripleO is setting them (always I think?) because puppet-nova 
does
that. OSA has said that they're setting them somewhat incidentally 
when
they do all-in-one deployments. The latter is unlikely to affect any
real deployments, but the former definitely _will_, and anyone else 
that
isn't present in this discussion may be including credentials
unnecessarily, which we will break when we merge that patch. Thus, I
think that, even though it feels long overdue for devs, the most 
prudent
and cautious approach will be to:
ACTION 3: Not merge that hard fail until X
We have the warning, we have the reno. We expect that the deployment
tools will be working to eliminate the credentials for the compute
config, but merging that will make it an emergency for them. We've
waited years at this point, we can wait one more cycle for safety. As
Ollie pointed out, we've not been super clear about messaging this,
despite it being a well-worn assumption amongst the developers for a
long time.
I'm OK not to merge the breaking patch in W.
I've pushed a patch for the X cycle with a big Workflow -1 
https://review.opendev.org/#/c/763559/
...
...
To aid in smoking out any of the jobs or configs that deployment 
tools
may still have which will break when we merge that patch, we should 
also
consider:
ACTION 4: A workaround flag to opt-in to the strict checking
Basically, this would be a one or two-cycle workaround, which when
enabled, would opt-in to the (above) behavior of causing compute to 
fail
on startup if db credentials are present. This would allow the
deployment tool maintainers, as well as people responsible for CI 
jobs
to smoke test the new behavior early, before we merge it and cause an
emergency for them. We can set this as on by default in our devstack
jobs to signal that it is good with the new behavior, and also 
encourage
the other deployment projects to set it as well once they're 
generating
clean configs for their nova-computes. Once we merge the patch to
actually fail all the time, we can remove this workaround config,
according to the original intent of the workarounds group, that those
flags are short-lived.
We could do this by altering gibi's patch to only fail if the 
workaround
is enabled, and then follow up in X by removing the workaround check.
I will make the suggested changes in my patch.
I've updated the patch with the workaround config option: 
https://review.opendev.org/#/c/762176

Cheers,
gibi