[nova][tripleo][rpm-packaging][kolla][puppet][debian][osa] Nova enforces that no DB credentials are allowed for the nova-compute service

Dan Smith dms at danplanet.com
Wed Nov 18 19:24:00 UTC 2020


> There was a long discussion on #openstack-nova today[3] around this
> topic. So you can find more detailed reasoning there[3].

I just had a long voice chat with Ollie about this, trying to understand
some of the challenges that still exist with some of the things we've
discussed and the proposed forward steps.

There are lots of things to clarify, I think, and we could honestly
probably use a wider voice chat among the people that need to work on
this. However, I'll try here.

First off, I want to address the "do it like devstack" recommendation,
and the subsequent suggestion of standardizing on something like a
"nova-db.conf" file arrangement to keep the db credentials in one
place. As Ollie has pointed out, devstack doesn't support all the
various deployment arrangements (such as "one metadata api per cell")
and thus doesn't have a prescription for how to arrange the config files
in those scenarios. Specifically, an all-in-one deployment that includes
multiple API services with different configs would currently have to hack
around the hardcoded nova.conf file with the environment variable that
allows them to specify a directory other than /etc/nova.conf. Devstack
doesn't have to do this because it doesn't support that arrangement of
services, but if it did, it would have to.

So, I wanted to clarify something that came out of the IRC discussion,
which is "do it like devstack". When we say that, or at least when *I*
said it, we meant "have different config files for each service so that
you can get the config elements appropriate for each service set
correctly." That doesn't mean "nova-compute should point to
/etc/nova/nova-cpu.conf because that's what devstack does". Especially
in a containerized setup, I would expect everything to use
/etc/nova/nova.conf, since those are namespaced per-service because of
the containerization. Further, just because devstack doesn't run a
metadata per cell (or any other such optional arrangement), obviously
that doesn't mean you can't.

That leads me to the first action item I think we need:

ACTION 1: Make the wsgi app able to use something other than nova.conf

All of our other services support a --config-file flag, and I don't see
why we wouldn't allow this if it's necessary for deployments. In the
pure API arrangement, it shouldn't really be necessary to change this,
but clearly you may need to have multiple API workers with different
configs, as is the case with metadata-per-cell, or even potentially some
admin-focused private API endpoint. If it makes it easier for deployment
tools to start the API workers with a specific config file, we should
let them do that. You can already work around that hard-coded filename
by setting OS_NOVA_CONFIG_DIR to something other than /etc/nova, but we
shouldn't require them to create directories just to have separate
configs.

The next item is related:

ACTION 2: We need to document a minimal viable config for each service

Ollie points out that, obviously, handing the deployer a config
reference with 1.21 bajillion config options does not clearly indicate
which services need which thing, and especially, which things are
_not_allowed_ to be set for a service (such as db credentials on the
compute). We can't feasibly audit every config option we have, but it
would be very helpful to have a reference that shows what a completely
minimal configuration for each service looks like. We could do that by
tagging services per config options (which I suggested earlier, and
would still be good) but I think maybe a standalone document would be
easier for people to follow.

Now, on to the question about the hard-fail patch for compute:

  https://review.opendev.org/#/c/762176/

The Nova devs have wanted people to _not_ have db credentials in the
config file that nova-compute can see for a "Very Long Time." We have
even made some assumptions in the code that rely on these credentials
being not set on the compute, which is at least partially why we're
having this conversation (again) now.

Despite the fact that nobody *should* be setting these credentials in
their config file, we know that at in least some circumstances, they
are. TripleO is setting them (always I think?) because puppet-nova does
that. OSA has said that they're setting them somewhat incidentally when
they do all-in-one deployments. The latter is unlikely to affect any
real deployments, but the former definitely _will_, and anyone else that
isn't present in this discussion may be including credentials
unnecessarily, which we will break when we merge that patch. Thus, I
think that, even though it feels long overdue for devs, the most prudent
and cautious approach will be to:

ACTION 3: Not merge that hard fail until X

We have the warning, we have the reno. We expect that the deployment
tools will be working to eliminate the credentials for the compute
config, but merging that will make it an emergency for them. We've
waited years at this point, we can wait one more cycle for safety. As
Ollie pointed out, we've not been super clear about messaging this,
despite it being a well-worn assumption amongst the developers for a
long time.

To aid in smoking out any of the jobs or configs that deployment tools
may still have which will break when we merge that patch, we should also
consider:

ACTION 4: A workaround flag to opt-in to the strict checking

Basically, this would be a one or two-cycle workaround, which when
enabled, would opt-in to the (above) behavior of causing compute to fail
on startup if db credentials are present. This would allow the
deployment tool maintainers, as well as people responsible for CI jobs
to smoke test the new behavior early, before we merge it and cause an
emergency for them. We can set this as on by default in our devstack
jobs to signal that it is good with the new behavior, and also encourage
the other deployment projects to set it as well once they're generating
clean configs for their nova-computes. Once we merge the patch to
actually fail all the time, we can remove this workaround config,
according to the original intent of the workarounds group, that those
flags are short-lived.

We could do this by altering gibi's patch to only fail if the workaround
is enabled, and then follow up in X by removing the workaround check.

So, I know that was a lot of words, but I think if we can agree to the
above four items, we'll have a path forward that doesn't overly burden
any one specific group while still allowing us to chart a path to
getting where we want to be.

I think we need acks from a bunch of groups, probably at least:

- Nova (gibi, stephenfin)
- TripleO (owalsh)
- Debian (zigo)
- Kolla (yoctozepto)
- OSA (noonedeadpunk)
- rdo-ci (jpena)
- puppet-nova (tkajinam)

Are people okay with these action items?

--Dan



More information about the openstack-discuss mailing list