> So, I wanted to clarify something that came out of the IRC discussion,
> which is "do it like devstack". When we say that, or at least when *I*
> said it, we meant "have different config files for each service so that
> you can get the config elements appropriate for each service set

> correctly."

I have one concern with the way we generate the default config, if we expect separated config files.

Currently in most of the distros we use oslo-config-generator to render the default conf file.

Since oslo-config-generator rendors all config options implemented in nova, we don't have any

good way to generate the sample config file for each service.

We didn't care which file we put the option since all services use the single file, but now we should always

refer to that documentation unless we implement generation of service-specific files or we write down

the services requiring that option in parameter description.

# We have the same problem with nova-db.conf approach.

> Despite the fact that nobody *should* be setting these credentials in
> their config file, we know that at in least some circumstances, they
> are. TripleO is setting them (always I think?) because puppet-nova does
> that. ...

The issue with TripleO was fixed by the change in puppet-nova[1], and nova.conf for

nova-compute service no longer includes database credentials.

We can backport the fix is needed, as per discussion with owalsh on the previous mails.

[1] https://review.opendev.org/#/c/755689/

But from puppet-nova's perspective the sandalone deployment or ironic deployment(*1)

still needs some fix to address the separate config files, because all processes run on host,

not on container, and refer to the same default conf file, nova.conf.

Since puppet-nova follows the way how distro packaging provides the default config files,

we need to fix current implementation in puppet-nova after each distro packages provide

the updated version with new config file structure.

(*1)

In usual deployment with ironic, we have nova-compute running on the controller nodes,

where nova-api is also running, IIUC.

On Thu, Nov 19, 2020 at 4:27 AM Dan Smith <dms@danplanet.com> wrote:

> There was a long discussion on #openstack-nova today[3] around this
> topic. So you can find more detailed reasoning there[3].

I just had a long voice chat with Ollie about this, trying to understand
some of the challenges that still exist with some of the things we've
discussed and the proposed forward steps.

There are lots of things to clarify, I think, and we could honestly
probably use a wider voice chat among the people that need to work on
this. However, I'll try here.

First off, I want to address the "do it like devstack" recommendation,
and the subsequent suggestion of standardizing on something like a
"nova-db.conf" file arrangement to keep the db credentials in one
place. As Ollie has pointed out, devstack doesn't support all the
various deployment arrangements (such as "one metadata api per cell")
and thus doesn't have a prescription for how to arrange the config files
in those scenarios. Specifically, an all-in-one deployment that includes
multiple API services with different configs would currently have to hack
around the hardcoded nova.conf file with the environment variable that
allows them to specify a directory other than /etc/nova.conf. Devstack
doesn't have to do this because it doesn't support that arrangement of
services, but if it did, it would have to.

So, I wanted to clarify something that came out of the IRC discussion,
which is "do it like devstack". When we say that, or at least when *I*
said it, we meant "have different config files for each service so that
you can get the config elements appropriate for each service set
correctly." That doesn't mean "nova-compute should point to
/etc/nova/nova-cpu.conf because that's what devstack does". Especially
in a containerized setup, I would expect everything to use
/etc/nova/nova.conf, since those are namespaced per-service because of
the containerization. Further, just because devstack doesn't run a
metadata per cell (or any other such optional arrangement), obviously
that doesn't mean you can't.

That leads me to the first action item I think we need:

ACTION 1: Make the wsgi app able to use something other than nova.conf

All of our other services support a --config-file flag, and I don't see
why we wouldn't allow this if it's necessary for deployments. In the
pure API arrangement, it shouldn't really be necessary to change this,
but clearly you may need to have multiple API workers with different
configs, as is the case with metadata-per-cell, or even potentially some
admin-focused private API endpoint. If it makes it easier for deployment
tools to start the API workers with a specific config file, we should
let them do that. You can already work around that hard-coded filename
by setting OS_NOVA_CONFIG_DIR to something other than /etc/nova, but we
shouldn't require them to create directories just to have separate
configs.

The next item is related:

ACTION 2: We need to document a minimal viable config for each service

Ollie points out that, obviously, handing the deployer a config
reference with 1.21 bajillion config options does not clearly indicate
which services need which thing, and especially, which things are
_not_allowed_ to be set for a service (such as db credentials on the
compute). We can't feasibly audit every config option we have, but it
would be very helpful to have a reference that shows what a completely
minimal configuration for each service looks like. We could do that by
tagging services per config options (which I suggested earlier, and
would still be good) but I think maybe a standalone document would be
easier for people to follow.

Now, on to the question about the hard-fail patch for compute:

https://review.opendev.org/#/c/762176/

The Nova devs have wanted people to _not_ have db credentials in the
config file that nova-compute can see for a "Very Long Time." We have
even made some assumptions in the code that rely on these credentials
being not set on the compute, which is at least partially why we're
having this conversation (again) now.

Despite the fact that nobody *should* be setting these credentials in
their config file, we know that at in least some circumstances, they
are. TripleO is setting them (always I think?) because puppet-nova does
that. OSA has said that they're setting them somewhat incidentally when
they do all-in-one deployments. The latter is unlikely to affect any
real deployments, but the former definitely _will_, and anyone else that
isn't present in this discussion may be including credentials
unnecessarily, which we will break when we merge that patch. Thus, I
think that, even though it feels long overdue for devs, the most prudent
and cautious approach will be to:

ACTION 3: Not merge that hard fail until X

We have the warning, we have the reno. We expect that the deployment
tools will be working to eliminate the credentials for the compute
config, but merging that will make it an emergency for them. We've
waited years at this point, we can wait one more cycle for safety. As
Ollie pointed out, we've not been super clear about messaging this,
despite it being a well-worn assumption amongst the developers for a
long time.

To aid in smoking out any of the jobs or configs that deployment tools
may still have which will break when we merge that patch, we should also
consider:

ACTION 4: A workaround flag to opt-in to the strict checking

Basically, this would be a one or two-cycle workaround, which when
enabled, would opt-in to the (above) behavior of causing compute to fail
on startup if db credentials are present. This would allow the
deployment tool maintainers, as well as people responsible for CI jobs
to smoke test the new behavior early, before we merge it and cause an
emergency for them. We can set this as on by default in our devstack
jobs to signal that it is good with the new behavior, and also encourage
the other deployment projects to set it as well once they're generating
clean configs for their nova-computes. Once we merge the patch to
actually fail all the time, we can remove this workaround config,
according to the original intent of the workarounds group, that those
flags are short-lived.

We could do this by altering gibi's patch to only fail if the workaround
is enabled, and then follow up in X by removing the workaround check.

So, I know that was a lot of words, but I think if we can agree to the
above four items, we'll have a path forward that doesn't overly burden
any one specific group while still allowing us to chart a path to
getting where we want to be.

I think we need acks from a bunch of groups, probably at least:

- Nova (gibi, stephenfin)
- TripleO (owalsh)
- Debian (zigo)
- Kolla (yoctozepto)
- OSA (noonedeadpunk)
- rdo-ci (jpena)
- puppet-nova (tkajinam)

Are people okay with these action items?

--Dan