[openstack-dev] [TripleO] os-refresh-config run frequency

Clint Byrum clint at fewbar.com
Thu Jun 26 19:20:30 UTC 2014


Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700:
> Hi all,
> 
> I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address.
> 
> My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system.  Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config?
> 
> This is easily fixed through the addition of an "@reboot" entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service.
> 
> My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a "quick fix" to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a "quick change" to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT).
> 
> There are a number of options to fix this including Modifying os-collect-config to auto-run os-refresh-config on a regular basis or setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes
> 
> I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through "education of users" or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late!
> 

So I see two problems highlighted above. 

1) We don't re-assert ephemeral state set by o-r-c scripts. You're right,
and we've been talking about it for a while. The right thing to do is
have os-collect-config re-run its command on boot. I don't think a cron
job is the right way to go; we should just have a file in /var/run that
is placed there only on a successful run of the command. If that file
does not exist, then we run the command.
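
A minimal sketch of that marker-file check, in shell. The marker path and
the function name run_unless_marked are made up for illustration; nothing
like this exists in os-collect-config today. It relies on /var/run being
cleared at boot, which is the usual case when it is a tmpfs.

```shell
#!/bin/sh
# Hypothetical marker path (an assumption, not an existing file).
MARKER="${MARKER:-/var/run/os-collect-config.command-ok}"

# Run the given command only if the marker is absent. The marker is
# created only after the command exits successfully, so a failed run
# is retried the next time this is called.
run_unless_marked() {
    if [ -e "$MARKER" ]; then
        return 0
    fi
    if "$@"; then
        : > "$MARKER"
    else
        return 1
    fi
}
```

Something like "run_unless_marked os-refresh-config" early in startup would
then re-assert state once per boot and be a no-op afterwards.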

I've just opened this bug in response:

https://bugs.launchpad.net/os-collect-config/+bug/1334804

2) We don't re-assert any state on a regular basis.

So one reason we haven't focused on this is that we have a stretch goal
of running with a readonly root partition. It's gotten lost in a lot of
the craziness of "just get it working", but with rebuilds blowing away
root now, leading to anything not on the state drive (/mnt currently)
being lost, there's a good chance that this will work relatively well.

Now, since people get root, they can always override the readonly root
and make changes. <golem>we hates thiss!</golem>.

I'm open to ideas; however, os-refresh-config is definitely not the
place to solve this. It is intended as a non-resident command to be
called when it is time to assert state. os-collect-config is intended
to gather configurations, and expose them to a command that it runs,
and thus should be the mechanism by which os-refresh-config is run.

I'd like to keep this conversation separate from one in which we discuss
more mechanisms to make os-refresh-config robust. There are a bunch of
things we can do, but I think we should focus just on "how do we
re-assert state?".

Because we're able to say right now that it is only for running when
config changes, we can wave our hands and say it's ok that we restart
everything on every run. As Jan alluded to, that won't work so well if
we run it every 20 minutes.

So, I wonder if we can introduce a config version into
os-collect-config.

Basically os-collect-config would keep a version along with its cache.
Whenever a new version is detected, os-collect-config would set a value
in the environment that informs the command "this is a new version of
config". From that, scripts can do things like this:

if [ -n "$OS_CONFIG_NEW_VERSION" ] ; then
  service X restart
else
  if ! service X status ; then
    service X start
  fi
fi
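
On the os-collect-config side, the version tracking could look roughly like
the following. This is only a sketch under assumptions: the function names
and the idea of storing a cksum next to the cached config are hypothetical,
and OS_CONFIG_NEW_VERSION is the proposed variable from above, not something
os-collect-config sets today.

```shell
#!/bin/sh
# Returns 0 if the config differs from the last recorded version and
# records the new checksum; returns 1 if nothing changed.
config_is_new() {
    config_file=$1
    version_file=$2
    new_sum=$(cksum < "$config_file")
    old_sum=$(cat "$version_file" 2>/dev/null || true)
    if [ "$new_sum" = "$old_sum" ]; then
        return 1
    fi
    printf '%s\n' "$new_sum" > "$version_file"
    return 0
}

# Run the command with OS_CONFIG_NEW_VERSION set only when the config
# changed since the previous run.
run_with_version_flag() {
    config_file=$1
    version_file=$2
    shift 2
    if config_is_new "$config_file" "$version_file"; then
        OS_CONFIG_NEW_VERSION=1 "$@"
    else
        ( unset OS_CONFIG_NEW_VERSION; "$@" )
    fi
}
```

Note that this sketch records the new version before the command runs. A
real implementation would probably want to record it only after the command
succeeds, so that a failed run is still treated as "new" next time.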

This would lay the groundwork for future abilities to compare old/new so
we can take shortcuts by diffing the two config versions. For instance
if we look at old vs. new and we don't see any of the keys we care about
changed, we can skip restarting.
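
To illustrate the "keys we care about" check, here is a rough sketch. It
assumes, purely for the example, that the old and new configs have been
flattened into key=value lines; the real cache is JSON, so an actual
implementation would need a proper parser. keys_changed is a made-up name.

```shell
#!/bin/sh
# Returns 0 if any of the named keys has a different line in the two
# flattened config files (or is present in only one), 1 otherwise.
keys_changed() {
    old=$1
    new=$2
    shift 2
    for key in "$@"; do
        old_val=$(awk -F= -v k="$key" '$1 == k' "$old")
        new_val=$(awk -F= -v k="$key" '$1 == k' "$new")
        if [ "$old_val" != "$new_val" ]; then
            return 0
        fi
    done
    return 1
}
```

A script could then restart its service only when
"keys_changed old.conf new.conf some.key other.key" succeeds, and skip the
restart otherwise.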


