[openstack-dev] [TripleO] os-refresh-config run frequency

Clint Byrum clint at fewbar.com
Mon Jul 21 00:58:08 UTC 2014


Excerpts from Dan Prince's message of 2014-07-20 11:51:27 -0700:
> On Thu, 2014-07-17 at 15:54 +0100, Michael Kerrin wrote:
> > On Thursday 26 June 2014 12:20:30 Clint Byrum wrote:
> > 
> > > Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700:
> > > > Hi all,
> > > >
> > > > I've been working more and more with TripleO recently and whilst it does
> > > > seem to solve a number of problems well, I have found a couple of
> > > > idiosyncrasies that I feel would be easy to address.
> > > >
> > > > My primary concern lies in the fact that os-refresh-config does not run on
> > > > every boot/reboot of a system. Surely a reboot *is* a configuration
> > > > change and therefore we should ensure that the box has come up in the
> > > > expected state with the correct config?
> > > >
> > > > This is easily fixed through the addition of an "@reboot" entry in
> > > > /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run
> > > > as a service.
> > > >
> > > > My secondary concern is that through not running os-refresh-config on a
> > > > regular basis by default (i.e. every 15 minutes or something in the same
> > > > style as chef/cfengine/puppet), we leave ourselves exposed to someone
> > > > trying to make a "quick fix" to a production node and taking that node
> > > > offline the next time it reboots because the config was still left as
> > > > broken owing to a lack of updates to HEAT (I'm thinking a "quick change"
> > > > to allow root access via SSH during a major incident that is then left
> > > > unchanged for months because no-one updated HEAT).
> > > >
> > > > There are a number of options to fix this, including modifying
> > > > os-collect-config to auto-run os-refresh-config on a regular basis or
> > > > setting os-refresh-config to be its own service running via upstart or
> > > > similar that triggers every 15 minutes.
> > > >
> > > > I'm sure there are other solutions to these problems; however, I know from
> > > > experience that claiming this is solved through "education of users" or
> > > > (more severely!) via HR is not a sensible approach to take, as by the time
> > > > you realise that your configuration has been changed for the last 24
> > > > hours it's often too late!
> > 
> > > So I see two problems highlighted above.
> > >
> > > 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right,
> > > and we've been talking about it for a while. The right thing to do is
> > > have os-collect-config re-run its command on boot. I don't think a cron
> > > job is the right way to go; we should just have a file in /var/run that
> > > is placed there only on a successful run of the command. If that file
> > > does not exist, then we run the command.
> > >
> > > I've just opened this bug in response:
> > >
> > > https://bugs.launchpad.net/os-collect-config/+bug/1334804
> >
> > I have been looking into bug #1334804 and I have a review up to
> > resolve it. I want to highlight something.
> >
> > Currently on a reboot we start all services via upstart (on Debian
> > anyways) and there have been quite a lot of issues around this:
> > missing upstart scripts and timing issues. I don't know the issues on
> > Fedora.
> >
> > So with a fix to #1334804, on a reboot upstart will start all the
> > services first (with potentially out-of-date configuration), then
> > o-c-c will start o-r-c, which will now configure all services and restart
> > them, or start them if upstart isn't configured properly.
> >
> > I would like to turn off all boot scripts for services we configure
> > and leave all this to o-r-c. I think this will simplify things and put
> > us in control of starting services. I believe that it will also narrow
> > the gap between Fedora and Debian or Debian and Debian, so what works
> > on one should work on the other and make it easier for developers.
> 
> I'm not sold on this approach. At the very least I think we want to make
> this optional because not all deployments may want to have o-r-c be the
> central service starting agent. So I'm opposed to this being our (only!)
> default...
> 

I felt this way too. However, I'm open to it because I am worried that
it is a bit idealistic without much justification for being so.

We know o-r-c will be there, and really must be there. We're already
saying it needs to run to assert ephemeral state, and one piece of
ephemeral state is "things started".
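For illustration, the re-run-on-boot check proposed earlier in the thread (bug 1334804) might look roughly like the sketch below. This is not os-collect-config's actual code: the function name and marker path are hypothetical, and it assumes /var/run is emptied on every boot (true where it points at a tmpfs /run, as on most current Linux distributions), so a missing marker means the command has not yet succeeded since this boot.

```python
import os
import subprocess
from typing import Sequence

# Hypothetical marker path; the real os-collect-config may use a different one.
DEFAULT_MARKER = "/var/run/os-collect-config/command-succeeded"


def run_command_if_needed(command: Sequence[str], marker: str = DEFAULT_MARKER) -> bool:
    """Run `command` unless the marker shows it already succeeded since boot.

    Returns True if the command was run and succeeded, False if it was skipped.
    Raises subprocess.CalledProcessError if the command fails; in that case the
    marker is not written, so the next invocation will try again.
    """
    if os.path.exists(marker):
        # Already succeeded since the last boot: nothing to re-assert.
        return False

    # check=True raises on a non-zero exit status, so the marker below is
    # only ever written after a successful run.
    subprocess.run(list(command), check=True)

    os.makedirs(os.path.dirname(marker), exist_ok=True)
    with open(marker, "w") as f:
        f.write("ok\n")
    return True
```

Something like `run_command_if_needed(["os-refresh-config"])` would then be called each time the collector starts. Writing the marker only after a zero exit status means a failed run is retried rather than silently skipped.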

Now, we can, and maybe even should, take a hard line in the long term
that o-r-c does not do this, and that it instead stores everything in
system-level configs that are started during the normal system boot. I
_want_ this to be the case. But thus far, we've failed to assert that and
things have
occasionally been very broken on reboot. Short of forcing a reboot in
every CI run, we're going to have trouble detecting this.

So, I think we have two options:

1) O-r-c doing the asserting, with which we can more or less predict
that subsequent boots will work in the same manner as the first boot.

2) Reboot in CI.

I would vote for 2, as it probably won't add much time and will test
system start up.
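Option 2 could be as simple as rebooting the deployed node in CI and then checking that every expected service came back without o-r-c's help. A minimal sketch, assuming a systemd host where `systemctl is-active --quiet UNIT` exits 0 for a running unit (the 2014-era Ubuntu images used upstart, which would need a different checker); the reboot itself is left to the CI harness, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List


def systemd_is_active(unit: str) -> bool:
    """Return True if systemd reports the unit as active (exit status 0)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def services_not_running(
    units: Iterable[str],
    is_active: Callable[[str], bool] = systemd_is_active,
) -> List[str]:
    """Return the units that are not running after a cold boot.

    `is_active` is injectable so the same check can target upstart, a remote
    host over SSH, or a fake in unit tests.
    """
    return [unit for unit in units if not is_active(unit)]
```

After the reboot, the harness would wait for the node to come back, call `services_not_running([...])` with the services the node is expected to start on its own, and fail the job if the returned list is non-empty.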

> The job of o-r-c in this regard is to assert state... which to me means
> making sure that a service is configured correctly (config files, set to
> start on boot, and initially started). Requiring o-r-c to be the service
> starting agent (always) is beyond the scope of the o-r-c tool.
> 
> If people want to use it in that mode I think having an *option* to do
> this is fine. I don't think it should be required though. Furthermore I
> don't think we should get into the habit of writing our elements in such
> a manner that things no longer start on boot without o-r-c in the mix.
> 

I don't think we need an option. Options are for real incompatible
differences of operation, like "I want to run in ultra-secure mode and
that breaks stuff that I don't care about so I turn those things off"
or "I want to use packages because my business and support model is
built around it."  Those are real, legitimate differences which we _do_
need options for.

We need to clearly state a design principle, and we need to ensure that
our CI tests the mechanism by which we do these things.

> I do think we can solve these problems. But taking a hardwired
> prescriptive approach is not good here...
> 

It's just one option, and not the best one. I am quite confident that you
all will figure out how to test reboots and do that. And then we'll all
feel better about trusting the system to start services on a cold boot.


