[openstack-dev] [TripleO] os-refresh-config run frequency

Clint Byrum clint at fewbar.com
Thu Jul 17 16:45:30 UTC 2014


Excerpts from Michael Kerrin's message of 2014-07-17 07:54:26 -0700:
> On Thursday 26 June 2014 12:20:30 Clint Byrum wrote:
> > Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700:
> > > Hi all,
> > > 
> > > I've been working more and more with TripleO recently and whilst it does
> > > seem to solve a number of problems well, I have found a couple of
> > > idiosyncrasies that I feel would be easy to address.
> > > 
> > > My primary concern lies in the fact that os-refresh-config does not run on
> > > every boot/reboot of a system.  Surely a reboot *is* a configuration
> > > change and therefore we should ensure that the box has come up in the
> > > expected state with the correct config?
> > > 
> > > This is easily fixed through the addition of an "@reboot" entry in
> > > /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run
> > > as a service.
> > > 
> > > My secondary concern is that through not running os-refresh-config on a
> > > regular basis by default (i.e. every 15 minutes or something in the same
> > > style as chef/cfengine/puppet), we leave ourselves exposed to someone
> > > trying to make a "quick fix" to a production node and taking that node
> > > offline the next time it reboots because the config was still left as
> > > broken owing to a lack of updates to HEAT (I'm thinking a "quick change"
> > > to allow root access via SSH during a major incident that is then left
> > > unchanged for months because no-one updated HEAT).
> > > 
> > > There are a number of options to fix this, including modifying
> > > os-collect-config to auto-run os-refresh-config on a regular basis or
> > > setting os-refresh-config to be its own service running via upstart or
> > > similar that triggers every 15 minutes.
> > > 
> > > I'm sure there are other solutions to these problems, however I know from
> > > experience that claiming this is solved through "education of users" or
> > > (more severely!) via HR is not a sensible approach to take as by the time
> > > you realise that your configuration has been changed for the last 24
> > > hours it's often too late!
> >
> > So I see two problems highlighted above.
> > 
> > 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right,
> > and we've been talking about it for a while. The right thing to do is
> > have os-collect-config re-run its command on boot. I don't think a cron
> > job is the right way to go; we should just have a file in /var/run that
> > is placed there only on a successful run of the command. If that file
> > does not exist, then we run the command.
> > 
> > I've just opened this bug in response:
> > 
> > https://bugs.launchpad.net/os-collect-config/+bug/1334804
> > 
> 
> I have been looking into bug #1334804 and I have a review up to resolve it. I 
> want to highlight something.
> 
> Currently on a reboot we start all services via upstart (on Debian, anyway)
> and there have been quite a lot of issues around this: missing upstart
> scripts and timing issues. I don't know the issues on Fedora.
> 
> So with a fix to #1334804, on a reboot upstart will start all the services
> first (with potentially out-of-date configuration), then o-c-c will start
> o-r-c, which will now configure all services and restart them, or start them
> if upstart isn't configured properly.
> 
> I would like to turn off all boot scripts for services we configure and leave
> all this to o-r-c. I think this will simplify things and put us in control of
> starting services. I believe that it will also narrow the gap between Fedora
> and Debian, or even between one Debian system and another, so that what works
> on one should work on the other, which makes things easier for developers.

Agreed, and that is actually really simple. I hate to steal your thunder,
but this is the patch:

https://review.openstack.org/107772
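
For readers who have not seen the mechanism: on upstart, one common way to
stop a job from starting at boot while still allowing manual start/stop is an
override file containing the "manual" stanza. The sketch below illustrates
that idea only and is not a description of what the patch above does; the
function name and its arguments are hypothetical.

```python
import os


def disable_upstart_autostart(job_names, init_dir="/etc/init"):
    """Write '<job>.override' with the 'manual' stanza for each upstart job.

    Assumption: the jobs are native upstart jobs defined in init_dir.
    Returns the list of override file paths that were written.
    """
    written = []
    for job in job_names:
        path = os.path.join(init_dir, job + ".override")
        # 'manual' tells upstart to ignore the job's 'start on' condition,
        # so it no longer starts at boot but can still be started by hand.
        with open(path, "w") as override:
            override.write("manual\n")
        written.append(path)
    return written
```

For example, disable_upstart_autostart(["nova-api"]) would write
/etc/init/nova-api.override; a manual "service nova-api start" should still
work afterwards, which matches the wish to keep that ability.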

> 
> The ability to run service nova-api stop|start|restart is very handy, but
> this will be a manual thing and I intend to leave that there.
> 
> What do people think, and how best do I push this forward? I feel that this
> leads into the re-assert-system-state spec, but mainly I think this is a
> bug and doesn't require a spec.
> 
> I will be at the upcoming TripleO mid-cycle meetup and am willing to discuss
> this with anyone interested, and to put together the necessary bits to make
> this happen.

As I said, it is simple. :) I suggest testing the patch above and adding
anything I missed to it.

Systemd-based systems will likely need something different. I'm still
burying my head in the sand and not learning systemd, but perhaps a
follow-up patch from somebody who understands it can make those systems
do the same thing.
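
For reference, the /var/run marker-file idea quoted earlier in this thread
(bug 1334804) could look roughly like the sketch below. It assumes /var/run is
cleared on reboot, so a missing marker means the command has not yet succeeded
since boot. The function name and the marker path in the usage note are
hypothetical, and this is not the code from the actual review.

```python
import os
import subprocess


def run_if_not_stamped(command, stamp_path):
    """Run command unless stamp_path exists; stamp only after success.

    Returns True if the command was run, False if it was skipped.
    Raises subprocess.CalledProcessError if the command fails, in which
    case no stamp is written and the next call will try again.
    """
    if os.path.exists(stamp_path):
        return False
    subprocess.check_call(command)
    # Reached only when the command exited with status 0.
    with open(stamp_path, "w"):
        pass
    return True
```

A caller might use run_if_not_stamped(["os-refresh-config"],
"/var/run/os-collect-config.stamp") at startup. Because the stamp is written
only after a clean exit, a failed run is retried on the next attempt rather
than being silently treated as done.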


