[openstack-dev] [grenade] upgrades vs rootwrap

Sean Dague sean at dague.net
Mon Jun 27 11:24:29 UTC 2016


On 06/26/2016 10:02 PM, Angus Lees wrote:
> On Fri, 24 Jun 2016 at 20:48 Sean Dague <sean at dague.net> wrote:
> 
>     On 06/24/2016 05:12 AM, Thierry Carrez wrote:
>     > I'm adding Possibility (0): change Grenade so that rootwrap filters
>     > from N+1 are put in place before you upgrade.
> 
>     If you do that as general course what you are saying is that every
>     installer and install process includes overwriting all of rootwrap
>     before every upgrade. Keep in mind we do upstream upgrade as offline,
>     which means that we've fully shut down the cloud. This would remove the
>     testing requirement that rootwrap configs were even compatible between N
>     and N+1. And if you think this is theoretical, you should see the patches
>     I've gotten over the years to grenade because people didn't see an issue
>     with that at all. :)
> 
>     I do get that people don't like the constraints we've self imposed, but
>     we've done that for very good reasons. The #1 complaint from operators,
>     forever, has been the pain and danger of upgrading. That's why we are
>     still trademarking new Juno clouds. When you upgrade Apache, you don't
>     have to change your config files.
> 
> 
> In case it got lost, I'm 100% on board with making upgrades safe and
> straightforward, and I understand that grenade is merely a tool to help
> us test ourselves against our process and not an enemy to be worked
> around.  I'm an ops guy proud and true and hate you all for making
> openstack hard to upgrade in the first place :P
> 
> Rootwrap configs need to be updated in line with new rootwrap-using code
> - that's just the way the rootwrap security mechanism works, since the
> security "trust" flows from the root-installed rootwrap config files.
> 
> I would like to clarify what our self-imposed upgrade rules are so that
> I can design code within those constraints, and no-one is answering my
> question so I'm just getting more confused as this thread progresses...
> 
> ***
> What are we trying to impose on ourselves for upgrades for the present
> and near future (ie: while rootwrap is still a thing)?
> ***
> 
> A. Sean says above that we do "offline" upgrades, by which I _think_ he
> means a host-by-host (or even global?) "turn everything (on the same
> host/container) off, upgrade all files on disk for that host/container,
> turn it all back on again".  If this is the model, then we can trivially
> update rootwrap files during the "upgrade" step, and I don't see any
> reason why we need to discuss anything further - except how we implement
> this in grenade.
> 
> B. We need to support a mix of old + new code running on the same
> host/container, running against the same config files (presumably
> because we're updating service-by-service, or want to minimise the
> service-unavailability during upgrades to literally just a process
> restart).  So we need to think about how and when we stage config vs
> code updates, and make sure that any overlap is appropriately allowed
> for (expand-contract, etc).
> 
> C. We would like to just never upgrade rootwrap (or other config) files
> ever again (implying a freeze in as_root command lines, effective ~a
> year ago).  Any config update is an exception dealt with through
> case-by-case process and release notes.
> 
> 
> I feel like the grenade check currently implements (B) with a 6 month
> lead time on config changes, but the "theory of upgrade" doc and our
> verbal policy might actually be (C) (see this thread, eg), and Sean
> above introduced the phrase "offline" which threw me completely into
> thinking maybe we're aiming for (A).  You can see why I'm looking for
> clarification  ;)

Ok, there is the theory of what we are striving for, and there is what is
viable to test consistently.

The thing we are shooting for is making the code Continuously
Deployable, which means the upgrade process should be "pip install -U
$foo && $foo-manage db-sync" on the API surfaces and "pip install -U
$foo; service restart" on everything else.
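
As a rough sketch of that flow, not anything grenade or an installer
actually ships: the function name and the "api" role label below are
hypothetical, and the command strings are copied from the shorthand
above rather than from any real service's CLI.

```python
def upgrade_commands(service, role):
    """Return the shell commands to upgrade one service.

    `role` is a hypothetical label: "api" for the API surfaces,
    anything else for the remaining services.
    """
    install = f"pip install -U {service}"
    if role == "api":
        # API surfaces install the new code, then sync the database schema.
        return [f"{install} && {service}-manage db-sync"]
    # Everything else installs the new code and restarts.
    return [f"{install}; service restart"]


if __name__ == "__main__":
    for cmd in upgrade_commands("nova", "api"):
        print(cmd)
```

The point is that nothing in either branch touches a config file: the
whole upgrade is expressible as install plus sync or restart.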

Logic we can put into the python install process is common logic shared
by all deployment tools, and we can encode it in there. So all
installers just get it.

The challenge is that there is no facility for config file management in
Python native packaging. That means software which *depends* on
config files for new or even working features now moves from the camp of
CDable to manual upgrade needed. What you need to do is in the release
notes, not in code that's shipped with your software, and release notes
are not scriptable.

So, we've said, doing that has to be the exception and not the rule.
It's also the same reasoning behind our deprecation phase for all config
options. Things move from working (in N), to working with warnings (in
N+1), to not working (in N+2). That allows people to CD across this
boundary, and do config file fixing in their Config Management tools
*post* upgrade.
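
A minimal sketch of that deprecation window, assuming releases are
modelled as plain integers (N, N+1, N+2). The function names are
hypothetical and this is not how any OpenStack config library implements
it; it only encodes the three phases described above.

```python
def option_state(last_good_release, running_release):
    """Phase of a config option whose last warning-free release is N.

    Releases are modelled as integers, so N+1 is simply N + 1.
    """
    if running_release <= last_good_release:
        return "working"
    if running_release == last_good_release + 1:
        return "working with warnings"
    return "not working"


def can_fix_config_after_upgrade(last_good_release, target_release):
    """True if an untouched config still works once the upgrade is done,
    so the config fix can happen *post* upgrade."""
    return option_state(last_good_release, target_release) != "not working"


if __name__ == "__main__":
    n = 10  # arbitrary release number standing in for N
    for release in (n, n + 1, n + 2):
        print(release, option_state(n, release))
```

With one release of overlap, an N to N+1 upgrade never needs the config
touched first; only skipping straight to N+2 would.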

Our testing, like all testing, is a trade-off: it covers what we can do
consistently while feeling confident of the results. That's grenade. We need
to operate on an all-in-one node, because that's what we have. We're
using system level installs, because > 50% of our user base does. This
does mean all of everything is getting upgraded all at once in the
normal pip install -U flow, because the moment you start replacing
system level libraries, bets are kind of off for services that are still
running.

But, if we exploit every weakness of the testing to figure out exactly
the minimum we need to make the testing pass, we stop trying to do the
thing we set out to do: painless upgrades.

The theory that rootwrap rules have to be inspected manually and
adjusted by every deployer during upgrade seems... odd. It's as if you
tried to upgrade Firefox, and it wouldn't start until you adjusted your
profile manually.

So we are not aiming for A, we're actually aiming much higher. But
testing, consistently, that much higher bar is a thing we can't easily
do. So the structure of the testing for our offline upgrades, with the
policy rules about what we should not change, is our check and balance
for getting to properly seamless fully online upgrades.

	-Sean

-- 
Sean Dague
http://dague.net


