[openstack-dev] [grenade] upgrades vs rootwrap
Matt Riedemann
mriedem at linux.vnet.ibm.com
Wed Jul 6 15:34:49 UTC 2016
On 6/27/2016 6:24 AM, Sean Dague wrote:
> On 06/26/2016 10:02 PM, Angus Lees wrote:
>> On Fri, 24 Jun 2016 at 20:48 Sean Dague <sean at dague.net
>> <mailto:sean at dague.net>> wrote:
>>
>> On 06/24/2016 05:12 AM, Thierry Carrez wrote:
>> > I'm adding Possibility (0): change Grenade so that rootwrap
>> filters from
>> > N+1 are put in place before you upgrade.
>>
>> If you do that as a general course, what you are saying is that every
>> installer and install process includes overwriting all of rootwrap
>> before every upgrade. Keep in mind we do the upstream upgrade offline,
>> which means we've fully shut down the cloud. This would remove the
>> testing requirement that rootwrap configs even be compatible between N
>> and N+1. And if you think this is theoretical, you should see the patches
>> I've gotten over the years to grenade because people didn't see an issue
>> with that at all. :)
>>
>> I do get that people don't like the constraints we've self imposed, but
>> we've done that for very good reasons. The #1 complaint from operators,
>> for ever, has been the pain and danger of upgrading. That's why we are
>> still trademarking new Juno clouds. When you upgrade Apache, you don't
>> have to change your config files.
>>
>>
>> In case it got lost, I'm 100% on board with making upgrades safe and
>> straightforward, and I understand that grenade is merely a tool to help
>> us test ourselves against our process and not an enemy to be worked
>> around. I'm an ops guy proud and true and hate you all for making
>> openstack hard to upgrade in the first place :P
>>
>> Rootwrap configs need to be updated in line with new rootwrap-using code
>> - that's just the way the rootwrap security mechanism works, since the
>> security "trust" flows from the root-installed rootwrap config files.
>>
>> I would like to clarify what our self-imposed upgrade rules are so that
>> I can design code within those constraints, and no-one is answering my
>> question so I'm just getting more confused as this thread progresses...
>>
>> ***
>> What are we trying to impose on ourselves for upgrades for the present
>> and near future (ie: while rootwrap is still a thing)?
>> ***
>>
>> A. Sean says above that we do "offline" upgrades, by which I _think_ he
>> means a host-by-host (or even global?) "turn everything (on the same
>> host/container) off, upgrade all files on disk for that host/container,
>> turn it all back on again". If this is the model, then we can trivially
>> update rootwrap files during the "upgrade" step, and I don't see any
>> reason why we need to discuss anything further - except how we implement
>> this in grenade.
>>
>> B. We need to support a mix of old + new code running on the same
>> host/container, running against the same config files (presumably
>> because we're updating service-by-service, or want to minimise the
>> service-unavailability during upgrades to literally just a process
>> restart). So we need to think about how and when we stage config vs
>> code updates, and make sure that any overlap is appropriately allowed
>> for (expand-contract, etc).
>>
>> C. We would like to just never upgrade rootwrap (or other config) files
>> ever again (implying a freeze in as_root command lines, effective ~a
>> year ago). Any config update is an exception dealt with through
>> case-by-case process and release notes.
>>
>>
>> I feel like the grenade check currently implements (B) with a 6 month
>> lead time on config changes, but the "theory of upgrade" doc and our
>> verbal policy might actually be (C) (see this thread, eg), and Sean
>> above introduced the phrase "offline" which threw me completely into
>> thinking maybe we're aiming for (A). You can see why I'm looking for
>> clarification ;)
>
> Ok, there is theory of what we are striving for, and there is what is
> viable to test consistently.
>
> The thing we are shooting for is making the code Continuously
> Deployable. Which means the upgrade process should be "pip install -U
> $foo && $foo-manage db-sync" on the API surfaces and "pip install -U
> $foo; service restart" on everything else.
>
> Logic we can put into the python install process is common logic shared
> by all deployment tools, and we can encode it in there. So all
> installers just get it.
>
> The challenge is there is no facility for config file management in
> python native packaging. Which means that software which *depends* on
> config files for new or even working features now moves from the camp of
> CDable to manual-upgrade-needed. What you need to do is in release notes,
> not in code that's shipped with your software. Release notes are not
> scriptable.
>
> So, we've said, doing that has to be the exception and not the rule.
> It's also the same reasoning behind our deprecation phase for all config
> options. Things move from working (in N), to working with warnings (in
> N+1), to not working (in N+2). Which allows people to CD across this
> boundary, and do config file fixing in their Config Management tools
> *post* upgrade.
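Sean's N / N+1 / N+2 window can be sketched in a few lines of plain Python
(a toy stand-in for what oslo.config's deprecated-option aliases do; the
option names here are invented for illustration):

```python
import warnings

# Toy model of a renamed config option crossing the deprecation window:
# it works under the old name in N, warns in N+1, and is gone in N+2.
RENAMED = {"old_option": "new_option"}  # deprecated name -> replacement

def resolve(conf, name):
    """Look up `name`, honoring a deprecated alias with a warning."""
    if name in conf:
        return conf[name]
    # Fall back to the deprecated spelling if the operator still uses it.
    for old, new in RENAMED.items():
        if new == name and old in conf:
            warnings.warn(f"{old} is deprecated; use {name}",
                          DeprecationWarning)
            return conf[old]
    raise KeyError(name)

# An un-upgraded config file keeps working across the boundary:
print(resolve({"old_option": "x"}, "new_option"))  # -> "x", with a warning
```

That fallback window is what lets an operator CD across the release
boundary and fix the config file afterwards, as Sean describes.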
rootwrap filters aren't config options, but I get the feeling we're
shoe-horning grenade to treat them as such.
I get why grenade tests the way it does, so that we give a window for
configuration option deprecation. That's great and useful.
What I'm struggling with, and I assume others on this thread are too, is
the difference with rootwrap filters, which are going to be required to
be in place for the code that relies on them to work.
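For reference, a rootwrap filter file is a root-owned INI file that
whitelists the exact commands a service may run as root, so a new
privileged command needs a matching filter entry before the new code can
call it. A minimal sketch (the path and entries are illustrative, not
nova's actual filter set):

```ini
# /etc/nova/rootwrap.d/compute.filters (illustrative path)
[Filters]
# name: FilterClass, executable, run-as user
kpartx: CommandFilter, kpartx, root
mount: CommandFilter, mount, root
```

Unlike a config option, there is no in-code default to fall back to: if
the entry is missing, the command is simply refused.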
That's not the same for config options: my nova.conf from mitaka doesn't
need new options from newton for my newton code to work, because if an
option isn't set explicitly in nova.conf, my newton code gets the default
from oslo.config, since the default lives in the code.
That doesn't work for rootwrap filters. So it really seems that putting
the newton rootwrap filters in place before running the newton code
makes the most sense, at least to me.
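The fallback behavior Matt describes can be shown with nothing but the
stdlib: an option missing from an old config file silently picks up the
default that ships in the code. (A toy sketch: the option names and
default are invented, and real nova uses oslo.config rather than bare
configparser.)

```python
import configparser

# A mitaka-era config file that predates a hypothetical newton-only option.
OLD_CONF = """
[DEFAULT]
compute_driver = libvirt.LibvirtDriver
"""

# Defaults registered in the code, the way oslo.config registers them.
CODE_DEFAULTS = {"new_newton_option": "code-default"}

def load_option(conf_text, name):
    """Return the file's value for `name`, or the in-code default."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser["DEFAULT"].get(name, CODE_DEFAULTS.get(name))

print(load_option(OLD_CONF, "new_newton_option"))  # -> "code-default"
print(load_option(OLD_CONF, "compute_driver"))     # -> "libvirt.LibvirtDriver"
```

This is why an un-upgraded nova.conf keeps working while a missing
rootwrap filter entry just fails.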
The problem I could see us running into is if, in newton, we dropped some
no-longer-used code that was also the last/only thing using a given
rootwrap filter, and we dropped that filter too. But maybe something that
wasn't upgraded (so mitaka code) on that same host is still relying on
that rootwrap filter. Maybe that's not possible, though, since the only
down-level thing in nova that we support is computes, and those would be
separate nodes. If it was single-node, deploying the controller code
would also update the rootwrap filters, I'd think (if we went that route).
Am I missing something else here?
>
> Our testing, like all testing, is a trade off for what we could do
> consistently, and feel confident of the results. That's grenade. We need
> to operate on an all in one node, because that's what we have. We're
> using system level installs, because > 50% of our user base does. This
> does mean all of everything is getting upgraded all at once in the
> normal pip install -U flow, because the moment you start replacing
> system level libraries, bets are kind of off for services that are still
> running.
>
> But, if we exploit every weakness of the testing to figure out exactly
> the minimum we need to make the testing pass, we stop trying to do the
> thing we set out to do: painless upgrades.
>
> The theory that rootwrap rules have to be inspected manually and
> adjusted by every deployer during upgrade seems... odd. It's like if you
> tried to upgrade firefox, and it wouldn't start until you adjusted your
> profile manually.
>
> So we are not aiming for A, we're actually aiming much higher. But
> testing, consistently, that much higher bar is a thing we can't easily
> do. So the structure of the testing for our offline upgrades, with the
> policy rules about what we should not change, is our check and balance
> for getting to properly seamless fully online upgrades.
>
> -Sean
>
--
Thanks,
Matt Riedemann