<div dir="ltr">Ok, thanks for the in-depth explanation.<div><br></div><div>My takeaway is that <span style="line-height:1.5">we need to file any rootwrap updates as exceptions for now (so releasenotes and grenade scripts).</span></div><div><br></div><div> - Gus</div><div><br><div class="gmail_quote"><div dir="ltr">On Mon, 27 Jun 2016 at 21:25 Sean Dague <<a href="mailto:sean@dague.net">sean@dague.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 06/26/2016 10:02 PM, Angus Lees wrote:<br>

> On Fri, 24 Jun 2016 at 20:48 Sean Dague <<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>> wrote:<br>

><br>

>     On 06/24/2016 05:12 AM, Thierry Carrez wrote:<br>

>     > I'm adding Possibility (0): change Grenade so that rootwrap<br>

>     filters from<br>

>     > N+1 are put in place before you upgrade.<br>

><br>

>     If you do that as a general course, what you are saying is that every<br>

>     installer and install process includes overwriting all of rootwrap<br>

>     before every upgrade. Keep in mind we do upstream upgrade as offline,<br>

>     which means that we've fully shut down the cloud. This would remove the<br>

>     testing requirement that rootwrap configs were even compatible between N<br>

>     and N+1. And if you think this is theoretical, you should see the patches<br>

>     I've gotten over the years to grenade because people didn't see an issue<br>

>     with that at all. :)<br>

><br>

>     I do get that people don't like the constraints we've self-imposed, but<br>

>     we've done that for very good reasons. The #1 complaint from operators,<br>

>     forever, has been the pain and danger of upgrading. That's why we are<br>

>     still trademarking new Juno clouds. When you upgrade Apache, you don't<br>

>     have to change your config files.<br>

><br>

><br>

> In case it got lost, I'm 100% on board with making upgrades safe and<br>

> straightforward, and I understand that grenade is merely a tool to help<br>

> us test ourselves against our process and not an enemy to be worked<br>

> around.  I'm an ops guy proud and true and hate you all for making<br>

> openstack hard to upgrade in the first place :P<br>

><br>

> Rootwrap configs need to be updated in line with new rootwrap-using code<br>

> - that's just the way the rootwrap security mechanism works, since the<br>

> security "trust" flows from the root-installed rootwrap config files.<br>

><br>

> I would like to clarify what our self-imposed upgrade rules are so that<br>

> I can design code within those constraints, and no-one is answering my<br>

> question so I'm just getting more confused as this thread progresses...<br>

><br>

> ***<br>

> What are we trying to impose on ourselves for upgrades for the present<br>

> and near future (ie: while rootwrap is still a thing)?<br>

> ***<br>

><br>

> A. Sean says above that we do "offline" upgrades, by which I _think_ he<br>

> means a host-by-host (or even global?) "turn everything (on the same<br>

> host/container) off, upgrade all files on disk for that host/container,<br>

> turn it all back on again".  If this is the model, then we can trivially<br>

> update rootwrap files during the "upgrade" step, and I don't see any<br>

> reason why we need to discuss anything further - except how we implement<br>

> this in grenade.<br>

><br>

> B. We need to support a mix of old + new code running on the same<br>

> host/container, running against the same config files (presumably<br>

> because we're updating service-by-service, or want to minimise the<br>

> service-unavailability during upgrades to literally just a process<br>

> restart).  So we need to think about how and when we stage config vs<br>

> code updates, and make sure that any overlap is appropriately allowed<br>

> for (expand-contract, etc).<br>

><br>

> C. We would like to just never upgrade rootwrap (or other config) files<br>

> ever again (implying a freeze in as_root command lines, effective ~a<br>

> year ago).  Any config update is an exception dealt with through<br>

> case-by-case process and release notes.<br>

><br>

><br>

> I feel like the grenade check currently implements (B) with a 6 month<br>

> lead time on config changes, but the "theory of upgrade" doc and our<br>

> verbal policy might actually be (C) (see this thread, eg), and Sean<br>

> above introduced the phrase "offline" which threw me completely into<br>

> thinking maybe we're aiming for (A).  You can see why I'm looking for<br>

> clarification  ;)<br>

<br>

Ok, there is theory of what we are striving for, and there is what is<br>

viable to test consistently.<br>

<br>

The thing we are shooting for is making the code Continuously<br>

Deployable. Which means the upgrade process should be "pip install -U<br>

$foo && $foo-manage db-sync" on the API surfaces and "pip install -U<br>

$foo; service restart" on everything else.<br>
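<br>
As a rough sketch of those two flows (illustrative only: the helper name, the "is_api" flag and the exact restart command are assumptions, not part of any OpenStack tooling), the steps could be written down as:<br>
<pre>
```python
def upgrade_commands(service, is_api):
    """Return the commands for one service, in order, without running them."""
    install = ["pip", "install", "-U", service]
    if is_api:
        # API surfaces: upgrade the code, then migrate the database schema.
        # "db-sync" is spelled as in the text above; real tools may differ.
        return [install, [service + "-manage", "db-sync"]]
    # Everything else: upgrade the code, then restart the service.
    # The restart invocation is an assumption; it varies by init system.
    return [install, ["service", service, "restart"]]


for cmd in upgrade_commands("foo", is_api=True):
    print(" ".join(cmd))
```
</pre>
Note that neither list contains a step that edits a config file.<br>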

<br>

Logic we can put into the python install process is common logic shared<br>

by all deployment tools, and we can encode it in there. So all<br>

installers just get it.<br>

<br>

The challenge is there is no facility for config file management in<br>

python native packaging. Which means that software which *depends* on<br>

config files for new or even working features now moves from the camp of<br>

CDable to manual upgrade needed. What you need to do is in releasenotes,<br>

not in code that's shipped with your software. Release notes are not<br>

scriptable.<br>

<br>

So, we've said, doing that has to be the exception and not the rule.<br>

It's also the same reasoning behind our deprecation phase for all config<br>

options. Things move from working (in N), to working with warnings (in<br>

N+1), to not working (in N+2). Which allows people to CD across this<br>

boundary, and do config file fixing in their Config Management tools<br>

*post* upgrade.<br>
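<br>
To make that N / N+1 / N+2 progression concrete, here is a minimal sketch. The function, its parameters and the option names are made up for illustration, releases are modelled as plain integers, and this is not how any OpenStack library actually implements deprecation:<br>
<pre>
```python
import warnings


def lookup(config, old_name, new_name, warn_release, current_release):
    """Read an option that is being renamed from old_name to new_name.

    Before warn_release the old name works silently (N), in warn_release
    it works with a warning (N+1), and afterwards it is rejected (N+2).
    """
    if new_name in config:
        return config[new_name]
    if old_name not in config:
        raise KeyError(new_name)
    if current_release > warn_release:
        raise KeyError(f"{old_name} was removed; set {new_name} instead")
    if current_release == warn_release:
        warnings.warn(f"{old_name} is deprecated; set {new_name} instead",
                      DeprecationWarning)
    return config[old_name]


# An untouched config file from release 1 still works in release 2,
# with a warning, so it can be fixed after the upgrade.
value = lookup({"old_timeout": 30}, "old_timeout", "timeout", 2, 2)
```
</pre>
An unmodified config keeps working for one release after the change, which is what leaves room to fix it post upgrade.<br>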

<br>

Our testing, like all testing, is a trade-off: it covers what we can do<br>

consistently while feeling confident of the results. That's grenade. We need<br>

to operate on an all-in-one node, because that's what we have. We're<br>

using system level installs, because > 50% of our user base does. This<br>

does mean all of everything is getting upgraded all at once in the<br>

normal pip install -U flow, because the moment you start replacing<br>

system level libraries, bets are kind of off for services that are still<br>

running.<br>

<br>

But, if we exploit every weakness of the testing to figure out exactly<br>

the minimum we need to make the testing pass, we stop trying to do the<br>

thing we set out to do: painless upgrades.<br>

<br>

The theory that rootwrap rules have to be inspected manually and<br>

adjusted by every deployer during upgrade seems... odd. It's like if you<br>

tried to upgrade firefox, and it wouldn't start until you adjusted your<br>

profile manually.<br>

<br>

So we are not aiming for A, we're actually aiming much higher. But<br>

testing, consistently, that much higher bar is a thing we can't easily<br>

do. So the structure of the testing for our offline upgrades, with the<br>

policy rules about what we should not change, is our check and balance<br>

for getting to properly seamless, fully online upgrades.<br>

<br>

        -Sean<br>

<br>

--<br>

Sean Dague<br>

<a href="http://dague.net" rel="noreferrer" target="_blank">http://dague.net</a><br>

<br>



<br>

</blockquote></div></div></div>