<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 26, 2017 at 4:55 AM, Carter, Kevin <span dir="ltr">&lt;<a href="mailto:kevin@cloudnull.com" target="_blank">kevin@cloudnull.com</a>&gt;</span> wrote: <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"><div>Hello Stackers,</div><div><br></div></div></div></blockquote><div> </div><div>Hi Kevin, all, <br></div><div><div><br></div><div>apologies for the very late response here - fwiw I was working at a remote location all of last week and am still catching up. I was not at the PTG or part of the original conversation, but this thread and the etherpad have been very helpful, so thank you very much for sharing. Mostly replying to say 'this is something TripleO/upgrades are interested in too' - obviously not for the P cycle - and to share some thoughts on how TripleO is doing upgrades today. </div><div><br></div><div>Big +1 to David Simard's point that 'Making N to N+1 upgrades seamless and work well is already challenging today' - ++ to that from our experience.</div></div><div>Besides anything else, going between versions we've also had to change the workflow itself (docs @ [0] include a link to the composable services spec that explains why the workflow had to change for Newton to Ocata upgrades). The point is we are very much still working towards a seamless upgrades experience - we *are* improving on each release, most notably N..O - considering more pre-upgrade validations, for example, and trying to minimize service downtime. 
Having said that, some more comments inline on the goal of skipping upgrades: </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"><div>As I'm sure many of you know there was a talk about doing "skip-level"[0] upgrades at the OpenStack Summit which quite a few folks were interested in. Today many of the interested parties got together and talked about doing more of this in a formalized capacity. Essentially we're looking for cloud upgrades with the possibility of skipping releases, ideally enabling an N+3 upgrade. In our opinion it would go a very long way to solving cloud consumer and deployer problems if folks didn't have to deal with an upgrade every six months. While we talked about various issues and some of the current approaches being kicked around we wanted to field our general chat to the rest of the community and request input from folks that may have already fought such a beast. If you've taken on an adventure like this how did you approach it? Did it work? Any known issues, gotchas, or things folks should be generally aware of?</div><div><br></div><div><br></div><div>During our chat today we generally landed on an in-place upgrade with known API service downtime and little (at least as little as possible) data plane downtime. The process discussed was basically:</div><div>a1. Create utility "thing-a-me" (container, venv, etc) which contains the required code to run a service through all of the required upgrades.</div><div>a2. Stop service(s).</div><div>a3. Run migration(s)/upgrade(s) for all releases using the utility "thing-a-me".</div><div>a4. Repeat for all services.</div><div><br></div><div>b1. Once all required migrations are complete run a deployment using the target release.</div><div>b2. Ensure all services are restarted.</div><div>b3. Ensure cloud is functional.</div><div>b4. 
profit!</div><div><br></div><div>Obviously, there's a lot of hand waving here but such a process is being developed by the OpenStack-Ansible project[1]. Currently, the OSA tooling will allow deployers to upgrade from Juno/Kilo to Newton using Ubuntu 14.04. While this has worked in the lab, it's early in development (YMMV). Also, the tooling is not very general purpose or portable outside of OSA but it could serve as a guide or just a general talking point. Are there other tools out there that solve for the multi-release upgrade? Are there any folks that might want to share their expertise? Maybe a process outline that worked? Best <span style="font-family:sans-serif">practices</span>? Do folks believe tools are the right way to solve this or would comprehensive upgrade documentation be better for the general community?</div><div><br></div></div></div></blockquote><div><div><br></div><div>What about packages - what repos will we set up on these nodes ... will they jump directly from current version to latest of target e.g. N+2? Is that possible - I mean we may have to consider any version specific packaging tasks. In TripleO we are actually using ansible tasks defined per service manifest e.g. neutron l3 agent @ [1] to stop all the things and then we rely on puppet (puppet-tripleo and service specific puppet modules) to update packages, run dbase migrations e.g. [2] and start all the things again (the exception to this general rule of ansible down/puppet up is some core services, which we want to recover immediately rather than wait for puppet run, like at [3] for example rabbit). 
</div><div><br></div><div>I am not by any stretch an expert on the dbase migrations, so I leave that discussion to more qualified folks, but just from a general scaling point of view trying to maintain a single repo for all the migration things for all services doesn't work. So +1 to the others here advocating that the migrations live with the service and should be compiled/applied by tooling at run time - whether it is a container thing-a-me or puppet/whatever. For TripleO you could even override the puppet PostDeploy steps and run Ansible tasks instead if that accomplished what you needed for the upgrades in your service list. In fact the TripleO Ocata to Pike upgrade overrides those to run docker instead of puppet (puppet is still invoked however) to bring up your services in containers. </div><div><br></div><div>Besides the obviously crucial migrations there are other issues to consider. We've had to deal with changes to the services themselves and with deprecations, for example removing foo-api.service and using apache for that service instead of eventlet. And then there are special case bugs like openvswitch (we had to special case ovs 2.4->2.5 for M..N and 2.5->2.6 for N..O to prevent it from restarting during - and killing - the upgrade). 
In today's workflow we would essentially need to combine these into one 'invocation' of the upgrade but I really have not thought about that in any detail.<br></div><div><br></div><div>thanks for reading, marios</div><div><br></div><div>[0] <a href="https://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html#upgrading-the-overcloud-to-ocata-and-beyond">https://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html#upgrading-the-overcloud-to-ocata-and-beyond</a></div><div>[1] <a href="https://github.com/openstack/tripleo-heat-templates/blob/6f75d76d42203657a2b39af5269d2a8f586e93bc/puppet/services/neutron-l3.yaml#L87">https://github.com/openstack/tripleo-heat-templates/blob/6f75d76d42203657a2b39af5269d2a8f586e93bc/puppet/services/neutron-l3.yaml#L87</a></div><div>[2] <a href="https://github.com/openstack/puppet-neutron/blob/adaee02815771f5d89975212b8cea24b68750618/manifests/db/sync.pp#L27">https://github.com/openstack/puppet-neutron/blob/adaee02815771f5d89975212b8cea24b68750618/manifests/db/sync.pp#L27</a></div><div>[3] <a href="https://github.com/openstack/tripleo-heat-templates/blob/6f75d76d42203657a2b39af5269d2a8f586e93bc/puppet/services/rabbitmq.yaml#L110">https://github.com/openstack/tripleo-heat-templates/blob/6f75d76d42203657a2b39af5269d2a8f586e93bc/puppet/services/rabbitmq.yaml#L110</a></div><div><br></div></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"><div>As most of the upgrade issues center around database migrations, we discussed some of the potential pitfalls at length. One approach was to roll up all DB migrations into a single repository and run all upgrades for a given project in one step. Another was to simply have multiple Python virtual environments and just run in-line migrations from a version-specific venv (this is what the OSA tooling does). 
Does one way work better than the other? Any thoughts on how this could be better? Would having N+2/3 migrations addressable within the projects, even if they're not tested any longer, be helpful?</div><div><br></div><div>It was our general thought that folks would be interested in having the ability to skip releases so we'd like to hear from the community to validate our thinking. Additionally, we'd like to get more minds together and see if folks are wanting to work on such an initiative, even if this turns into nothing more than a co-op/channel where we can "phone a friend". Would it be good to try and secure some PTG space to work on this? Should we try and get a working group going? </div><div></div></div></div></blockquote><div><br></div><div><div><br></div><div><br></div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"><div> <br></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"><div></div><div>If you've made it this far, please forgive my stream of consciousness. I'm trying to ask a lot of questions and distill long-form conversation(s) into as little text as possible all without writing a novel. 
With that said, I hope this finds you well, I look forward to hearing from (and working with) you soon.</div><div><br></div><div>[0] <a href="https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading" target="_blank">https://etherpad.openstack.org<wbr>/p/BOS-forum-skip-level-upgrad<wbr>ing</a></div><div>[1] <a href="https://github.com/openstack/openstack-ansible-ops/tree/master/leap-upgrades" target="_blank">https://github.com/openstack/o<wbr>penstack-ansible-ops/tree/mast<wbr>er/leap-upgrades</a></div><div><br></div><div><br></div><div>--</div><div><br></div><div>Kevin Carter</div><div>IRC: Cloudnull</div><div>
</div></div></div>
<br></blockquote></div><br></div></div>