[Openstack-operators] [skip-level-upgrades][fast-forward-upgrades] PTG summary
Lee Yarwood
lyarwood at redhat.com
Wed Sep 20 13:29:12 UTC 2017
My thanks again to everyone who attended and contributed to the
skip-level upgrades track over the first two days of last week's PTG.
I've included a short summary of our discussions below with a list of
agreed actions for Queens at the end.
tl;dr s/skip-level/fast-forward/g
https://etherpad.openstack.org/p/queens-PTG-skip-level-upgrades
Monday - Define and rename
--------------------------
During our first session [1] we briefly discussed the history of the
skip-level upgrades effort within the community and the various
misunderstandings that have arisen from previous conversations around
this topic at past events.
We agreed that at present the only way to upgrade between N and N+>=2
releases of OpenStack is to upgrade linearly through each major release,
without skipping any release between the starting and target release of
the upgrade.
This is contrary to previous discussions on the topic, where it had been
suggested that releases could be skipped if the DB migrations for these
releases were applied in bulk later in the process. As no project within
the community currently supports this, it was agreed to continue using
the supported N to N+1 upgrade jumps, albeit in a minimal, offline way.
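To illustrate the linear approach described above, here is a minimal
Python sketch that expands a starting and target release into the
individual N to N+1 jumps. The `RELEASES` list is only a short excerpt
of release names and the `upgrade_path` helper is hypothetical, written
for this example; it is not taken from any deployment tool.

```python
# Short, illustrative excerpt of OpenStack release names in order.
RELEASES = ["newton", "ocata", "pike", "queens"]


def upgrade_path(start, target, releases=RELEASES):
    """Return the (from, to) single-release jumps from start to target.

    Hypothetical helper: no release is skipped, every intermediate
    release appears as both a destination and a source.
    """
    i, j = releases.index(start), releases.index(target)
    if j <= i:
        raise ValueError("target must be newer than the starting release")
    return [(releases[k], releases[k + 1]) for k in range(i, j)]


if __name__ == "__main__":
    for src, dst in upgrade_path("newton", "queens"):
        print(f"upgrade {src} -> {dst}")
```

Each pair in the returned list corresponds to one supported N to N+1
upgrade that would be run, offline, before moving on to the next.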
The name skip-level upgrades has had an obvious role to play in the
confusion here and as such the renaming of this effort was discussed at
length. Various suggestions are listed on the pad but for the time being
I'm going to stick with the basic `fast-forward upgrades` name (FFOU,
OFF, BOFF, FFUD etc were all close behind). This removes any notion of
releases being skipped and should hopefully avoid any further confusion
in the future.
Project support for offline upgrades was then discussed, with a recent
Ironic issue [2] highlighted as an example where a project required
services to be running before the upgrade could be considered complete.
We also discussed the additional requirement that both workloads and
the data plane remain active during the upgrade. It was agreed that both
the `supports-upgrades` [3] and `supports-accessible-upgrades` [4] tags
should be updated to reflect these requirements for fast-forward
upgrades.
Given the above, it was agreed that this new definition of fast-forward
upgrades and the best practices associated with them should be clearly
documented somewhere. Various operators in the room highlighted that
they would like to see a high-level document outlining the steps
required to achieve this, hopefully written by someone with past
experience of running this type of upgrade.
I failed to capture the names of the individuals who were interested in
helping out with this. If you would like to help, please feel free to
add your name to the actions either at the end of this mail or at the
bottom of the pad.
In the afternoon we reviewed the current efforts within the community to
implement fast-forward upgrades, covering TripleO, Charms (Juju) and
openstack-ansible. While this was insightful to many in the room, there
didn't appear to be any obvious areas of collaboration beyond sharing
best practices and defining the high-level flow of a fast-forward
upgrade.
Tuesday - NFV, SIG and actions
------------------------------
Tuesday started with a discussion around NFV considerations with
fast-forward upgrades. These ranged from the previously mentioned need
for the data plane to remain active during the upgrade to the restricted
nature of upgrades in NFV environments in terms of time and number of
reboots.
It was highlighted that there are some serious, as yet unresolved, bugs
in Nova regarding the live migration of instances using SR-IOV devices.
This currently makes moving workloads either prior to or during the
upgrade particularly difficult.
Rollbacks were also discussed, along with the need for any best practice
documentation around fast-forward upgrades to include steps for
recovering environments if things fail.
We then revisited an idea from the first day of finding or creating a
SIG for this effort to call home. It was highlighted that there was a
suggestion in the packaging room to create a Deployment / Lifecycle SIG.
After speaking with a few individuals later in the week I've taken the
action to reach out on the openstack-sigs mailing list for further
input.
Finally, during a brief discussion on ways we could collaborate and share
tooling for fast-forward upgrades, a new tool to migrate configuration
files between N and N+>=2 releases was introduced [5]. While interesting,
it was seen as a more generic utility that could also be used for N to
N+1 upgrades. AFAIK the authors joined the Oslo room shortly after this
session ended to gain more feedback from that team.
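To give a rough idea of what migrating configuration across several
releases can involve, here is a minimal Python sketch that applies one
rename map per release, in order. It is not based on the tool in [5];
the `RENAMES` table, the `migrate` helper and all section and option
names are hypothetical placeholders.

```python
import configparser

# Hypothetical rename maps, one per release step:
# (old section, old option) -> (new section, new option).
# These are placeholder names, not real OpenStack options.
RENAMES = {
    "n+1": {("compute", "old_name"): ("api", "new_name")},
    "n+2": {("api", "new_name"): ("api", "newer_name")},
}


def migrate(config, steps, renames=RENAMES):
    """Apply each step's renames in order, mutating and returning config."""
    for step in steps:
        for (old_sec, old_opt), (new_sec, new_opt) in renames[step].items():
            if not config.has_option(old_sec, old_opt):
                continue  # option not set, nothing to carry forward
            value = config.get(old_sec, old_opt)
            config.remove_option(old_sec, old_opt)
            if not config.has_section(new_sec):
                config.add_section(new_sec)
            config.set(new_sec, new_opt, value)
    return config


if __name__ == "__main__":
    cfg = configparser.RawConfigParser()
    cfg.read_string("[compute]\nold_name = 42\n")
    migrate(cfg, ["n+1", "n+2"])
    print(dict(cfg["api"]))
```

Applying only the first step would cover an N to N+1 upgrade, which is
why this kind of utility is not specific to fast-forward upgrades.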
Actions
-------
- Modify the `supports-upgrades`[3] and `supports-accessible-upgrades`[4] tags
I have yet to look into the formal process around making changes to
these tags but I will aim to make a start ASAP.
- Find an Ops lead for the documentation effort
I failed to take down the names of some of the operators who were
talking this through at the time. If they or anyone else is still
interested in helping here please let me know!
- Find or create a relevant SIG for this effort
As discussed above this could be as part of the lifecycle SIG or an
independent upgrades SIG. Expect a separate mail to the SIG list
regarding this shortly.
- Identify a room chair for Sydney
Unfortunately I will not be present in Sydney to lead a similar
session. If anyone is interested in helping please feel free to respond
here or reach out to me directly!
My thanks again to everyone who attended the track. I had a blast
leading the room and hope that the attendees found both the track and
some of the outcomes listed above useful.
Cheers,
Lee
[1] https://twitter.com/lyarwood_/status/907310970229415937
[2] https://review.openstack.org/#/q/topic:ironic-offline-migration
[3] https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html
[4] https://governance.openstack.org/tc/reference/tags/assert_supports-accessible-upgrade.html
[5] https://github.com/NguyenHoaiNam/Jump-Over-Release/blob/test_dynamic_section/README.md
--
Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76