[tripleo][update][blueprint] Update refactor: more feedback, more control, more speed.
Sofer Athlan-Guyot
sathlang at redhat.com
Thu Jul 2 11:20:48 UTC 2020
Hi,
hope you liked the title, I find it catchy.
Update is mainly an afterthought that needs to work. So we mainly fix
"stuff" there. No major change has happened there in a long time.
Following the PTG, I'm proposing a new blueprint and a bug:
1. Refactor tripleo update to offer the user more feedback and
control[1].
2. Registering node and repos can happen after some module check for
packages[2].
I'm pretty new to this so I would need feedback about the form and
content. For instance, point 2. could be a blueprint instead of a bug,
tell me what you think.
1. refactor update step to load step playbook instead of looping over
   the steps:
   - this will speed up update (no more skipped tasks)
   - this will offer a point of recovery when the update fails
     (by doing something like in named debug[3] for deployment)
2. refactor/fix? host-prep-tasks to include two steps:
   - step0 to add pre-update in-flight validation to the update
     process and rhosp registration;
   - step1 for all other tasks;
   - make sure it runs in parallel on all nodes
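To illustrate the two-step host-prep idea, here is a rough Python sketch,
not TripleO code. It assumes step0 must finish on every node before step1
starts anywhere; that ordering is my assumption for the sketch. The names
`run_host_prep` and `run_step` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_host_prep(nodes, run_step):
    """Run step0 on all nodes in parallel, then step1 on all nodes.

    `run_step(node, step)` is a hypothetical callback returning True on
    success. Returns {step: {node: outcome}} for the steps that ran.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        for step in (0, 1):
            # pool.map preserves input order, so zip() pairs each node
            # with its own outcome.
            outcomes = list(pool.map(lambda node: run_step(node, step), nodes))
            results[step] = dict(zip(nodes, outcomes))
            if not all(outcomes):
                break  # assumption: a failed step0 blocks step1
    return results
```

With this shape a failed validation or registration in step0 stops the run
before any other host preparation task starts.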
Point 1. would catch up with deployment. It offers a speed
improvement as we wouldn't skip tasks anymore. We could notify the user
of what we are doing: "I'm removing the node from the cluster" instead
of "step1". It would give the user a hook to restart a
failed update from any step. Overall a big win, I think.
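To make Point 1 a bit more concrete, here is a minimal Python sketch of
named steps with a restart point. It is not the proposed implementation.
Apart from the "remove node from cluster" message quoted above, the step
names are made up, and `run_update` and `runner` are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical named steps; the real names would come from the spec.
STEPS = [
    "remove_node_from_cluster",
    "stop_services",
    "update_packages",
    "start_services",
    "rejoin_cluster",
]


def run_update(
    runner: Callable[[str], bool],
    start_at: Optional[str] = None,
) -> Optional[str]:
    """Run each named step in order, optionally resuming at `start_at`.

    `runner(name)` is a hypothetical callback returning True on success.
    Returns the name of the failed step, or None if everything passed.
    """
    first = STEPS.index(start_at) if start_at else 0
    for name in STEPS[first:]:
        print(f"running: {name}")  # user feedback instead of "step1"
        if not runner(name):
            return name  # recovery point: pass it back as start_at
    return None
```

A failed run returns the step name, and calling `run_update(runner,
start_at=failed_step)` resumes there without replaying the earlier steps.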
Point 2. is newer, I filed it as a bug because I bumped into it as an
issue when trying to add validation for subscription. It opens some
possibilities for the update:
- in-flight validation at the beginning of the update process that
would be skipped during deployment using tags
- using tags we could also run specific day 2 actions outside of the
update window:
openstack overcloud update run --tags 'pre-update-validation' (with
pre-update-validation in host-prep-tasks step0)
openstack overcloud update run --tags 'rhsm-subscription'
Well, it looked promising to me.
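The tag idea above can be modelled roughly like this in Python. It is a
simplified picture of Ansible-style tag selection (special tags such as
"always" and "never" are ignored), not TripleO code. The task list and the
`select_tasks` helper are hypothetical; only the two tag names come from
the commands above.

```python
# Hypothetical host-prep task list; names are illustrative only.
HOST_PREP_TASKS = [
    {"name": "validate subscription", "step": 0,
     "tags": {"pre-update-validation"}},
    {"name": "register node and repos", "step": 0,
     "tags": {"rhsm-subscription"}},
    {"name": "other host preparation", "step": 1, "tags": set()},
]


def select_tasks(tasks, tags=None, skip_tags=None):
    """Return the tasks that would run for the given tag filters.

    With `tags`, keep only tasks sharing at least one tag with it.
    With `skip_tags`, drop tasks sharing at least one tag with it.
    With neither, every task runs.
    """
    tags = set(tags or ())
    skip_tags = set(skip_tags or ())
    selected = []
    for task in tasks:
        if tags and not (task["tags"] & tags):
            continue
        if task["tags"] & skip_tags:
            continue
        selected.append(task)
    return selected
```

In this model a deployment would pass `skip_tags={"pre-update-validation"}`,
while a day 2 run would pass `tags={"rhsm-subscription"}` to execute only
the registration task.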
Now, tell me what you think, but please, be nice, I'm old and
susceptible.
I have more coming, sorted by how much thought I put into it, starting
with the ones I thought about more:
- Check if we need a reboot of the server and notify the user.
- Gain some more speed and clarity by having a
running-on-all-host-in-parallel-host-update-prep-tasks new step. For
instance all HA image tagging magic could go in there.
- Investigate converge and check whether we could optimize it further
for update.
I would like to gain more experience with the process before I file
those new blueprints.
I'm going to draft a spec for the proposed blueprint and then I'll push
some WIP code.
Thanks,
[1] https://blueprints.launchpad.net/tripleo/+spec/tripleo-update-smart-steps
[2] https://bugs.launchpad.net/tripleo/+bug/1886028
[3] https://review.opendev.org/#/c/636731/
--
Sofer Athlan-Guyot
chem on #irc
DFG:Upgrades
More information about the openstack-discuss
mailing list