[tripleo][update][blueprint] Update refactor: more feedback, more control, more speed.
Hi,

hope you liked the title, I find it catchy.

Update is mainly an afterthought that needs to work, so we mainly fix "stuff" there. No major change has happened there for a long time.

Following the PTG, I'm proposing a new blueprint and a bug:

1. Refactor tripleo update to offer the user more feedback and control[1].
2. Registering node and repos can happen after some module check for packages[2].

I'm pretty new to this, so I would need feedback about the form and content. For instance, point 2. could be a blueprint instead of a bug; tell me what you think.

1. refactor update step to load step playbook instead of looping over the steps:
   - this will speed up update (no more skipped tasks)
   - this will offer point of recovery when the update fails (by doing something like in named debug[3] for deployment)

2. refactor/fix? host-prep-tasks to include two steps:
   - step0 to add pre-update in-flight validation to the update process and rhosp registration;
   - step1 to all other tasks;
   - make sure it runs in parallel on all nodes

Point 1. would be a catch up with deployment. It offers a speed improvement, as we wouldn't skip tasks anymore. We could notify the user of what we are doing: "I'm removing the node from the cluster" instead of "step1". It would also give the user a hook to restart a failed update from any step. Overall a big win, I think.

Point 2. is newer. I filed it as a bug because I bumped into it as an issue when trying to add validation for subscription. It opens some possibilities for the update:

 - in-flight validation at the beginning of the update process, which would be skipped during deployment using a tag
 - using tags we could also run specific day 2 actions outside of the update window:

     openstack overcloud update run --tags 'pre-update-validation'
       (with pre-update-validation in host-prep-tasks step0)
     openstack overcloud update run --tags 'rhsm-subscription'

Well, it looked promising to me.
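To make point 1. a bit more concrete, here is a toy Python model of the idea. It is only a sketch, not TripleO code: the task names, the step numbers and every function name are made up for illustration. It compares "loop over every step and skip the tasks that don't match" with "group the tasks once into one playbook per step", and shows how the second form gives a natural restart point.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    step: int


# Hypothetical update tasks; names and steps are illustrative only.
TASKS = [
    Task("remove node from cluster", 1),
    Task("stop services", 1),
    Task("update packages", 2),
    Task("restart services", 3),
]


def run_looping(tasks, max_step):
    """Looping style: each step walks the whole task list and skips
    every task that does not belong to the current step."""
    evaluated = 0
    executed = []
    for step in range(1, max_step + 1):
        for task in tasks:
            evaluated += 1  # a skipped task still costs an evaluation
            if task.step == step:
                executed.append(task.name)
    return executed, evaluated


def build_step_playbooks(tasks):
    """Per-step style: group the tasks once, one list per step."""
    playbooks = {}
    for task in tasks:
        playbooks.setdefault(task.step, []).append(task)
    return playbooks


def run_step_playbooks(playbooks, start_step=1):
    """Run each per-step playbook in order, optionally resuming
    from start_step after a failed run."""
    evaluated = 0
    executed = []
    for step in sorted(playbooks):
        if step < start_step:
            continue  # already done in a previous run
        for task in playbooks[step]:
            evaluated += 1
            executed.append(task.name)
    return executed, evaluated
```

With these four tasks over three steps, run_looping(TASKS, 3) evaluates 12 task/step pairs to execute 4 tasks, while run_step_playbooks(build_step_playbooks(TASKS)) evaluates only the 4 it executes. Passing start_step=2 resumes from the second step, which is the kind of recovery point I have in mind.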
Now, tell me what you think, but please, be nice, I'm old and susceptible.

I have more coming, sorted by the amount of thought I put into them, starting with the ones I thought about more:

 - Check if we need a reboot of the server and notify the user.
 - Gain some more speed and clarity by having a new running-on-all-host-in-parallel-host-update-prep-tasks step. For instance all HA image tagging magic could go in there.
 - Investigate converge and check if we could not optimize it further for update.

I would like to gain more experience with the process before I file those new blueprints.

I'm going to draft a spec for the proposed blueprint and then I'll push some WIP code.

Thanks,

[1] https://blueprints.launchpad.net/tripleo/+spec/tripleo-update-smart-steps
[2] https://bugs.launchpad.net/tripleo/+bug/1886028
[3] https://review.opendev.org/#/c/636731/

--
Sofer Athlan-Guyot
chem on #irc
DFG:Upgrades