Just to be clear, I wasn't saying anything about how legitimate or widespread this use case is; I was just providing an example of how a rebuild without a real rebuild could be leveraged by operators. Regarding cold migrate, I'd love to then have another policy, like os_compute_api:os-migrate-server:migrate-specify-host or something, so that non-admins could not pick an arbitrary compute and had to rely on the scheduler only.
On Fri, 17 Mar 2023 at 05:50, Mohammed Naser <mnaser@vexxhost.com> wrote:
IMHO, 0.001% of the time someone might be running rebuild to fix an issue in metadata or something (and that someone is probably an operator too), and 99.999% of the time it's a user expecting a fresh VM.
------------------------------
*From:* Sylvain Bauza <sylvain.bauza@gmail.com>
*Sent:* Thursday, March 16, 2023 10:21:14 AM
*To:* Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
*Cc:* openstack-discuss <openstack-discuss@lists.openstack.org>
*Subject:* Re: [nova][cinder] future of rebuild without reimaging
On Thu, 16 Mar 2023 at 13:38, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?
I think it might be a case of rescheduling the VM to another compute without a long-lasting shelve/unshelve, when you don't need to change the flavor. So it is a kind of self-service: when the user detects some weirdness, they will attempt to reschedule to another compute on their own before bothering the tech team.
We already have an existing API method for this, which is 'cold-migrate' (and it does the same as resize, without changing the flavor).
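For readers unfamiliar with the cold-migrate call mentioned above, here is a minimal sketch of the request body sent to POST /servers/{server_id}/action. The helper name cold_migrate_body is made up for illustration; the body shapes follow the compute API reference as I understand it, where naming a target host needs microversion 2.56 or later. Treat it as an illustration, not as nova's or any client's code.

```python
from typing import Optional

# Microversion that (per the compute API reference) added the optional
# "host" field to the migrate action.
MIGRATE_HOST_MIN_MICROVERSION = (2, 56)


def cold_migrate_body(host: Optional[str] = None) -> dict:
    """Build the JSON body for a cold-migrate server action.

    Hypothetical helper for illustration only.
    With host=None the scheduler picks the destination compute.
    """
    if host is None:
        # Scheduler-only migration: the action value is JSON null.
        return {"migrate": None}
    # Explicit destination: only accepted from microversion 2.56 on.
    return {"migrate": {"host": host}}
```

A policy like the one suggested at the top of this thread would only need to gate the second form, since the first never lets the caller choose a compute.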
On Wed, 15 Mar 2023 at 19:57, Dan Smith <dms@danplanet.com> wrote:
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).
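The microversion dependence described above can be sketched roughly as follows. This is a simplified model, not nova code: the function names are invented for illustration, and it assumes only what this thread states, namely that volume-backed rebuilds keep the volume as is before 2.93 and that rebuild otherwise recreates the root disk.

```python
# Microversion from which rebuild reimages the root volume of a
# volume-backed server (as discussed in this thread).
REIMAGE_VOLUME_MICROVERSION = (2, 93)


def parse_microversion(value: str) -> tuple:
    """Turn "2.93" into (2, 93) so versions compare numerically."""
    major, minor = value.split(".")
    return int(major), int(minor)


def rebuild_recreates_root_disk(microversion: str, volume_backed: bool) -> bool:
    """Hypothetical model of whether rebuild touches the root disk."""
    if not volume_backed:
        # Image-backed servers: rebuild recreates the root disk.
        return True
    # Volume-backed servers: the volume is kept as is before 2.93.
    return parse_microversion(microversion) >= REIMAGE_VOLUME_MICROVERSION
```

Comparing integer tuples rather than strings or floats matters once minor versions reach three digits: "2.100" is newer than "2.93", although it sorts lower as a string or a float.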
Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?
I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.
Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.
I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.
Then there are a few hypothetical situations like:
a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior.
b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.
What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?
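Just to make the shape of that idea concrete, here is a small argparse sketch of a mutually exclusive --reimage/--no-reimage pair. It is entirely hypothetical: the parser name and the three-state default (None meaning "not specified, keep the behavior of the negotiated microversion") are assumptions for illustration, not existing osc options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI sketch; not the real openstackclient parser."""
    parser = argparse.ArgumentParser(prog="server-rebuild-sketch")
    parser.add_argument("server", help="name or ID of the server")
    # The two flags share one destination and cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--reimage", dest="reimage",
                       action="store_true", default=None,
                       help="recreate the root disk")
    group.add_argument("--no-reimage", dest="reimage",
                       action="store_false", default=None,
                       help="keep the root disk as is")
    return parser


args = build_parser().parse_args(["vm1", "--no-reimage"])
print(args.server, args.reimage)
```

Leaving the default at None keeps "user said nothing" distinguishable from an explicit choice, which is the point of exposing the choice in the API.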
I'm not opposed to challenging the use cases in a spec, for sure.
I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:
"This operation recreates the root disk of the server."
That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.
I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.
--Dan
[0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...