[nova][cinder] future of rebuild without reimaging

Bence Romsics

15 Mar 2023 15 Mar '23

5:39 a.m.

Hi All! We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future). As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging. However since the implementation of https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volum... rebuild without reimaging is only possible using an old microversion (<2.93). With that change merged, rebuild without reimaging seems to be a somewhat less than fully supported feature. A few examples of what I mean by that: First, there's this warning: https://opendev.org/openstack/python-openstackclient/src/commit/5eb89e4ca1ce... In which it is unclear to me what exactly will become an error in a future release. Rebuild with a different image? Or any rebuild with microversion <2.93? Then old nova microversions may get dropped. Though what I heard from nova folks, this is unlikely to happen. Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior. What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)? If the topic is worth further discussion beyond the ML, I can also bring it to the nova ptg. Thanks in advance, Bence Romsics (rubasov) ps: I'll be afk for a few days, but I'll follow up next Tuesday.

Show replies by date

Sylvain Bauza

15 Mar 15 Mar

11:28 a.m.

Le mer. 15 mars 2023 à 13:45, Bence Romsics <bence.romsics@gmail.com> a écrit :

...

Hi All!

We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging. However since the implementation of

https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volum... rebuild without reimaging is only possible using an old microversion (<2.93). With that change merged, rebuild without reimaging seems to be a somewhat less than fully supported feature. A few examples of what I mean by that:

That's not really true : the new microversion just means we change the default behaviour, but you can still opt into the previous behaviour by requesting an older microversion. That being said, I do understand your concerns, further below.

...

First, there's this warning:

https://opendev.org/openstack/python-openstackclient/src/commit/5eb89e4ca1ce...

In which it is unclear to me what exactly will become an error in a future release. Rebuild with a different image? Or any rebuild with microversion <2.93?

The latter (in theory) : if you opt into a microversion older or equal than 2.93, you shouldn't expect your volume to *not* be rebuilt. Then old nova microversions may get dropped. Though what I heard from

...

nova folks, this is unlikely to happen.

Correct, I never want to say never, but we don't have any plans in any subsequent futures to bump the minimum versions, for many many reasons, not only due to the tech debt but also and mainly because of the interoperatibility we must guarantee.

...

Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

...

If the topic is worth further discussion beyond the ML, I can also bring it to the nova ptg.

That's already the case. Add yourself to the courtesy ping list of that topic. https://etherpad.opendev.org/p/nova-bobcat-ptg#L152 -Sylvain

...

Thanks in advance, Bence Romsics (rubasov)

ps: I'll be afk for a few days, but I'll follow up next Tuesday.

Dan Smith

11:54 a.m.

...

We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it? I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that. Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...

As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...

Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly: "This operation recreates the root disk of the server." That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature. I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me. --Dan [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

Dmitriy Rabotyagov

16 Mar 16 Mar

5:35 a.m.

...

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own. ср, 15 мар. 2023 г. в 19:57, Dan Smith <dms@danplanet.com>:

...

...
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.

Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...
Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:

"This operation recreates the root disk of the server."

That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.

I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.

--Dan

[0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

Sean Mooney

7:03 a.m.

On Thu, 2023-03-16 at 13:35 +0100, Dmitriy Rabotyagov wrote:

...

...
Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own.

rebuild is __not__ a move operation the curernt special case is a hack to alow image metadata properties to be updated for an exsitng vm but it will not reschedule the vm to another host. we talks about this in paste ptg where i propsoed adding a recreate api. i do not think we should ever make rebuilt a move oepratation but we could supprot a new api to enable the vm to recreate (keeping its data) on a new host with updated flavor/image extra specs based on teh current value of either. i really wish we coudl remvoe the current rebuild beahvior but when we discussed doing that before we decied it woudl break to many people.

...

ср, 15 мар. 2023 г. в 19:57, Dan Smith <dms@danplanet.com>:

...
...
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.

Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...
Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:

"This operation recreates the root disk of the server."

That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.

I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.

--Dan

[0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

Sylvain Bauza

7:21 a.m.

Le jeu. 16 mars 2023 à 13:38, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> a écrit :

...

...
Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own.

We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor) ср, 15 мар. 2023 г. в 19:57, Dan Smith <dms@danplanet.com>:

...

...
...
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.

Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...
Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:

"This operation recreates the root disk of the server."

That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.

I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.

--Dan

[0]

https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

...

Mohammed Naser

9:50 p.m.

IMHO, 0.001% of the time someone might be running rebuild to do something that’s to fix an issue in metadata or something (and probably an operator too) and 99.999% of the time it’s a user expecting a fresh VM Get Outlook for iOS<https://aka.ms/o0ukef> ________________________________ From: Sylvain Bauza <sylvain.bauza@gmail.com> Sent: Thursday, March 16, 2023 10:21:14 AM To: Dmitriy Rabotyagov <noonedeadpunk@gmail.com> Cc: openstack-discuss <openstack-discuss@lists.openstack.org> Subject: Re: [nova][cinder] future of rebuild without reimaging Le jeu. 16 mars 2023 à 13:38, Dmitriy Rabotyagov <noonedeadpunk@gmail.com<mailto:noonedeadpunk@gmail.com>> a écrit :

...

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own. We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor) ср, 15 мар. 2023 г. в 19:57, Dan Smith <dms@danplanet.com<mailto:dms@danplanet.com>>:

...

...
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.

Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...
Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:

"This operation recreates the root disk of the server."

That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.

I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.

--Dan

[0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

Dmitriy Rabotyagov

17 Mar 17 Mar

1:04 a.m.

Just in case I wasn't saying anything about how legit or widespread this use case is, I was just providing an example of how rebuild without real rebuild could be leveraged by operators. Regarding cold migrate, I'd love to have then another policy, like os_compute_api:os-migrate-server:migrate-specify-host or smth, so that non-admins could not pick any arbitrary compute and had to rely on scheduler only. пт, 17 мар. 2023 г., 05:50 Mohammed Naser <mnaser@vexxhost.com>:

...

IMHO, 0.001% of the time someone might be running rebuild to do something that’s to fix an issue in metadata or something (and probably an operator too) and 99.999% of the time it’s a user expecting a fresh VM

Get Outlook for iOS <https://aka.ms/o0ukef> ------------------------------ *From:* Sylvain Bauza <sylvain.bauza@gmail.com> *Sent:* Thursday, March 16, 2023 10:21:14 AM *To:* Dmitriy Rabotyagov <noonedeadpunk@gmail.com> *Cc:* openstack-discuss <openstack-discuss@lists.openstack.org> *Subject:* Re: [nova][cinder] future of rebuild without reimaging

Le jeu. 16 mars 2023 à 13:38, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> a écrit :

...
Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own.

We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor)

ср, 15 мар. 2023 г. в 19:57, Dan Smith <dms@danplanet.com>:

...
...
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.

Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...
Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:

"This operation recreates the root disk of the server."

That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.

I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.

--Dan

[0]

https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

...

Sylvain Bauza

1:52 a.m.

Le ven. 17 mars 2023 à 09:10, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> a écrit :

...

Just in case I wasn't saying anything about how legit or widespread this use case is, I was just providing an example of how rebuild without real rebuild could be leveraged by operators.

Regarding cold migrate, I'd love to have then another policy, like os_compute_api:os-migrate-server:migrate-specify-host or smth, so that non-admins could not pick any arbitrary compute and had to rely on scheduler only.

Ah, I see your point, I'll add it for the vPTG agenda. -Sylvain пт, 17 мар. 2023 г., 05:50 Mohammed Naser <mnaser@vexxhost.com>:

...

IMHO, 0.001% of the time someone might be running rebuild to do something that’s to fix an issue in metadata or something (and probably an operator too) and 99.999% of the time it’s a user expecting a fresh VM

Get Outlook for iOS <https://aka.ms/o0ukef> ------------------------------ *From:* Sylvain Bauza <sylvain.bauza@gmail.com> *Sent:* Thursday, March 16, 2023 10:21:14 AM *To:* Dmitriy Rabotyagov <noonedeadpunk@gmail.com> *Cc:* openstack-discuss <openstack-discuss@lists.openstack.org> *Subject:* Re: [nova][cinder] future of rebuild without reimaging

Le jeu. 16 mars 2023 à 13:38, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> a écrit :

...
Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own.

We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor)

ср, 15 мар. 2023 г. в 19:57, Dan Smith <dms@danplanet.com>:

...
...
We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future).

Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it?

I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that.

Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user.

...
As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging.

I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other.

...
Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior.

What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)?

I'm not opposed to challenge the usecases in a spec, for sure.

I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly:

"This operation recreates the root disk of the server."

That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature.

I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me.

--Dan

[0]

https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-...

...

Bence Romsics

21 Mar 21 Mar

7:05 a.m.

Hi, Thanks for all the answers! I went back to ask what our users are using this for. At the moment I'm not sure what they do is really supported. But you tell me. To me it makes some sense. Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it. At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know. Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names. At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames. Please let me know what you think! Thanks in advance, Bence

Dan Smith

7:56 a.m.

...

Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case? For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk. If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function.

...

At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document). Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance. I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...

Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO. --Dan

Dmitriy Rabotyagov

6 Apr 6 Apr

2:19 a.m.

I think I just came up with another "usecase" or better say missing functionality. So in case VM is stuck in `unshelving` state, for example due to messaging issues or smth, there's no clean way of recovering VM from this state. Given you will reset state to active - you won't be able to execute `stop` since VM is not assigned to any compute (and fail with "instance not ready"), as it was shelved. So then rebuild could be used, since it will pass VM to be assigned to some host as a result. Another way around would be of course updating the database, setting VM back to `shelved_offloaded` and trying to unshelve again, but I hate messing up with DB. I think this kinda brings me back to Sean's point of having an API call to re-create a VM while keeping it's data, as that would cover such corner-cases as well. вт, 21 мар. 2023 г. в 15:59, Dan Smith <dms@danplanet.com>:

...

...
Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case? For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk. If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function.

...
At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document).

Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance.

I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...
Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO.

--Dan

Sean Mooney

3:56 a.m.

On Thu, 2023-04-06 at 11:19 +0200, Dmitriy Rabotyagov wrote:

...

I think I just came up with another "usecase" or better say missing functionality. So in case VM is stuck in `unshelving` state, for example due to messaging issues or smth, there's no clean way of recovering VM from this state. rebuild would not eb corrct to use there.

...

Given you will reset state to active

that is not safe to do. the correct fix would be reset it to shevle_offloaded which you currntly would have to do in the db.

...

- you won't be able to execute `stop` since VM is not assigned to any compute (and fail with "instance not ready"), as it was shelved. So then rebuild could be used, since it will pass VM to be assigned to some host as a result. Another way around would be of course updating the database, setting VM back to `shelved_offloaded` and trying to unshelve again, but I hate messing up with DB.

I think this kinda brings me back to Sean's point of having an API call to re-create a VM while keeping it's data, as that would cover such corner-cases as well. well we have talked about allowing reset-state reset to other states in the past or allowing evacuate to work. i proablywoul dnot allow the recreate api to work in that broken state.

the recreate api was not intended for error recovery. it was intented to fullfile two usercases 1.) unify rebuild and resize so you can do either or both from a singel api call. 2.) update your vm so that it gets the latest flavor extra_specs and image properies appleis witout data lose.

...

вт, 21 мар. 2023 г. в 15:59, Dan Smith <dms@danplanet.com>:

...
...
Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case?

if your vm is boot form voluem or you are using the ceph image backend for nova or nova on nfs then i think all that is requried is hard reboot. there are no port updates/bindings and hard reboot both plugs the netowrk interface into ovs or whatever the backend is on the host but also invokes os-brick to do the same for the volumes. so its not clear to my why rebuild woudl be required in a shared storage case.

...

...
For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk.

yes although when you had the hardware failure you could have used evacuate to rebuild the vm on another host. if you could not do that because the vm was pinned to that host then the existing rebuild command is sufficent. if the failure was a motherboard or simialr and the data on disk was not lost then a hard reboot should also be enough for vms with local storage. rebuild would only be required if the data was lost.

...
If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function. ya if hard reboot/power on is not enough i think there is a trival bug there. we are obviouly missing somethign that should be done. power_on/hard reboot are intended to be abel to recreate the vm with its data after the host had been power off and powered on again. so it is ment to do everything required to be able to start the instance. nova has all the info in its database to do that without needing to call the other service like cinder and neutorn.

it woudl be good to know what actully fails if you just do hard reboot and capature that in a bug report.

...

...
...
At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document).

Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance.

I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...
Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO.

--Dan

Dmitriy Rabotyagov

4:17 a.m.

To be frank I wasn't trying to do hard reboot in that case .Til that it was unobvious for me that it should work or be used at all in such weird scenarios. Like docs say nothing about how hard reboot can be leveraged [1]. So my impression was that hard reboot simply triggers `virsh destroy; virsh create` in the contrary to `virsh reboot` for soft one and that's kind of it, rather than taking care of re-wiring everything. My biggest problem with DB updates, is that set of users who can execute admin commands towards openstack APIs != set of users who have root access to the hardware or database. So while indeed recovery from such failures could be escalated, at least verifying if the issue is intermittent or not is kinda helpful as well. So having a way to recover from such states via API would be an improvement. [1] https://docs.openstack.org/nova/latest/user/reboot.html чт, 6 апр. 2023 г. в 12:56, Sean Mooney <smooney@redhat.com>:

...

On Thu, 2023-04-06 at 11:19 +0200, Dmitriy Rabotyagov wrote:

...
I think I just came up with another "usecase" or better say missing functionality. So in case VM is stuck in `unshelving` state, for example due to messaging issues or smth, there's no clean way of recovering VM from this state. rebuild would not eb corrct to use there.

...
Given you will reset state to active

that is not safe to do. the correct fix would be reset it to shevle_offloaded which you currntly would have to do in the db.

...
- you won't be able to execute `stop` since VM is not assigned to any compute (and fail with "instance not ready"), as it was shelved. So then rebuild could be used, since it will pass VM to be assigned to some host as a result. Another way around would be of course updating the database, setting VM back to `shelved_offloaded` and trying to unshelve again, but I hate messing up with DB.

I think this kinda brings me back to Sean's point of having an API call to re-create a VM while keeping it's data, as that would cover such corner-cases as well. well we have talked about allowing reset-state reset to other states in the past or allowing evacuate to work. i proablywoul dnot allow the recreate api to work in that broken state.

the recreate api was not intended for error recovery. it was intented to fullfile two usercases 1.) unify rebuild and resize so you can do either or both from a singel api call. 2.) update your vm so that it gets the latest flavor extra_specs and image properies appleis witout data lose.

...
вт, 21 мар. 2023 г. в 15:59, Dan Smith <dms@danplanet.com>:

...
...
Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case?

if your vm is boot form voluem or you are using the ceph image backend for nova or nova on nfs then i think all that is requried is hard reboot. there are no port updates/bindings and hard reboot both plugs the netowrk interface into ovs or whatever the backend is on the host but also invokes os-brick to do the same for the volumes. so its not clear to my why rebuild woudl be required in a shared storage case.

...
...
For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk.

yes although when you had the hardware failure you could have used evacuate to rebuild the vm on another host. if you could not do that because the vm was pinned to that host then the existing rebuild command is sufficent. if the failure was a motherboard or simialr and the data on disk was not lost then a hard reboot should also be enough for vms with local storage. rebuild would only be required if the data was lost.

...
If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function. ya if hard reboot/power on is not enough i think there is a trival bug there. we are obviouly missing somethign that should be done. power_on/hard reboot are intended to be abel to recreate the vm with its data after the host had been power off and powered on again. so it is ment to do everything required to be able to start the instance. nova has all the info in its database to do that without needing to call the other service like cinder and neutorn.

it woudl be good to know what actully fails if you just do hard reboot and capature that in a bug report.

...
...
...
At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document).

Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance.

I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...
Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO.

--Dan

Sean Mooney

5:01 a.m.

On Thu, 2023-04-06 at 13:17 +0200, Dmitriy Rabotyagov wrote:

...

To be frank I wasn't trying to do hard reboot in that case .Til that it was unobvious for me that it should work or be used at all in such weird scenarios. Like docs say nothing about how hard reboot can be leveraged [1]. So my impression was that hard reboot simply triggers `virsh destroy; virsh create` in the contrary to `virsh reboot` for soft one and that's kind of it, rather than taking care of re-wiring everything.

hard reboot is the primary way to recover a vm form error where that error state is not related to a live migration failure. it can be used in the case of live migraiton if you first check the host and node field on the instance point at the host where the vm is. this is why hard reboot support a vm state of ERROR https://docs.openstack.org/api-ref/compute/?expanded=reboot-server-reboot-ac... almost all other instance operator will not allow a vm to be in ERROR state. but in general becasue server start is optional in the api and hard reboot is required hard reboot is the primary way to start a vm after a host reboot. https://docs.openstack.org/nova/latest/user/support-matrix.html so regardless of the usecase of recoving form error by regenerating the domain, repluggin the port and remounting the volumes it also need to support the same for the reboot case as linux bridge for example would not preserv the network config across reboots and host mounted volumes (iscsi) simialrly gets reset on a host reboot. so ya the real differce betwen soft reboot and hard reboot with the libvirt dirver is soft reboot only restates the os in the qemu porcess and hard reboot recreates everything including host configurtion using the data in the nova db. im not really sure how to make that more obvious in the api docs. we proably could add a note about using hardreboot not rebuild in such cases.

...

My biggest problem with DB updates, is that set of users who can execute admin commands towards openstack APIs != set of users who have root access to the hardware or database. So while indeed recovery from such failures could be escalated, at least verifying if the issue is intermittent or not is kinda helpful as well. So having a way to recover from such states via API would be an improvement.

[1] https://docs.openstack.org/nova/latest/user/reboot.html

чт, 6 апр. 2023 г. в 12:56, Sean Mooney <smooney@redhat.com>:

...
On Thu, 2023-04-06 at 11:19 +0200, Dmitriy Rabotyagov wrote:

...
I think I just came up with another "usecase" or better say missing functionality. So in case VM is stuck in `unshelving` state, for example due to messaging issues or smth, there's no clean way of recovering VM from this state. rebuild would not eb corrct to use there.

...
Given you will reset state to active

that is not safe to do. the correct fix would be reset it to shevle_offloaded which you currntly would have to do in the db.

...
- you won't be able to execute `stop` since VM is not assigned to any compute (and fail with "instance not ready"), as it was shelved. So then rebuild could be used, since it will pass VM to be assigned to some host as a result. Another way around would be of course updating the database, setting VM back to `shelved_offloaded` and trying to unshelve again, but I hate messing up with DB.

I think this kinda brings me back to Sean's point of having an API call to re-create a VM while keeping it's data, as that would cover such corner-cases as well. well we have talked about allowing reset-state reset to other states in the past or allowing evacuate to work. i proablywoul dnot allow the recreate api to work in that broken state.

the recreate api was not intended for error recovery. it was intented to fullfile two usercases 1.) unify rebuild and resize so you can do either or both from a singel api call. 2.) update your vm so that it gets the latest flavor extra_specs and image properies appleis witout data lose.

...
вт, 21 мар. 2023 г. в 15:59, Dan Smith <dms@danplanet.com>:

...
...
Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case?

if your vm is boot form voluem or you are using the ceph image backend for nova or nova on nfs then i think all that is requried is hard reboot. there are no port updates/bindings and hard reboot both plugs the netowrk interface into ovs or whatever the backend is on the host but also invokes os-brick to do the same for the volumes. so its not clear to my why rebuild woudl be required in a shared storage case.

...
...
For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk.

yes although when you had the hardware failure you could have used evacuate to rebuild the vm on another host. if you could not do that because the vm was pinned to that host then the existing rebuild command is sufficent. if the failure was a motherboard or simialr and the data on disk was not lost then a hard reboot should also be enough for vms with local storage. rebuild would only be required if the data was lost.

...
If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function. ya if hard reboot/power on is not enough i think there is a trival bug there. we are obviouly missing somethign that should be done. power_on/hard reboot are intended to be abel to recreate the vm with its data after the host had been power off and powered on again. so it is ment to do everything required to be able to start the instance. nova has all the info in its database to do that without needing to call the other service like cinder and neutorn.

it woudl be good to know what actully fails if you just do hard reboot and capature that in a bug report.

...
...
...
At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document).

Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance.

I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...
Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO.

--Dan

Sylvain Bauza

4:28 a.m.

Le jeu. 6 avr. 2023 à 11:25, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> a écrit :

...

I think I just came up with another "usecase" or better say missing functionality. So in case VM is stuck in `unshelving` state, for example due to messaging issues or smth, there's no clean way of recovering VM from this state. Given you will reset state to active - you won't be able to execute `stop` since VM is not assigned to any compute (and fail with "instance not ready"), as it was shelved. So then rebuild could be used, since it will pass VM to be assigned to some host as a result. Another way around would be of course updating the database, setting VM back to `shelved_offloaded` and trying to unshelve again, but I hate messing up with DB.

I think this kinda brings me back to Sean's point of having an API call to re-create a VM while keeping it's data, as that would cover such corner-cases as well.

FWIW, we agreed on the vPTG to add another separate policy for cold-migrate (when the 'host' parameter is provided), so you could just modify the existing cold-migrate policy (without the 'host' parameter by default) to be able to be called by an end-user. If so, an user could ask to move their instances and restart them by this, and if they see some problem, they could then revert the migration. I should be working on it by the next weeks hopefully. -Sylvain вт, 21 мар. 2023 г. в 15:59, Dan Smith <dms@danplanet.com>:

...

...
...
Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case? For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk. If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function.

...
At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document).

Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance.

I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...
Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO.

--Dan

Sean Mooney

4:47 a.m.

On Thu, 2023-04-06 at 13:28 +0200, Sylvain Bauza wrote:

...

Le jeu. 6 avr. 2023 à 11:25, Dmitriy Rabotyagov <noonedeadpunk@gmail.com> a écrit :

...
I think I just came up with another "usecase" or better say missing functionality. So in case VM is stuck in `unshelving` state, for example due to messaging issues or smth, there's no clean way of recovering VM from this state. Given you will reset state to active - you won't be able to execute `stop` since VM is not assigned to any compute (and fail with "instance not ready"), as it was shelved. So then rebuild could be used, since it will pass VM to be assigned to some host as a result. Another way around would be of course updating the database, setting VM back to `shelved_offloaded` and trying to unshelve again, but I hate messing up with DB.

I think this kinda brings me back to Sean's point of having an API call to re-create a VM while keeping it's data, as that would cover such corner-cases as well.

FWIW, we agreed on the vPTG to add another separate policy for cold-migrate (when the 'host' parameter is provided), so you could just modify the existing cold-migrate policy (without the 'host' parameter by default) to be able to be called by an end-user. If so, an user could ask to move their instances and restart them by this, and if they see some problem, they could then revert the migration.

I should be working on it by the next weeks hopefully.

that would help as long as the vm is not stuck in an error state. i.e. it wont help in the stuck in unshelving case. but it would help in the noisy neighbour case potentially.

...

-Sylvain

вт, 21 мар. 2023 г. в 15:59, Dan Smith <dms@danplanet.com>:

...
...
...
Basically they have an additional and unusual compute host recovery process, where a compute host after a failure is brought back by the same name. Then they rebuild the servers on the same compute host where the servers were running before. When the server's disk was backed by a volume, so its content was not lost by the compute host failure, they don't want to lose it either in the recovery process. The evacute operation clearly would be a better fit to do this, but that disallows evacuating to the "same" host. For a long time rebuild just allowed "evacuating to the same host". So they went with it.

Aside from the "should this be possible" question, is rebuild even required in this case? For the non-volume-backed instances, we need rebuild to re-download the image and create the root disk. If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use-case that is an obvious mis-use of the function.

...
At the moment I did not find a prohibition in the documentation to bring back a failed compute host by the same name. If I missed it or this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working" which I don't think is reasonable (nor something we should need to document).

Your scenario is basically the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance.

I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :)

...
Clearly in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where you don't always have space to evacuate before you rebuilt the failed compute host. And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs to the rebuild operation. Could easily be better addressed in evacuate. Or in the deployment system not reusing hostnames.

Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO.

--Dan

922

Age (days ago)

944

Last active (days ago)

List overview

Download

16 comments

7 participants

participants (7)

Bence Romsics
Dan Smith
Dmitriy Rabotyagov
Mohammed Naser
Sean Mooney
Sylvain Bauza
Sylvain Bauza

[nova][cinder] future of rebuild without reimaging

tags

participants (7)