Basically they have an additional, unusual compute host recovery process: after a failure, the compute host is brought back under the same name, and then the servers are rebuilt on the same compute host where they were running before. When a server's disk is backed by a volume, its contents are not lost by the compute host failure, so they don't want to lose it in the recovery process either. The evacuate operation would clearly be a better fit for this, but it disallows evacuating to the "same" host. For a long time rebuild simply allowed "evacuating to the same host", so they went with it.
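For concreteness, the flow they use looks roughly like this (a sketch with placeholder names; as far as I know, `openstack server rebuild` defaults to the server's current image when no --image is given):

    # after the failed host has been reinstalled under its original
    # hostname and nova-compute has reported in again:
    openstack server list --all-projects --host <failed-host> -f value -c ID
    openstack server rebuild <server-uuid>

For volume-backed servers the root disk lives in Cinder, so the rebuild here is mostly about re-creating local state on the host rather than the data itself.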
Aside from the "should this be possible" question, is rebuild even required in this case? For non-volume-backed instances, we need rebuild to re-download the image and create the root disk. If it's really required for volume-backed instances, I'm guessing there's just some trivial amount of state that isn't in place on recovery that the rebuild "solves". It is indeed a very odd fringe use case, and an obvious misuse of the function.
So far I have not found anything in the documentation that prohibits bringing back a failed compute host under the same name. If I missed it, or if this is not recommended for some reason, please let me know.
I'm not sure why this would be specifically documented, but since compute nodes are not fully stateless, your scenario is basically "delete part of the state of the system and expect things to keep working", which I don't think is reasonable (nor something we should need to document). Your scenario is essentially the same as one where your /var/lib/nova is mounted on a disk that doesn't come up after reboot, or on NFS that was unavailable at boot. If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host", then it could destroy (or desynchronize) actual state elsewhere in the system (i.e. the database) in response to a transient/accidental situation.

TBH, we should perhaps even explicitly *block* rebuild on an instance that appears to be missing its on-disk state, to prevent users who don't know the state of the infra from doing this to try to unblock their instances while ops are doing maintenance.

I will also point out that bringing back a compute node under the same name (without cleaning up the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost from the system. So just FYI, an actual blocker to your scenario is coming :)
Clearly, in many clouds evacuating can fully replace what they do here. I believe they may have chosen this unusual compute host recovery option to have some kind of recovery process for very small deployments, where there isn't always spare capacity to evacuate to before the failed compute host is rebuilt. And this collided with a deployment system that reuses host names.
At this point I'm not sure this really belongs in the rebuild operation. It could easily be better addressed in evacuate, or by the deployment system not reusing hostnames.
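For comparison, the usual evacuate-based flow looks something like this (a sketch with placeholder names, assuming the failed host's nova-compute service is already seen as down and the host is fenced so it cannot come back and touch its old instances):

    openstack compute service set --disable <failed-host> nova-compute
    nova evacuate <server-uuid>                # let the scheduler pick a host
    nova evacuate <server-uuid> <other-host>   # or target a host explicitly

The catch is that this only works if there is somewhere else to evacuate to, which is exactly what very small deployments lack.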
Evacuate can't work for this case either, because it requires the compute service to be down while it runs. As you note, bringing the host back under a different name would solve that problem. However, neither "evacuate to the same host" nor "use rebuild for this recovery procedure" is reasonable, IMHO. --Dan