Hello Andrew!
I wonder if it has to do with how the volumes/storage are set up for your instance. If the instance is volume-backed rather than ephemeral, I think the behavior you describe is what you would see. According to this document on evacuate vs. rebuild (https://docs.openstack.org/nova/latest/contributor/evacuate-vs-rebuild.html#high-level), volume-backed instances don't actually get rebuilt with a different image. If I've misread that, I welcome the correction.
I don't think I'm using volume-backed instances, even though I'm not 100% sure what that is, but I also think you're on the right track here: I think I found the cause of my problem!
I have a backup server that runs outside of nova's view; it creates rbd snapshots to use for incremental backups, and the presence of those snapshots seems to be blocking the rebuild.
When I purge snaps before rebuilding, everything works as expected. I'm not sure why there's nothing in the logs when the rebuild fails, and the failure behavior is a bit goofy but could be worse.
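
A rough sketch of that pre-rebuild check, in case it helps anyone else: list the snapshots on the instance's rbd disk and hold off on the rebuild while any exist. It assumes the rbd image is named '<instance uuid>_disk' (my understanding of how nova's rbd backend names root disks) and uses 'vms' as a placeholder pool name; substitute whatever images_rbd_pool is set to in nova.conf. The function names are made up for illustration.

```python
import json
import subprocess


def disk_image_spec(instance_uuid, pool="vms"):
    """Return the rbd image spec for an instance's root disk.

    Assumption: the root disk is named '<uuid>_disk'; 'vms' is a
    placeholder pool name, not necessarily yours.
    """
    return f"{pool}/{instance_uuid}_disk"


def parse_snapshot_names(rbd_json):
    """Extract snapshot names from 'rbd snap ls --format json' output."""
    text = rbd_json.strip() or "[]"  # treat empty output as "no snapshots"
    return [snap["name"] for snap in json.loads(text)]


def list_snapshots(instance_uuid, pool="vms"):
    """Ask ceph which snapshots exist on the instance's root disk."""
    result = subprocess.run(
        ["rbd", "snap", "ls", "--format", "json",
         disk_image_spec(instance_uuid, pool)],
        check=True, capture_output=True, text=True,
    )
    return parse_snapshot_names(result.stdout)


def safe_to_rebuild(snapshot_names):
    """True only when no snapshots are left on the disk image."""
    return len(snapshot_names) == 0
```

If list_snapshots() comes back non-empty, 'rbd snap purge <pool>/<uuid>_disk' removes the snapshots before running 'openstack server rebuild'. Purging presumably breaks the incremental backup chain, so the next backup would need to be a full one.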
Thank you for your thoughts!
-Andrew
What does the storage configuration for your virtual machines (that is, boot disk, volumes, images, etc.) look like? According to the API docs (https://docs.openstack.org/api-ref/compute/#rebuild-server-rebuild-action) you do get different behaviors if the instance is volume-backed or not.
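
One way to answer that from the API side, as a minimal sketch: the compute API reference says the 'image' field of a server is an empty string when the server was booted from a volume, and attached volumes are listed under 'os-extended-volumes:volumes_attached'. The function below works on the raw server dict from the compute API (not on the formatted 'openstack server show' output), and describe_storage is a hypothetical helper name.

```python
def describe_storage(server):
    """Summarize how a server's storage is set up.

    'server' is the raw server dict from the compute API. An empty
    'image' field is taken to mean the server is volume-backed.
    """
    image = server.get("image") or {}
    volumes = [
        vol["id"]
        for vol in server.get("os-extended-volumes:volumes_attached", [])
    ]
    return {
        "volume_backed": not image,
        "image_id": image.get("id"),
        "attached_volumes": volumes,
    }
```

An image-backed VM with one cinder volume attached would report volume_backed as False, with the image id and one volume id filled in.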
Thanks!
Jadon
On Tue, Jul 16, 2024 at 2:46 PM Andrew Bogott <abogott@wikimedia.org> wrote:
I'm puzzled by nova instance rebuilds.
When I run 'openstack server rebuild --image <whatever>' on a new test
VM, it does what I expect:
- throws away the old server image
- boots up a brand new server using the requested image with the old
user/vendor data
- attaches the new server to the old network port and cinder volumes.
That's just what I want! Alas, when I try this with the actual old VMs
in need of update, usually it does a totally different thing:
- reboots the server
- changes the nova db record to reflect the requested image (so that the
server /looks/ like it was rebuilt in 'openstack server show')
- That's it!
It absolutely does not rebuild the actual server disk image. So, for
instance, any files that were present in '/' of the old server are still
present post-rebuild. lsb_release still shows the OS of the original
base image, disagreeing with the OS version displayed by 'openstack
server show'.
I've been reading logfiles for days but I don't see nova ever complain
(in either case) that it's doing something unexpected and in both cases
the servers wind up in state 'Active'.
Am I misunderstanding what this is supposed to do?
If not, does anyone have thoughts about what might be going wrong? I
assume nova is encountering a roadbump during the rebuild and doing a
graceful revert, but if so it sure isn't telling me.
Environment: Bobcat, kvm, VM images hosted on ceph/rbd.