That's great to hear you got it working! The snapshots blocking the rebuild sounds reasonable, especially if they're something Nova doesn't know about. Thank you for sharing back what you found!
Thanks!
Jadon
On Wed, Jul 17, 2024 at 12:51 PM Andrew Bogott <abogott@wikimedia.org> wrote:
On 7/17/24 11:45 AM, Jadon Naas wrote:
Hello Andrew!
I wonder if it has to do with how the volumes/storage are set up with your instance. If the instance is not an ephemeral instance, I think the behavior you describe is what you could see. It looks like according to this document ( https://docs.openstack.org/nova/latest/contributor/evacuate-vs-rebuild.html#...) on evacuation vs rebuilding, volume-backed instances don't actually rebuild with different images. If I've misread that, I welcome the correction.
I don't think I'm using volume-backed instances, even though I'm not 100% sure what that is, but I also think you're on the right track here: I think I found the cause of my problem!
I have a backup server that runs outside of nova's view; it creates rbd snapshots to use for incremental backups, and the presence of those snapshots seems to be blocking rebuild.
When I purge snaps before rebuilding, everything works as expected. I'm not sure why there's nothing in the logs when the rebuild fails, and the failure behavior is a bit goofy but could be worse.
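For anyone who hits the same thing, the workaround above could be sketched roughly as follows. This is a minimal sketch and not a tested procedure: the pool name `vms` and the `<uuid>_disk` image naming are assumptions about a typical nova/rbd setup, and the server UUID and image name are placeholders. The helper only builds the command lines; it does not run anything.

```python
import shlex


def rebuild_commands(server_id, image, pool="vms"):
    """Return the commands to purge rbd snapshots and then rebuild a server.

    Assumptions (not from the thread): nova stores the instance disk as
    '<pool>/<server_id>_disk', and the pool is called 'vms'.
    """
    rbd_image = f"{pool}/{server_id}_disk"
    return [
        # List snapshots first so it is clear what is about to be removed.
        ["rbd", "snap", "ls", rbd_image],
        # Remove the snapshots of the instance disk.
        ["rbd", "snap", "purge", rbd_image],
        # Rebuild the server from the requested image.
        ["openstack", "server", "rebuild", "--image", image, server_id],
    ]


if __name__ == "__main__":
    # Placeholder UUID and image name, for illustration only.
    for cmd in rebuild_commands("11111111-2222-3333-4444-555555555555", "my-new-image"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them leaves room to check the snapshot list by hand before anything is purged.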
Thank you for your thoughts!
-Andrew
What does the storage configuration for your virtual machines (that is, boot disk, volumes, images, etc.) look like? According to the API docs ( https://docs.openstack.org/api-ref/compute/#rebuild-server-rebuild-action) you do get different behaviors if the instance is volume-backed or not.
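One rough way to tell the two cases apart, as a sketch: in the compute API's server representation, the `image` field is an empty string when the server was booted from a volume. The dicts below are made-up examples, not output from a real cloud, and the CLI may format this field differently from the raw API response.

```python
def is_volume_backed(server):
    """Guess whether a compute API 'server' dict describes a volume-backed server.

    The compute API returns an empty string for 'image' when the server
    was booted from a volume; otherwise it is a dict carrying the image id.
    """
    return not server.get("image")


# Hypothetical server records, for illustration only.
image_backed = {"id": "hypothetical-1", "image": {"id": "hypothetical-image-id"}}
volume_backed = {"id": "hypothetical-2", "image": ""}

print(is_volume_backed(image_backed))   # False
print(is_volume_backed(volume_backed))  # True
```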
Thanks!
Jadon
On Tue, Jul 16, 2024 at 2:46 PM Andrew Bogott <abogott@wikimedia.org> wrote:
I'm puzzled by nova instance rebuilds.
When I run 'openstack server rebuild --image <whatever>' on a new test VM, it does what I expect:
- throws away the old server image
- boots up a brand new server using the requested image with the old user/vendor data
- attaches new server to the old network port and cinder volumes.
That's just what I want! Alas, when I try this with the actual old VMs in need of update, usually it does a totally different thing:
- reboots the server
- changes the nova db record to reflect the requested image (so that the server /looks/ like it was rebuilt in 'openstack server show')
- That's it!
It absolutely does not rebuild the actual server disk image. So, for instance, any files that were present in '/' of the old server are still present post-rebuild. lsb_release still shows the OS of the original base image, disagreeing with the OS version displayed by 'openstack server show'.
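A check like the one described above could be scripted along these lines. This is only a sketch: it reads the contents of /etc/os-release rather than calling lsb_release, and the distro ID and version values are hypothetical, not taken from the affected VMs.

```python
def parse_os_release(text):
    """Parse the contents of /etc/os-release into a dict of KEY -> value."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def looks_rebuilt(os_release_text, expected_id, expected_version):
    """Return True if the guest OS matches the image the rebuild asked for."""
    info = parse_os_release(os_release_text)
    return info.get("ID") == expected_id and info.get("VERSION_ID") == expected_version


# Hypothetical guest that still runs the old base image after a 'rebuild'.
sample = 'ID=debian\nVERSION_ID="11"\n'
print(looks_rebuilt(sample, "debian", "12"))  # False
```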
I've been reading logfiles for days but I don't see nova ever complain (in either case) that it's doing something unexpected and in both cases the servers wind up in state 'Active'.
Am I misunderstanding what this is supposed to do?
If not, does anyone have thoughts about what might be going wrong? I assume nova is encountering a roadbump during the rebuild and doing a graceful revert, but if so it sure isn't telling me.
Bobcat, kvm, VM images are hosted on ceph/rbd.