Instance Freeze During Snapshot in OpenStack 2024.1 with Ceph Octopus
On 21/07/2025 12:05, mohammad kokabi wrote:
Hello, I hope this message finds you well.
I am currently using OpenStack version 2024.1, with Ceph Octopus as the backend storage. The configuration is as follows:
Glance is configured to use the "images" pool on Ceph.
The virtual machine disks are stored on the "vms" pool.
When I attempt to create a snapshot of a running instance, the instance appears to freeze for a few seconds, during which it loses network connectivity (e.g., I cannot ping it). This behavior is unexpected, as I understand that snapshot operations, especially with a Ceph backend, should not significantly interrupt the instance's runtime.
Could you please advise what might be causing this issue? Are there any known limitations, configurations, or best practices I should be aware of regarding snapshot behavior with Ceph in this version of OpenStack?
Looking forward to your insights.
If you have the QEMU guest agent deployed, then nova will freeze the filesystem to make sure you get a consistent snapshot and then unfreeze it; that might be the issue. https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/quie...

So if the 'hw_qemu_guest_agent=yes' and 'hw_require_fsfreeze=yes' properties are set on the image, nova is required to try to use the QEMU guest agent to do that. That can briefly interrupt the guest if it is not prepared for it and tries to do I/O. Neither image property is set by default, so this is opt-in behavior. It is really intended to allow taking consistent snapshots when you have boot-from-volume guests with additional Cinder volumes.

Other than that, yes, snapshots should in general have little to no impact on the running guest. Note that nova is allowed to stop the VM entirely: nova does not guarantee live snapshots at the API level, although in practice stopping the VM is very rarely required today. However, if you set the option at https://docs.openstack.org/nova/latest/configuration/config.html#workarounds... nova will always do offline snapshots for libvirt instances. At the API level, virt drivers such as Ironic are also still allowed to implement snapshot as an offline operation if they choose. That is not what is happening here; I am just pointing out that snapshots are not guaranteed to have no impact, they just typically don't.
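To make the behavior described above a little more concrete, here is a small Python model of it. It is a sketch under stated assumptions, not nova's code: the names choose_snapshot_mode, take_snapshot and FakeGuestAgent are hypothetical, the image property names are the ones quoted in the reply, offline_snapshots_forced stands in for the [workarounds] option linked above, and the best-effort fallback when the agent does not respond (and the freeze is not required) is an assumption drawn from the word "required". It shows why the guest stalls: guest writes are blocked between the freeze and the thaw, and the thaw has to run even if the disk snapshot itself fails.

```python
# Hypothetical model of the snapshot paths described in the reply; not nova's code.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class SnapshotMode(Enum):
    OFFLINE = "offline"              # guest stopped for the snapshot (forced by config)
    LIVE = "live"                    # guest keeps running, no agent involvement
    LIVE_QUIESCED = "live+fsfreeze"  # guest filesystems frozen around the snapshot


def _is_yes(value: Optional[str]) -> bool:
    # Image properties are strings; treat common truthy spellings as "yes".
    return (value or "").strip().lower() in {"yes", "true", "1", "on"}


def choose_snapshot_mode(image_props: Dict[str, str],
                         offline_snapshots_forced: bool = False) -> SnapshotMode:
    # offline_snapshots_forced stands in for the [workarounds] option (assumption).
    if offline_snapshots_forced:
        return SnapshotMode.OFFLINE
    if _is_yes(image_props.get("hw_qemu_guest_agent")):
        return SnapshotMode.LIVE_QUIESCED
    return SnapshotMode.LIVE


@dataclass
class FakeGuestAgent:
    """Stand-in for the QEMU guest agent; records calls instead of talking to a VM."""
    available: bool = True
    calls: List[str] = field(default_factory=list)

    def fsfreeze(self) -> None:
        if not self.available:
            raise RuntimeError("guest agent not responding")
        self.calls.append("freeze")

    def fsthaw(self) -> None:
        self.calls.append("thaw")


def take_snapshot(agent: FakeGuestAgent,
                  take_disk_snapshot: Callable[[], None],
                  image_props: Dict[str, str],
                  offline_snapshots_forced: bool = False) -> SnapshotMode:
    mode = choose_snapshot_mode(image_props, offline_snapshots_forced)
    if mode is not SnapshotMode.LIVE_QUIESCED:
        # A real driver would stop the guest first in the OFFLINE case; omitted here.
        take_disk_snapshot()
        return mode
    try:
        agent.fsfreeze()
    except RuntimeError:
        if _is_yes(image_props.get("hw_require_fsfreeze")):
            raise  # freeze is mandatory: fail rather than take an unquiesced snapshot
        take_disk_snapshot()  # assumption: best effort falls back to a plain live snapshot
        return SnapshotMode.LIVE
    try:
        # Guest writes block until fsthaw(); this window is the brief "freeze".
        take_disk_snapshot()
    finally:
        agent.fsthaw()  # always thaw, even if the disk snapshot fails
    return mode
```

For example, an image with hw_qemu_guest_agent=yes goes through the freeze, snapshot, thaw sequence in this model, while an image with neither property gets a plain live snapshot with no agent calls at all.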
Best regards,
participants (2)
- mohammad kokabi
- Sean Mooney