OpenStack Backup and Restore
Hi everyone,

We are building a cloud region for our company using OpenStack with Ceph as the block storage backend. We configured the backup driver as the Ceph backup driver (BackupDriverCeph) in cinder.conf and created volumes and backups pools in Ceph RADOS. We have a few problems/questions.

We have developed an API that regularly backs up VMs (instances) that use volumes. The process works as follows:

First, we call the create_server_image() method of the OpenStack API to create an image of the instance. This automatically creates an image of the instance in OpenStack with a Snapshot type (including metadata such as direct_url, stores, schema, etc.); at the same time, a volume snapshot is also created automatically. We then take backups of all volumes attached to the instance.

In our restore API, we use the image created by the instance backup described above. We change the image's block_device_mapping field, adding the IDs of the volumes restored from the volume backups, and call create_server(image_name, volume_from_backed_volume) in the OpenStack API to create a new instance. New volumes are then created from the volume backups and attached to the restored instance.

The problem we're encountering is that when we use create_server, the image created by the instance backup is tied to its direct_url, which points to the RBD URL of the original snapshot. As a result, the instance does not boot from the new volume. Note: if we create a new VM in Horizon with a new volume restored from a volume backup, the instance boots successfully.

What steps should we take to resolve this? Is there a flaw in our approach when designing the API? What is the best practice to follow when building an API for backup and restore (especially for instances that boot from volumes)? Is there a configuration we need to adjust in Ceph?

We would appreciate your advice on best practices for backup and restore, particularly for instances that boot from volumes.

Kind regards,
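The backup flow described above could be sketched with the openstacksdk Python client roughly as follows. This is a minimal sketch, not the poster's actual code: the function names (attached_volume_ids, backup_instance) and the label parameter are illustrative, `conn` is assumed to be an openstack.connection.Connection (e.g. from openstack.connect()), and error handling and wait-for-status polling are omitted.

```python
def attached_volume_ids(server_doc):
    """Volume IDs attached to a server, taken from the raw Compute API
    server document ('os-extended-volumes:volumes_attached' field)."""
    return [v["id"] for v in server_doc.get("os-extended-volumes:volumes_attached", [])]


def backup_instance(conn, server_id, label):
    """Snapshot the instance with the createImage action, then back up
    each attached volume with Cinder backups (sketch, untested)."""
    server = conn.compute.get_server(server_id)

    # Step 1: server snapshot image (the 'createImage' instance action).
    image = conn.compute.create_server_image(server, f"{label}-image")

    # Step 2: a Cinder backup of every attached volume; force=True is
    # needed to back up volumes that are still attached/in-use.
    backups = [
        conn.block_storage.create_backup(
            volume_id=vol["id"], name=f"{label}-{vol['id']}", force=True
        )
        for vol in (server.attached_volumes or [])
    ]
    return image, backups
```

For a volume-backed instance, note that the image produced in step 1 carries no disk data of its own; it only references volume snapshots via its block_device_mapping property, which is the root of the problem discussed below in the thread.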
On Wed, 2024-08-21 at 11:39 +0000, meteccinar@gmail.com wrote:
Hi everyone, we are building a cloud region for our company using OpenStack with Ceph as the block storage backend. We configured the backup driver as the Ceph backup driver in cinder.conf and created volumes and backups pools in Ceph RADOS. We have a few problems/questions.
We have developed an API that regularly backs up VMs (instances) that use volumes. The process works as follows:
First, we call the create_server_image() method of the OpenStack API to create an image of the instance. This automatically creates an image of the instance in OpenStack with a Snapshot type (including metadata such as direct_url, stores, schema, etc.); at the same time, a volume snapshot is also created automatically. We then take backups of all volumes attached to the instance.
In our restore API, we use the image created by the instance backup described above. We change the image's block_device_mapping field, adding the IDs of the volumes restored from the volume backups, and call create_server(image_name, volume_from_backed_volume) in the OpenStack API to create a new instance. New volumes are then created from the volume backups and attached to the restored instance.
The normal way to restore a snapshot of an instance is to use the rebuild API. Why are you creating a new server in your restore flow?
The problem we're encountering is that when we use create_server, the image created by the instance backup is tied to its direct_url, which points to the RBD URL of the original snapshot. As a result, the instance does not boot from the new volume.
Note: if we create a new VM in Horizon with a new volume restored from a volume backup, the instance boots successfully.
What steps should we take to resolve this?
Is there a flaw in our approach when designing the API?
What is the best practice we should follow when creating an API for backup and restore (especially for instances that boot from volumes)?
Nova and Cinder already provide APIs for this: the server create-image API and the rebuild API on the Nova side, and Cinder's backup and restore APIs for backing up volumes and restoring them. So I'm not sure why you need to build a new one instead of using the ones that already exist and are integrated into Horizon.
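The rebuild action mentioned here is a POST to the server's action endpoint. A minimal sketch of the request body, as documented in the Compute API reference (the helper name is ours; adminPass is optional):

```python
def rebuild_request_body(image_id, admin_pass=None):
    """Body for POST /servers/{server_id}/action (the 'rebuild' action).

    Rebuilding a volume-backed server additionally requires Compute API
    microversion 2.93 or later.
    """
    rebuild = {"imageRef": image_id}
    if admin_pass is not None:
        rebuild["adminPass"] = admin_pass
    return {"rebuild": rebuild}
```

With the command-line client the equivalent is roughly `openstack --os-compute-api-version 2.93 server rebuild --image <image-id> <server>`; exact flags may vary with client version.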
Is there a configuration we need to adjust in Ceph?
We would appreciate your advice on the best practices for backup and restore, particularly for instances that boot from volumes.
Kind regards,
Do I understand this correctly: we can return to the snapshot with Rescue, without recreating the same VM instance? Is that reliable, and how is it different from a volume backup?

The reason we create a new instance is to go back to a backup of the broken VM taken at time t, before it broke down. In our scenario we need to perform inspections on the corrupted VM; in fact, that is why we want the instance itself to have a backup, because it contains metadata. When we restore the instance backup, we want to recreate it without touching the main VM and without re-entering additional information such as network, flavor, etc., create volumes from the backups of all its volumes, and effectively end up with a copy of the instance at time t.

But as we mentioned, in this scenario we create a volume backup, and when we attach the new volume created from this backup to the instance created from the instance backup image, the image automatically connects to the snapshot that was taken when the image was created. This is confusing us. Could you please explain in more detail what strategy we should follow?

Thank you in advance for your answer.
On Wed, 2024-08-21 at 12:48 +0000, meteccinar@gmail.com wrote:
Do I understand this correctly: we can return to the snapshot with Rescue, without recreating the same VM instance? Is that reliable, and how is it different from a volume backup?
Volume backup is a way of automating backups on a regular basis: https://docs.openstack.org/cinder/latest/admin/volume-backups.html Rescue is a Nova API action that lets you temporarily boot a different OS (optionally from a different image) to try to fix filesystem and other problems that prevent booting: https://docs.openstack.org/api-ref/compute/#rescue-server-rescue-action Rescue and backup are entirely unrelated concepts.
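Like rebuild, rescue is an action POSTed to the server's action endpoint. A minimal sketch of the request body per the Compute API reference linked above (the helper name is ours; both fields are optional, and the body may simply be null to rescue with defaults):

```python
def rescue_request_body(rescue_image_id=None, admin_pass=None):
    """Body for POST /servers/{server_id}/action (the 'rescue' action).

    Omitting rescue_image_ref makes Nova rescue with its default image.
    """
    rescue = {}
    if rescue_image_id is not None:
        rescue["rescue_image_ref"] = rescue_image_id
    if admin_pass is not None:
        rescue["adminPass"] = admin_pass
    # The API also accepts {"rescue": null} for an all-defaults rescue.
    return {"rescue": rescue or None}
```

Rescuing boot-from-volume instances additionally requires Compute API microversion 2.87 and a compute host that supports stable device rescue.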
The reason we create a new instance is to go back to a backup of the broken VM taken at time t, before it broke down. In our scenario we need to perform inspections on the corrupted VM.
This can be done via rescue, but if it is needed for a protracted period of time and you need to restore the failed instance in the meantime, replacing the instance with a new one makes sense. I would not consider that a restore from backup in the normal sense, since the port/volume UUIDs will change, and that may be visible to the application because the disk serial number and the port MAC/IP will change; the normal expectation of restoring from a backup is that those would not change.
In fact, that is why we want the instance itself to have a backup, because it contains metadata. When we restore the instance backup, we want to recreate it without touching the main VM and without re-entering additional information such as network, flavor, etc., create volumes from the backups of all its volumes, and effectively end up with a copy at time t.
But as we mentioned, in this scenario we create a volume backup, and when we attach the new volume created from this backup to the instance created from the instance backup image, the image automatically connects to the snapshot that was taken when the image was created. This is confusing us. Could you please explain in more detail what strategy we should follow?
As documented in the Asynchronous Postconditions section of the create image API reference (https://docs.openstack.org/api-ref/compute/#create-image-createimage-action): """ in the volume-backed server case, volume snapshots will be created for all volumes attached to the server and then those will be represented with a block_device_mapping image property in the resulting snapshot image in the Image service. If that snapshot image is used later to create a new server, it will result in a volume-backed server where the root volume is created from the snapshot of the original root volume. The volumes created from the snapshots of the original other volumes will be attached to the server. """ So if you use the Create Image instance action on a boot-from-volume instance, it will take a snapshot of the root volume and a snapshot of all the attached data volumes; if you boot a new VM from that snapshot image, it will also duplicate the data volumes from the volume snapshots. The restore API in Nova is called rebuild, and with microversion 2.93 we support rebuilding volume-backed instances: https://docs.openstack.org/api-ref/compute/#rebuild-server-rebuild-action So if you want to restore in place, you can rebuild the instance with the snapshot image. I would personally expect that to reimage both the root disk and roll back the volumes to the relevant volume snapshots, although our API reference does not say one way or the other. Looking at the Nova spec, https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volum... that does not really cover this case either, so you might need to separately roll back the volume snapshots using https://docs.openstack.org/api-ref/block-storage/v3/index.html?expanded=deta... or https://docs.openstack.org/api-ref/block-storage/v3/index.html?expanded=deta...
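Rolling a volume back to a snapshot is itself an action POSTed to the volume's action endpoint in the Block Storage API. A minimal sketch of the request body (the helper name is ours; this assumes the revert-to-snapshot action, which requires Block Storage API microversion 3.40):

```python
def revert_to_snapshot_body(snapshot_id):
    """Body for POST /v3/{project_id}/volumes/{volume_id}/action
    (the 'revert' action, microversion >= 3.40).

    Cinder only reverts to the volume's most recent snapshot, and the
    volume must be in 'available' status (i.e. detached) first.
    """
    return {"revert": {"snapshot_id": snapshot_id}}
```

In practice this means detaching (or shutting down) before reverting, then re-attaching, which is another reason an in-place restore of a running volume-backed instance needs careful orchestration.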
Thank you for your answer in advance.
As far as we understand from your explanation, when we make an instance backup, only the snapshot taken at that time is assigned as the root device, and we cannot change it, because this case is not covered in the Nova specs. If we understand correctly, we also face another obstacle: in the metadata, the RBD URL of the disk is stored in the direct_url field of the snapshot/image/volume in the RADOS gateway, and OpenStack does not allow changing this field; it returns a 403. So does this mean we have to create a new instance from scratch and attach the volume restored from the volume backup to it?
Also, we tried taking snapshots at times t, t+1, and t+2 while data was being written. The snapshots succeeded, but when we rescue the instance, the operation fails. Horizon output: Error: Unable to rescue instance. Details: Requested rescue image 'ddc2df28-4f5-434b-9f9a-030e56425fc5' is not supported (HTTP 400) (Request-ID: req-065126c2-7f5f-4fd0-81b0-b45df446a83b). nova-api log: Unable to rescue an instance using a volume snapshot image with img_block_device_mapping image properties set. The snapshot image already has the snapshotted volume ID in its block_device_mapping metadata field. We are using Ceph for block storage; is that related to this issue?
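The nova-api log line quoted here explains the 400: Nova refuses to rescue with an image that is really a volume-backed server snapshot, because such an image carries only a block device mapping, not actual disk data. A small heuristic check along those lines could look like this (a sketch based purely on the error message above; the helper name is ours, and the property keys come from the Glance property name and the quoted log, not from reading Nova's source):

```python
def usable_as_rescue_image(image_properties):
    """Return True if the image looks usable as a rescue image.

    Heuristic: an image whose properties carry a block device mapping is
    a volume-backed server snapshot and will be rejected by Nova's
    rescue action, matching the error quoted in the thread.
    """
    bdm_keys = ("block_device_mapping", "img_block_device_mapping")
    return not any(key in image_properties for key in bdm_keys)
```

So this behavior is independent of Ceph: any snapshot image of a boot-from-volume instance will be rejected as a rescue image. A plain bootable image (or the original base image) must be used for rescue instead.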
participants (3)
- meteccinar@gmail.com
- Oliver Weinmann
- smooney@redhat.com