Re: [cinder] How to retype while maintaining volume usage
Hi Eugen,

Thank you for your reply. I've confirmed that 'rbd sparsify' can increase the available space.

However, I believe there's room for improvement in the retype implementation. Currently, retyping between Ceph backends causes zero-filling, which unnecessarily consumes time and storage space. This is likely to happen frequently, for example with retypes between Ceph HDD and SSD backends, and the Ceph administrator would need to run 'rbd sparsify' frequently.

Are there any improvements to the retype implementation being considered in the community? I would also appreciate hearing your opinion on the need for improvement.

Best regards,
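For reference, a minimal sketch of the 'rbd sparsify' workaround mentioned above, driven from Python via subprocess; it assumes shell access to a Ceph client node, and the pool and image names are placeholders:

```
import subprocess


def sparsify_rbd_image(pool: str, image: str) -> None:
    """Reclaim zeroed regions of an RBD image by running 'rbd sparsify'.

    This is the manual workaround discussed in this thread, not something
    Cinder runs automatically. 'pool' and 'image' are placeholder names.
    """
    subprocess.run(["rbd", "sparsify", f"{pool}/{image}"], check=True)


# Example with hypothetical names:
# sparsify_rbd_image("volumes_data_ssd", "volume-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")
```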
Hi Yuta,

Thanks for starting this thread. I started looking into this issue and found out that retype+migration already provides a way to copy the data sparsely (with dd). The only issue was that it wasn't enabled in the RBD driver, which I did with this patch[1] (more details in the commit message). If possible, can you try out the patch in your deployment and report whether it fixes the issue you are experiencing?

[1] https://review.opendev.org/c/openstack/cinder/+/954217

Thanks
Rajat Dhasmana
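As a rough illustration of what "copy the data sparsely (with dd)" means (a generic sketch of dd's sparse-copy behaviour, not Cinder's exact command line): dd's conv=sparse option seeks over all-zero blocks instead of writing them, so a thin-provisioned destination stays unallocated where the source is empty.

```
import subprocess


def dd_sparse_copy(src_path: str, dest_path: str, size_in_mb: int) -> None:
    """Copy a block device or file with dd, skipping writes of all-zero blocks.

    Illustrative only; the real migration code builds its own command line
    and handles throttling, O_DIRECT support, syncing, and error cases.
    """
    subprocess.run(
        ["dd", f"if={src_path}", f"of={dest_path}",
         "bs=1M", f"count={size_in_mb}", "conv=sparse"],
        check=True,
    )
```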
Hi Rajat,

Thank you for creating the patch. I have tested it in our environment, but I couldn't confirm that the issue has been resolved. I retyped an empty 10GB volume from HDD to SSD; however, the disk usage of the SSD backend increased by 20GB (due to a replica count of 2, doubling the size).

```
$ openstack volume list
+--------------------------------------+-------------+-----------+------+-------------+
| ID                                   | Name        | Status    | Size | Attached to |
+--------------------------------------+-------------+-----------+------+-------------+
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume | available |   10 |             |
+--------------------------------------+-------------+-----------+------+-------------+

$ sudo ceph osd df ssd
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL    %USE  VAR   PGS  STATUS
 0  ssd    0.19530   1.00000  200 GiB  383 MiB  11 MiB   15 KiB  372 MiB  200 GiB  0.19  0.19    4  up
 5  ssd    1.74660   1.00000  1.7 TiB   28 GiB  26 GiB   21 KiB  1.2 GiB  1.7 TiB  1.54  1.53   17  up
 7  ssd    1.74660   1.00000  1.7 TiB   28 GiB  26 GiB   32 KiB  1.2 GiB  1.7 TiB  1.55  1.54   14  up
 4  ssd    1.74660   1.00000  1.7 TiB  420 MiB  11 MiB   37 KiB  409 MiB  1.7 TiB  0.02  0.02   28  up
                       TOTAL  5.4 TiB   56 GiB  53 GiB  108 KiB  3.2 GiB  5.4 TiB  1.01
MIN/MAX VAR: 0.02/1.54  STDDEV: 0.75

$ openstack volume set --type ceph-ssd --retype-policy on-demand xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

$ sudo ceph osd df ssd
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL    %USE  VAR   PGS  STATUS
 0  ssd    0.19530   1.00000  200 GiB  383 MiB  11 MiB   15 KiB  372 MiB  200 GiB  0.19  0.14    4  up
 5  ssd    1.74660   1.00000  1.7 TiB   38 GiB  36 GiB   21 KiB  1.2 GiB  1.7 TiB  2.10  1.54   17  up
 7  ssd    1.74660   1.00000  1.7 TiB   38 GiB  36 GiB   32 KiB  1.2 GiB  1.7 TiB  2.11  1.54   14  up
 4  ssd    1.74660   1.00000  1.7 TiB  420 MiB  11 MiB   37 KiB  409 MiB  1.7 TiB  0.02  0.02   28  up
                       TOTAL  5.4 TiB   76 GiB  73 GiB  108 KiB  3.2 GiB  5.4 TiB  1.37
MIN/MAX VAR: 0.02/1.54  STDDEV: 1.04
```

I checked the `get_capabilities` output in the debug logs and confirmed that sparse_copy_volume is True.

```
2025-07-08 15:58:13.960 315503 DEBUG cinder.volume.manager [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -]
Obtained capabilities list: {'vendor_name': 'Open Source', 'driver_version': '1.3.0', 'storage_protocol': 'ceph',
'total_capacity_gb': 2615.73, 'free_capacity_gb': 2568.31, 'reserved_percentage': 0, 'multiattach': True,
'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0',
'location_info': 'ceph:/etc/ceph/ceph.conf:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:cinder-ssd:volumes_data_ssd',
'backend_state': 'up', 'qos_support': True, 'sparse_copy_volume': True, 'volume_backend_name': 'ceph-ssd',
'replication_enabled': False, 'properties': {'thin_provisioning': {'title': 'Thin Provisioning', 'description': 'Sets thin provisioning.', 'type': 'boolean'},
'compression': {'title': 'Compression', 'description': 'Enables compression.', 'type': 'boolean'},
'qos': {'title': 'QoS', 'description': 'Enables QoS.', 'type': 'boolean'},
'replication_enabled': {'title': 'Replication', 'description': 'Enables replication.', 'type': 'boolean'}}}.
get_capabilities /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4751
```

I have looked at the source code[1] and believe that `sparse` is available when `_copy_volume_with_path` is called.

[1] https://opendev.org/openstack/cinder/src/commit/27373d61fe54e55afa91f1e93cc6...

However, it appears that `_copy_volume_with_path` was not called and `_copy_volume_with_file` was called instead. To investigate further, I added debug logs to the source code to see the output.

```
if (isinstance(src, str) and
        isinstance(dest, str)):
    if not throttle:
        throttle = throttling.Throttle.get_default()
    with throttle.subcommand(src, dest) as throttle_cmd:
        _copy_volume_with_path(throttle_cmd['prefix'], src, dest,
                               size_in_m, blocksize, sync=sync,
                               execute=execute, ionice=ionice,
                               sparse=sparse)
else:
    LOG.debug("called _copy_volume_with_file")   # ★ added debug log
    LOG.debug("src=%s, dest=%s", src, dest)      # ★ added debug log
    _copy_volume_with_file(src, dest, size_in_m)
```

The output of the debug logs is as follows:

```
2025-07-08 17:38:03.068 426268 DEBUG cinder.volume.volume_utils [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -]
called _copy_volume_with_file copy_volume /usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py:634
2025-07-08 17:38:03.068 426268 DEBUG cinder.volume.volume_utils [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -]
src=<os_brick.initiator.linuxrbd.RBDVolumeIOWrapper object at 0x7fe277646970>,
dest=<os_brick.initiator.linuxrbd.RBDVolumeIOWrapper object at 0x7fe2775dfbb0>
copy_volume /usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py:635
```

I would appreciate it if you could confirm.

Best regards,
Hi Yuta,

Thanks for the followup. I forgot that we have two code paths and RBD goes through the one using chunks instead of 'dd'. It looks like sparseness support was never implemented for the chunked transfer code path.

Anyway, I was able to write another patch on top of my last one here[1] which adds the support. I've tested the workflow by deploying a Ceph cluster with two pools as two Cinder backends and performing a retype between them, and the following are the results.

Direction of retype: volumes2 -> volumes

Before Patch

root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  948 MiB      669  757 MiB   2.65     27 GiB
volumes2   5   32   22 MiB       20   22 MiB   0.08     27 GiB

root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  2.9 GiB      993  2.7 GiB   9.86     25 GiB
volumes2   5   32  9.8 KiB        3   14 KiB      0     25 GiB

After Patch

root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  948 MiB      669  757 MiB   2.64     27 GiB
volumes2   5   32   22 MiB       20   22 MiB   0.08     27 GiB

root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  1.0 GiB      691  869 MiB   3.05     27 GiB
volumes2   5   32     19 B        3    4 KiB      0     27 GiB

We can see that before the patch, the retype increased the space used from 757 MiB to 2.7 GiB, and after applying the patch it went from 757 MiB to 869 MiB, showing a sparse volume copy. Though I haven't conducted any further testing that would validate data integrity, like retyping a bootable volume and launching an instance from it. I would like to hear your feedback on whether this patch works for you (note that both patches[1][2] need to be applied).

[1] https://review.opendev.org/c/openstack/cinder/+/954523
[2] https://review.opendev.org/c/openstack/cinder/+/954217

Thanks
Rajat Dhasmana
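To illustrate the idea behind adding sparseness to the chunked transfer path (a simplified conceptual sketch, not the actual patch), the copy loop can detect all-zero chunks and advance the destination offset instead of writing them:

```
def copy_chunked_sparse(src, dest, size_in_bytes, chunk_size=4 * 1024 * 1024):
    """Copy between two seekable file-like volume handles, skipping zero chunks.

    Conceptual sketch only: 'src' and 'dest' stand in for handles such as the
    RBDVolumeIOWrapper objects seen in the logs, assuming they support seek().
    """
    zero_chunk = b"\x00" * chunk_size
    copied = 0
    while copied < size_in_bytes:
        data = src.read(min(chunk_size, size_in_bytes - copied))
        if not data:
            break
        if data == zero_chunk[:len(data)]:
            # Skip the write so the thin-provisioned destination stays
            # unallocated for this region.
            dest.seek(len(data), 1)
        else:
            dest.write(data)
        copied += len(data)
    dest.flush()
```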
On 09/07/2025 16:58, Rajat Dhasmana wrote:
> Hi Yuta,
>
> Thanks for the followup. I forgot that we have two code paths and RBD
> goes through the one using chunks instead of 'dd'. Looks like the
> sparseness support was never implemented for the chunked transfer code
> path.
There are actually 3, because if the volume is attached to a nova instance when you retype, then nova does the retry, and we have no guarantee or official support for sparseness of any kind in that case. Adding it would be a new feature and I'm not sure it's something we should generically support across all cinder backends. When nova is doing a retry it's doing a qemu block rebase, and I'm not sure we can actually implement sparseness in that case generically.

I wonder if it would make sense for cinder to have a sparsify API that could be used separately from retype to trim/sparsify the volume if the backend supports that?
Hi Sean and Rajat,
> [...] then nova does the retry, and we have no guarantee or official
> support for sparseness of any kind in that case. Adding it would be a
> new feature and I'm not sure it's something we should generically
> support across all cinder backends. When nova is doing a retry it's
> doing a qemu block rebase, and I'm not sure we can actually implement
> sparseness in that case generically.
Let me clarify my understanding. Does "retry" mean re-copying data written during the retype operation when the volume is attached to an instance? Also, is it correct to understand that during a retry, qemu block rebase is performed, so a process other than `_copy_volume_with_path` or `_copy_volume_with_file` is executed? I would like to understand this at the code level. Could you please provide the function names and locations where this processing occurs?
> I wonder if it would make sense for cinder to have a sparsify API that
> could be used separately from retype to trim/sparsify the volume if
> the backend supports that?
To minimize the behavior change of existing functions, I think it would be effective to sparsify after retyping if the backend supports it. However, this method still has the problems of taking a long time for the retype and temporarily consuming capacity.

Ideally, an implementation that can migrate without zero-filling by using driver-specific features (for example, 'rbd cp' for RBD) would be preferable. However, in that case, I am concerned that the scope of the modifications would be large. I think we need to discuss what implementation approach to take, so I would appreciate your opinions.

Best regards,
On Fri, Jul 11, 2025 at 1:59 PM <yuta.kambe@fujitsu.com> wrote:
> Hi Sean and Rajat,
>
>> [...] then nova does the retry, and we have no guarantee or official
>> support for sparseness of any kind in that case. Adding it would be a
>> new feature and I'm not sure it's something we should generically
>> support across all cinder backends. When nova is doing a retry it's
>> doing a qemu block rebase, and I'm not sure we can actually implement
>> sparseness in that case generically.
>
> Let me clarify my understanding. Does "retry" mean re-copying data
> written during the retype operation when the volume is attached to an
> instance?
I think Sean meant to write "retype" so here "retry" == "retype"
> Also, is it correct to understand that during a retry, qemu block
> rebase is performed, so a process other than `_copy_volume_with_path`
> or `_copy_volume_with_file` is executed? I would like to understand
> this at the code level. Could you please provide the function names
> and locations where this processing occurs?
When the volume is attached to the server, it's nova/libvirt that does the writing of data, so yes, nova/libvirt will be performing the migration of data. Here are the references:

Cinder calling nova to swap: https://github.com/openstack/cinder/blob/21af57108f9f67ce9e5af78a6b7eef0d3f8...
novaclient: https://opendev.org/openstack/python-novaclient/src/branch/master/novaclient...
Nova/libvirt: https://github.com/openstack/nova/blob/19d621a66f8d90500c7b630d7b9fbae1d0542...

Sean can correct me on the nova links but I'm certain these are the right pieces of code performing the nova operation.
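For readers unfamiliar with that nova path, here is a heavily simplified sketch of a libvirt block-rebase style copy for an attached disk. The domain and disk names are hypothetical, and nova's actual swap_volume logic in the modules linked above handles many more cases and error paths:

```
import time

import libvirt


def copy_attached_disk(dom: libvirt.virDomain, disk_target: str, new_path: str) -> None:
    """Mirror an attached disk onto a new volume, then pivot the guest to it.

    Simplified sketch of the qemu block rebase/copy approach; the guest keeps
    running while data is mirrored, which is why nova owns this step.
    """
    flags = (libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
             libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)
    dom.blockRebase(disk_target, new_path, 0, flags)

    # Poll until the copy job has mirrored all data.
    while True:
        info = dom.blockJobInfo(disk_target, 0)
        if info and info['cur'] == info['end']:
            break
        time.sleep(1)

    # Pivot the domain onto the new volume and end the job.
    dom.blockJobAbort(disk_target, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
```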
>> I wonder if it would make sense for cinder to have a sparsify API
>> that could be used separately from retype to trim/sparsify the volume
>> if the backend supports that?
This is really not helpful. Copying volume as sparse is not just to save space but also to improve performance and calling a new API is as easy/hard as executing the equivalent command on the backend side so it doesn't seem to be worth the effort.
> To minimize the behavior change of existing functions, I think it
> would be effective to sparsify after retyping if the backend supports
> it. However, this method still has the problems of taking a long time
> for the retype and temporarily consuming capacity. Ideally, an
> implementation that can migrate without zero-filling by using
> driver-specific features (for example, 'rbd cp' for RBD) would be
> preferable. However, in that case, I am concerned that the scope of
> the modifications would be large. I think we need to discuss what
> implementation approach to take, so I would appreciate your opinions.
We do use RBD optimizations during retype, but they are not valid across clusters, hence we perform the generic migration instead.
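For reference, a minimal sketch of the kind of driver-specific, intra-cluster copy being discussed (roughly what 'rbd cp' does), using the rbd Python bindings. This is only an illustration of the idea, assuming the librbd Python bindings are installed; pool and image names are placeholders and it is not Cinder's code:

```
import rados
import rbd


def rbd_copy_image(conffile, src_pool, src_image, dest_pool, dest_image):
    """Copy an RBD image within one cluster via librbd, like 'rbd cp'."""
    with rados.Rados(conffile=conffile) as cluster:
        with cluster.open_ioctx(src_pool) as src_ioctx, \
                cluster.open_ioctx(dest_pool) as dest_ioctx:
            with rbd.Image(src_ioctx, src_image, read_only=True) as image:
                image.copy(dest_ioctx, dest_image)
```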
On 16/07/2025 14:32, Rajat Dhasmana wrote:
>> I wonder if it would make sense for cinder to have a sparsify API
>> that could be used separately from retype to trim/sparsify the volume
>> if the backend supports that?
>
> This is really not helpful. Copying volume as sparse is not just to
> save space but also to improve performance and calling a new API is as
> easy/hard as executing the equivalent command on the backend side so
> it doesn't seem to be worth the effort.
So, just to touch on this. From my perspective as a cloud user, I may want to trigger the sparsification manually. For a long time trim/discard was not supported on all backends, so if I as an end user have a long-lived volume, and especially if I have done a lot of deletions of data in that volume over time, I can see a use case for it.

What I find more surprising is the concept that cinder would not consider the volume unsupported if you went out of band and did any operation on the storage backend directly. On the nova side, if you as an operator ssh to a host and perform operations on the VM file, or go to Ceph and execute an operation directly on the Ceph cluster for a nova-provisioned Ceph volume, you would void your warranty. I.e. if you go out of band and take snapshots of Ceph volumes or something like that directly in Ceph, then any storage issues the VM has are yours to fix. So I'm surprised you would discount the idea of a driver-independent cinder API to allow triggering the appropriate command on the backend, and instead encourage operators to do it themselves.

Also, if this API existed and nova did the data transfer for the in-use volume, we could call it if the connection info says the backend supports sparseness or requested it. Granted, cinder could also just call it in that case after we tell cinder the retype is complete and the data is transferred. That would not need any changes on the nova side, which I'm OK with, but if cinder wants to make a guarantee of sparseness preservation on retype, it will have to handle the in-use case as well. Otherwise it will remain best effort.
On Wed, Jul 16, 2025 at 8:50 PM Sean Mooney <smooney@redhat.com> wrote:
> [...] So I'm surprised you would discount the idea of a
> driver-independent cinder API to allow triggering the appropriate
> command on the backend, and instead encourage operators to do it
> themselves.
I'm not really encouraging anyone to do anything :) An API for sparsifying volumes would require a generic interface and a driver-specific implementation across various vendors, if they are motivated enough to implement their vendor-specific logic for it. Such a solution doesn't exist, so either we fix the "copy" operations to preserve sparseness, or the operator can fix things on their own, knowing the consequences of it.
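To make the scope concrete, here is a purely hypothetical sketch of what such a generic interface might look like; the method name `sparsify_volume` and both classes are invented for illustration and do not exist in Cinder:

```
class VolumeDriver:
    """Illustrative base-class fragment (hypothetical, not Cinder code)."""

    def sparsify_volume(self, volume):
        """Reclaim zeroed/unused space for 'volume' on the backend."""
        raise NotImplementedError("backend does not support sparsification")


class HypotheticalRBDDriver(VolumeDriver):
    """Sketch of an RBD implementation delegating to librbd / 'rbd sparsify'."""

    def sparsify_volume(self, volume):
        # A real implementation would open the RBD image backing 'volume'
        # and invoke the librbd sparsify operation (or 'rbd sparsify').
        ...
```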
> Also, if this API existed and nova did the data transfer for the
> in-use volume, we could call it if the connection info says the
> backend supports sparseness or requested it.
We already do that with the "discard" parameter that sets the "driver_discard"[1] config for libvirt based on the value.
> Granted, cinder could also just call it in that case after we tell
> cinder the retype is complete and the data is transferred. That would
> not need any changes on the nova side, which I'm OK with, but if
> cinder wants to make a guarantee of sparseness preservation on retype,
> it will have to handle the in-use case as well. Otherwise it will
> remain best effort.
My final thoughts on this would be: we already provide a mechanism to report unmap/discard support for the Cinder backend, and if the hypervisor/guest/bus support it, they can issue "fstrim". Although "fstrim" works at the filesystem level and sparsification on the backend would be at the block level, an operator should be well versed in the pros and cons of each and how it will affect their overall deployment; otherwise they shouldn't pursue it. I'm just stating what exists and what doesn't, and also trying to fix what I can, so I really don't vouch for any choice the operator makes (even the "rbd sparsify" workaround wasn't my recommendation). Hope that clarifies my take here.
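For completeness, the guest-side piece referred to here is the standard fstrim invocation; it only has an effect if discard/unmap is enabled end to end (backend, volume connection, disk bus, and guest filesystem). A minimal sketch:

```
import subprocess


def trim_guest_filesystems() -> None:
    """Run fstrim on all mounted filesystems that support discard.

    Must be run inside the guest with root privileges; it fails or is a
    no-op if the virtual disk does not expose discard support.
    """
    subprocess.run(["fstrim", "--all", "--verbose"], check=True)
```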
[1] https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/volum...

Thanks
Rajat Dhasmana
Hi Rajat,

Thank you for suggesting the additional patches. I will reply as soon as I have confirmed them. However, I also think it would be better to implement this per backend driver rather than by modifying the generic implementation. I would appreciate your consideration.
``` $ openstack volume list +--------------------------------------+-------------+-----------+------+-------------+ | ID | Name | Status | Size | Attached to | +--------------------------------------+-------------+-----------+------+-------------+ | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume | available | 10 | | +--------------------------------------+-------------+-----------+------+-------------+ $ sudo ceph osd df ssd ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS 0 ssd 0.19530 1.00000 200 GiB 383 MiB 11 MiB 15 KiB 372 MiB 200 GiB 0.19 0.19 4 up 5 ssd 1.74660 1.00000 1.7 TiB 28 GiB 26 GiB 21 KiB 1.2 GiB 1.7 TiB 1.54 1.53 17 up 7 ssd 1.74660 1.00000 1.7 TiB 28 GiB 26 GiB 32 KiB 1.2 GiB 1.7 TiB 1.55 1.54 14 up 4 ssd 1.74660 1.00000 1.7 TiB 420 MiB 11 MiB 37 KiB 409 MiB 1.7 TiB 0.02 0.02 28 up TOTAL 5.4 TiB 56 GiB 53 GiB 108 KiB 3.2 GiB 5.4 TiB 1.01 MIN/MAX VAR: 0.02/1.54 STDDEV: 0.75 $ openstack volume set --type ceph-ssd --retype-policy on-demand xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx $ sudo ceph osd df ssd ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS 0 ssd 0.19530 1.00000 200 GiB 383 MiB 11 MiB 15 KiB 372 MiB 200 GiB 0.19 0.14 4 up 5 ssd 1.74660 1.00000 1.7 TiB 38 GiB 36 GiB 21 KiB 1.2 GiB 1.7 TiB 2.10 1.54 17 up 7 ssd 1.74660 1.00000 1.7 TiB 38 GiB 36 GiB 32 KiB 1.2 GiB 1.7 TiB 2.11 1.54 14 up 4 ssd 1.74660 1.00000 1.7 TiB 420 MiB 11 MiB 37 KiB 409 MiB 1.7 TiB 0.02 0.02 28 up TOTAL 5.4 TiB 76 GiB 73 GiB 108 KiB 3.2 GiB 5.4 TiB 1.37 MIN/MAX VAR: 0.02/1.54 STDDEV: 1.04 ``` I checked the `get_capabilities` output in the debug logs and confirmed that sparse_copy_volume is True. ``` 2025-07-08 15:58:13.960 315503 DEBUG cinder.volume.manager [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -] Obtained capabilities list: {'vendor_name': 'Open Source', 'driver_version': '1.3.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 2615.73, 'free_capacity_gb': 2568.31, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:cinder-ssd:volumes_data_ssd', 'backend_state': 'up', 'qos_support': True, 'sparse_copy_volume': True, 'volume_backend_name': 'ceph-ssd', 'replication_enabled': False, 'properties': {'thin_provisioning': {'title': 'Thin Provisioning', 'description': 'Sets thin provisioning.', 'type': 'boolean'}, 'compression': {'title': 'Compression', 'description': 'Enables compression.', 'type': 'boolean'}, 'qos': {'title': 'QoS', 'description': 'Enables QoS.', 'type': 'boolean'}, 'replication_enabled': {'title': 'Replication', 'description': 'Enables replication.', 'type': 'boolean'}}}. get_capabilities /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4751 ``` I have looked at the source code[1] and believe that `sparse` is available when `_copy_volume_with_path` is called. [1]https://opendev.org/openstack/cinder/src/commit/27373d61fe54e55afa91f1e93cc6... However, it appears that `_copy_volume_with_path` was not called and `_copy_volume_with_file` was called. To investigate further, I added debug logs to the source code to see the output. 
Hi Rajat,
We can see that before the patch, the retype increased the space used from 757 MiB to 2.7 GiB, while after applying the patch it went from 757 MiB to only 869 MiB, showing a sparse volume copy. I haven't yet done further testing to validate data integrity, such as retyping a bootable volume and launching an instance from it. I would like to hear your feedback on whether this patch works for you (note that both patches [1][2] need to be applied).
[1] https://review.opendev.org/c/openstack/cinder/+/954523 [2] https://review.opendev.org/c/openstack/cinder/+/954217
I have patched my environment and verified that the problem is resolved. I was able to retype the following three types of volumes without increasing backend storage usage, and I had no problems launching instances from the bootable volumes after retyping.

- test-volume1: An empty volume not attached to a VM.
- test-volume2: A bootable volume attached to a VM.
- test-volume3: A bootable volume not attached to a VM.

- Before retyping

```
$ openstack volume list --long
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| ID | Name | Status | Size | Type | Bootable | Attached to | Properties |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume3 | available | 20 | ceph-hdd | true | | |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume2 | in-use | 20 | ceph-hdd | true | Attached to test-instance on /dev/vda | |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume1 | available | 20 | ceph-hdd | false | | |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+

$ sudo ceph osd df hdd
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
 6 hdd 0.24080 1.00000 247 GiB 47 GiB 13 MiB 23 KiB 361 MiB 200 GiB 18.89 1.50 166 up
 3 hdd 3.71149 1.00000 3.7 TiB 464 GiB 17 GiB 29 KiB 1.1 GiB 3.3 TiB 12.21 0.97 182 up
   TOTAL 4.0 TiB 510 GiB 17 GiB 53 KiB 1.4 GiB 3.5 TiB 12.61
MIN/MAX VAR: 0.97/1.50 STDDEV: 4.45

$ sudo ceph osd df ssd
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
 0 ssd 0.19530 1.00000 200 GiB 2.8 GiB 2.4 GiB 15 KiB 388 MiB 197 GiB 1.40 4.96 51 up
 7 ssd 1.74660 1.00000 1.7 TiB 2.8 GiB 2.4 GiB 16 KiB 412 MiB 1.7 TiB 0.16 0.56 24 up
   TOTAL 1.9 TiB 5.6 GiB 4.8 GiB 32 KiB 800 MiB 1.9 TiB 0.28
```

- After retyping

```
$ openstack volume list --long
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| ID | Name | Status | Size | Type | Bootable | Attached to | Properties |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume3 | available | 20 | ceph-ssd | true | | |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume2 | in-use | 20 | ceph-ssd | true | Attached to test-instance on /dev/vda | |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume1 | available | 20 | ceph-ssd | false | | |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+

$ sudo ceph osd df hdd
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
 6 hdd 0.24080 1.00000 247 GiB 47 GiB 13 MiB 23 KiB 361 MiB 200 GiB 18.89 1.51 166 up
 3 hdd 3.71149 1.00000 3.7 TiB 461 GiB 14 GiB 29 KiB 1.1 GiB 3.3 TiB 12.13 0.97 182 up
   TOTAL 4.0 TiB 508 GiB 14 GiB 53 KiB 1.4 GiB 3.5 TiB 12.54
MIN/MAX VAR: 0.97/1.51 STDDEV: 4.50

$ sudo ceph osd df ssd
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
 0 ssd 0.19530 1.00000 200 GiB 6.1 GiB 5.7 GiB 15 KiB 380 MiB 194 GiB 3.03 4.97 51 up
 7 ssd 1.74660 1.00000 1.7 TiB 6.1 GiB 5.7 GiB 16 KiB 399 MiB 1.7 TiB 0.34 0.56 24 up
   TOTAL 1.9 TiB 12 GiB 11 GiB 32 KiB 780 MiB 1.9 TiB 0.61
MIN/MAX VAR: 0.56/4.97 STDDEV: 1.72
```
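As a per-image cross-check of the pool-level numbers above, the actually allocated size of each copied image could also be inspected, similar to what 'rbd du' reports. The following is only a hypothetical helper using the python rados/rbd bindings; the pool and image names are placeholders, and it is not part of the patches discussed in this thread.

```python
# Hypothetical helper; not part of the patches discussed in this thread.
# Reports how many bytes of an RBD image are actually allocated.
import rados
import rbd


def allocated_bytes(pool, image_name,
                    conffile='/etc/ceph/ceph.conf', rados_id='cinder'):
    cluster = rados.Rados(conffile=conffile, rados_id=rados_id)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            with rbd.Image(ioctx, image_name, read_only=True) as image:
                used = 0

                def extent_cb(offset, length, exists):
                    nonlocal used
                    if exists:
                        used += length

                # Walk the allocated extents of the whole image.
                image.diff_iterate(0, image.size(), None, extent_cb)
                return used, image.size()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()


# Example with placeholder names: a freshly retyped empty volume should
# report far fewer allocated bytes than its provisioned size.
# used, provisioned = allocated_bytes('volumes_data_ssd', 'volume-xxxx')
```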
Best Regards,

Rajat Dhasmana wrote:
Hi Yuta,
Thanks for the follow-up. I forgot that we have two code paths, and RBD goes through the one using chunks instead of 'dd'. It looks like sparseness support was never implemented for the chunked transfer code path. Anyway, I was able to write another patch on top of my last one[1] which adds that support (a rough sketch of the idea appears after this message). I've tested the workflow by deploying a Ceph cluster with two pools as two Cinder backends and performing a retype between them; the results are below.
Direction of retype: volumes2 -> volumes

Before Patch

```
root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  948 MiB      669  757 MiB   2.65     27 GiB
volumes2   5   32   22 MiB       20   22 MiB   0.08     27 GiB

root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  2.9 GiB      993  2.7 GiB   9.86     25 GiB
volumes2   5   32  9.8 KiB        3   14 KiB      0     25 GiB
```

After Patch

```
root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  948 MiB      669  757 MiB   2.64     27 GiB
volumes2   5   32   22 MiB       20   22 MiB   0.08     27 GiB

root@test-devstack-repl:/# ceph df
--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
volumes    3   32  1.0 GiB      691  869 MiB   3.05     27 GiB
volumes2   5   32     19 B        3    4 KiB      0     27 GiB
```
We can see that before the patch, the retype increased the space used from 757 MiB to 2.7 GiB, while after applying the patch it went from 757 MiB to only 869 MiB, showing a sparse volume copy. I haven't yet done further testing to validate data integrity, such as retyping a bootable volume and launching an instance from it. I would like to hear your feedback on whether this patch works for you (note that both patches [1][2] need to be applied).
[1] https://review.opendev.org/c/openstack/cinder/+/954523 [2] https://review.opendev.org/c/openstack/cinder/+/954217
Thanks Rajat Dhasmana
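For readers following the two code paths mentioned above: "sparseness support for the chunked transfer code path" boils down to not writing chunks that are entirely zero. The following is only a minimal sketch of that idea, not the code from patch [1]; it assumes the source and destination are file-like objects supporting read/write/seek, which the RBD IO wrappers used by the chunked path appear to provide.

```python
# Minimal sketch of a sparse-aware chunked copy; not the code from the patch.
# Assumes src and dest are file-like objects supporting read/write/seek.
import os


def copy_chunked_sparse(src, dest, size_in_bytes, chunk_size=4 * 1024 * 1024):
    zero_chunk = b'\x00' * chunk_size
    copied = 0
    while copied < size_in_bytes:
        length = min(chunk_size, size_in_bytes - copied)
        data = src.read(length)
        if not data:
            break
        if data == zero_chunk[:len(data)]:
            # All-zero chunk: skip the write and just advance the destination
            # offset, so the region stays unallocated on a thin backend.
            dest.seek(copied + len(data), os.SEEK_SET)
        else:
            dest.write(data)
        copied += len(data)
    return copied
```

The non-sparse variant simply writes every chunk, which is what causes the destination image to become fully allocated during retype.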
On Wed, Jul 16, 2025 at 11:05 AM <yuta.kambe@fujitsu.com> wrote:
Hi Rajat,
I have patched my environment and verified that the problem is resolved.
Great, thanks for verifying!
I was able to retype the following three types of volumes without increasing backend storage usage, and I had no problems launching instances from bootable volumes after retyping.
- test-volume1: An empty volume not attached to a VM. - test-volume2: A bootable volume attached to a VM. - test-volume3: A bootable volume not attached to a VM.
Looks good. I will continue with the effort to land the changes upstream; thanks again for reporting the issue and verifying the fixes.
Best Regards,
participants (4)
- Rajat Dhasmana
- Sean Mooney
- Yuta Kambe (Fujitsu)
- yuta.kambe@fujitsu.com