Hi Rajat,
Thank you for creating the patch. I tested it in our environment, but I could not confirm that the issue has been resolved.
I retyped an empty 10 GB volume from HDD to SSD; however, SSD disk usage increased by 20 GB (the 10 GB image doubled by a replica count of 2).
```
$ openstack volume list
+--------------------------------------+-------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+-------------+-----------+------+-------------+
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume | available | 10 | |
+--------------------------------------+-------------+-----------+------+-------------+
$ sudo ceph osd df ssd
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.19530 1.00000 200 GiB 383 MiB 11 MiB 15 KiB 372 MiB 200 GiB 0.19 0.19 4 up
5 ssd 1.74660 1.00000 1.7 TiB 28 GiB 26 GiB 21 KiB 1.2 GiB 1.7 TiB 1.54 1.53 17 up
7 ssd 1.74660 1.00000 1.7 TiB 28 GiB 26 GiB 32 KiB 1.2 GiB 1.7 TiB 1.55 1.54 14 up
4 ssd 1.74660 1.00000 1.7 TiB 420 MiB 11 MiB 37 KiB 409 MiB 1.7 TiB 0.02 0.02 28 up
TOTAL 5.4 TiB 56 GiB 53 GiB 108 KiB 3.2 GiB 5.4 TiB 1.01
MIN/MAX VAR: 0.02/1.54 STDDEV: 0.75
$ openstack volume set --type ceph-ssd --retype-policy on-demand xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
$ sudo ceph osd df ssd
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.19530 1.00000 200 GiB 383 MiB 11 MiB 15 KiB 372 MiB 200 GiB 0.19 0.14 4 up
5 ssd 1.74660 1.00000 1.7 TiB 38 GiB 36 GiB 21 KiB 1.2 GiB 1.7 TiB 2.10 1.54 17 up
7 ssd 1.74660 1.00000 1.7 TiB 38 GiB 36 GiB 32 KiB 1.2 GiB 1.7 TiB 2.11 1.54 14 up
4 ssd 1.74660 1.00000 1.7 TiB 420 MiB 11 MiB 37 KiB 409 MiB 1.7 TiB 0.02 0.02 28 up
TOTAL 5.4 TiB 76 GiB 73 GiB 108 KiB 3.2 GiB 5.4 TiB 1.37
MIN/MAX VAR: 0.02/1.54 STDDEV: 1.04
```
I checked the `get_capabilities` output in the debug logs and confirmed that `sparse_copy_volume` is True.
```
2025-07-08 15:58:13.960 315503 DEBUG cinder.volume.manager [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -] Obtained capabilities list: {'vendor_name': 'Open Source', 'driver_version': '1.3.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 2615.73,
'free_capacity_gb': 2568.31, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:cinder-ssd:volumes_data_ssd',
'backend_state': 'up', 'qos_support': True, 'sparse_copy_volume': True, 'volume_backend_name': 'ceph-ssd', 'replication_enabled': False, 'properties': {'thin_provisioning': {'title': 'Thin Provisioning', 'description': 'Sets thin provisioning.', 'type': 'boolean'},
'compression': {'title': 'Compression', 'description': 'Enables compression.', 'type': 'boolean'}, 'qos': {'title': 'QoS', 'description': 'Enables QoS.', 'type': 'boolean'}, 'replication_enabled': {'title': 'Replication', 'description': 'Enables replication.',
'type': 'boolean'}}}. get_capabilities /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4751
```
I looked at the source code[1] and believe that `sparse` only takes effect when `_copy_volume_with_path` is called.
However, it appears that `_copy_volume_with_file` was called instead of `_copy_volume_with_path`.
To investigate further, I added debug logging to the source code and checked the output.
```
if (isinstance(src, str) and
        isinstance(dest, str)):
    if not throttle:
        throttle = throttling.Throttle.get_default()
    with throttle.subcommand(src, dest) as throttle_cmd:
        _copy_volume_with_path(throttle_cmd['prefix'], src, dest,
                               size_in_m, blocksize, sync=sync,
                               execute=execute, ionice=ionice,
                               sparse=sparse)
else:
    LOG.debug("called _copy_volume_with_file")  # ★ added debug log
    LOG.debug("src=%s, dest=%s", src, dest)     # ★ added debug log
    _copy_volume_with_file(src, dest, size_in_m)
```
The output of debug logs is as follows:
```
2025-07-08 17:38:03.068 426268 DEBUG cinder.volume.volume_utils [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -] called _copy_volume_with_file copy_volume /usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py:634
2025-07-08 17:38:03.068 426268 DEBUG cinder.volume.volume_utils [None req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -] src=<os_brick.initiator.linuxrbd.RBDVolumeIOWrapper object at 0x7fe277646970>, dest=<os_brick.initiator.linuxrbd.RBDVolumeIOWrapper object
at 0x7fe2775dfbb0> copy_volume /usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py:635
```
I would appreciate it if you could confirm this behavior.
Best regards,
Hi Yuta,
Thanks for starting this thread.
I started looking into this issue and found that retype+migration already provides a way to copy the data sparsely (with dd).
The only issue was that it wasn't enabled in the RBD driver, which I addressed with this patch[1] (more details in the commit message).
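For reference, the general mechanism looks roughly like this (a toy sketch, not the patch itself): a driver advertises sparse-copy support through the `sparse_copy_volume` key in the stats it reports, and the volume manager then requests a sparse copy when the data is moved during retype with migration.
```
# Toy sketch only -- not the actual patch. It just shows where the
# 'sparse_copy_volume' capability flag lives in a driver's reported stats;
# the volume manager reads that flag when deciding whether to copy sparsely.
class SparseAwareDriverStats:
    """Stand-in for a driver's stats reporting."""

    VERSION = '1.3.0'

    def _update_volume_stats(self):
        self._stats = {
            'volume_backend_name': 'ceph-ssd',   # placeholder backend name
            'vendor_name': 'Open Source',
            'driver_version': self.VERSION,
            'storage_protocol': 'ceph',
            'thin_provisioning_support': True,
            'sparse_copy_volume': True,          # the capability in question
        }
        return self._stats


print(SparseAwareDriverStats()._update_volume_stats()['sparse_copy_volume'])
```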
If possible, can you try out the patch in your deployment and report if it fixes the issue you are experiencing?
Thanks
Rajat Dhasmana
Hi Eugen,
Thank you for your reply.
I've confirmed that 'rbd sparsify' can increase available space.
However, I believe there's room for improvement in the retype implementation.
Currently, retyping between Ceph backends zero-fills the destination image, which unnecessarily consumes both time and storage space.
This is likely to happen often, for example with retypes between a Ceph HDD backend and a Ceph SSD backend, so the Ceph administrator would need to run 'rbd sparsify' repeatedly.
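For reference, a hedged sketch of how that cleanup could be scripted with the rbd Python bindings instead of the CLI (the pool and image names below are placeholders for our environment):
```
# Hedged sketch: re-sparsify a volume image after a retype using the rbd
# Python bindings rather than the 'rbd sparsify' CLI. The pool and image
# names are placeholders.
import rados
import rbd

with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('volumes_data_ssd') as ioctx:
        with rbd.Image(ioctx, 'volume-xxxxxxxx') as image:
            # Deallocate runs of zeroes in 4 MiB units.
            image.sparsify(4 * 1024 * 1024)
```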
Are there any improvements to the retype implementation being considered in the community?
I would also appreciate hearing your opinion on the need for improvement.
Best regards,