On Wed, Jul 16, 2025 at 11:05 AM <yuta.kambe@fujitsu.com> wrote:
Hi Rajat,

> We can see that before the patch, the retype increased the space used from
> 757 MiB to 2.7 GiB, and after applying the patch it went from 757 MiB to
> 869 MiB, showing a sparse volume copy.
> Note that I haven't conducted any further testing to validate data
> integrity, such as retyping a bootable volume and launching an instance
> from it.
> I would like to hear your feedback on whether this patch works for you
> (note that both patches [1][2] need to be applied).
>
> [1] https://review.opendev.org/c/openstack/cinder/+/954523
> [2] https://review.opendev.org/c/openstack/cinder/+/954217

I have patched my environment and verified that the problem is resolved.

Great, thanks for verifying!
 
I was able to retype the following three types of volumes without increasing backend storage usage,
and I had no problems launching instances from bootable volumes after retyping.

- test-volume1: An empty volume not attached to a VM.
- test-volume2: A bootable volume attached to a VM.
- test-volume3: A bootable volume not attached to a VM.

- Before retyping

```
$ openstack volume list --long
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| ID                                   | Name         | Status    | Size | Type     | Bootable | Attached to                            | Properties |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume3 | available |   20 | ceph-hdd | true     |                                        |            |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume2 | in-use    |   20 | ceph-hdd | true     | Attached to test-instance on /dev/vda  |            |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume1 | available |   20 | ceph-hdd | false    |                                        |            |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+

$ sudo ceph osd df hdd
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
 6    hdd  0.24080   1.00000  247 GiB   47 GiB  13 MiB  23 KiB  361 MiB  200 GiB  18.89  1.50  166      up
 3    hdd  3.71149   1.00000  3.7 TiB  464 GiB  17 GiB  29 KiB  1.1 GiB  3.3 TiB  12.21  0.97  182      up
                       TOTAL  4.0 TiB  510 GiB  17 GiB  53 KiB  1.4 GiB  3.5 TiB  12.61                   
MIN/MAX VAR: 0.97/1.50  STDDEV: 4.45

$ sudo ceph osd df ssd
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE  VAR   PGS  STATUS
 0    ssd  0.19530   1.00000  200 GiB  2.8 GiB  2.4 GiB  15 KiB  388 MiB  197 GiB  1.40  4.96   51      up
 7    ssd  1.74660   1.00000  1.7 TiB  2.8 GiB  2.4 GiB  16 KiB  412 MiB  1.7 TiB  0.16  0.56   24      up
                       TOTAL  1.9 TiB  5.6 GiB  4.8 GiB  32 KiB  800 MiB  1.9 TiB  0.28         
```

- After retyping

```
$ openstack volume list --long
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| ID                                   | Name         | Status    | Size | Type     | Bootable | Attached to                            | Properties |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume3 | available |   20 | ceph-ssd | true     |                                        |            |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume2 | in-use    |   20 | ceph-ssd | true     | Attached to test-instance on /dev/vda  |            |
| xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume1 | available |   20 | ceph-ssd | false    |                                        |            |
+--------------------------------------+--------------+-----------+------+----------+----------+----------------------------------------+------------+

$ sudo ceph osd df hdd
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
 6    hdd  0.24080   1.00000  247 GiB   47 GiB  13 MiB  23 KiB  361 MiB  200 GiB  18.89  1.51  166      up
 3    hdd  3.71149   1.00000  3.7 TiB  461 GiB  14 GiB  29 KiB  1.1 GiB  3.3 TiB  12.13  0.97  182      up
                       TOTAL  4.0 TiB  508 GiB  14 GiB  53 KiB  1.4 GiB  3.5 TiB  12.54                   
MIN/MAX VAR: 0.97/1.51  STDDEV: 4.50

$ sudo ceph osd df ssd
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE  VAR   PGS  STATUS
 0    ssd  0.19530   1.00000  200 GiB  6.1 GiB  5.7 GiB  15 KiB  380 MiB  194 GiB  3.03  4.97   51      up
 7    ssd  1.74660   1.00000  1.7 TiB  6.1 GiB  5.7 GiB  16 KiB  399 MiB  1.7 TiB  0.34  0.56   24      up
                       TOTAL  1.9 TiB   12 GiB   11 GiB  32 KiB  780 MiB  1.9 TiB  0.61                   
MIN/MAX VAR: 0.56/4.97  STDDEV: 1.72
```


Looks good. I will continue with the effort to land the changes upstream. Thanks again for reporting the issue and for verifying the fixes.
 
Best Regards,

Rajat Dhasmana wrote:
> Hi Yuta,
>
> Thanks for the follow-up. I forgot that we have two code paths and that RBD
> goes through the one using chunks instead of 'dd'.
> It looks like sparseness support was never implemented for the chunked
> transfer code path.
> Anyway, I was able to write another patch on top of my last one here[1]
> which adds that support.
> I've tested the workflow by deploying a ceph cluster with two pools as two
> cinder backends and performing a retype between them; the results are below.
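
For context on how the fix works (an illustrative sketch only, not the actual
change in [1]): the chunked path copies the image in fixed-size chunks, so
preserving sparseness amounts to detecting all-zero chunks and seeking over
them instead of writing them, along these lines:

```
import os

def copy_chunked_sparse(src, dest, size_in_bytes, blocksize=4 * 1024 * 1024):
    """Sparse-aware chunked copy between two file-like objects (sketch)."""
    zeros = b'\0' * blocksize
    remaining = size_in_bytes
    while remaining > 0:
        chunk = src.read(min(blocksize, remaining))
        if not chunk:
            break
        if chunk == zeros[:len(chunk)]:
            # All-zero chunk: advance the write position instead of writing,
            # so the destination stays thin-provisioned.
            dest.seek(len(chunk), os.SEEK_CUR)
        else:
            dest.write(chunk)
        remaining -= len(chunk)
    dest.flush()
```

With a plain file destination a final truncate to the full size would also be
needed so skipped trailing zeros still extend the file; an RBD image already
has its full size at creation, so skipping the writes is enough.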
>
>
> *Direction of retype: volumes2 -> volumes*
> Before Patch
>
> root@test-devstack-repl:/# ceph df
> --- POOLS ---
> POOL      ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>
> volumes    3   32  948 MiB      669  757 MiB   2.65     27 GiB
> volumes2   5   32   22 MiB       20   22 MiB   0.08     27 GiB
>
> root@test-devstack-repl:/# ceph df
> --- POOLS ---
> POOL      ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>
> volumes    3   32  2.9 GiB      993  2.7 GiB   9.86     25 GiB
> volumes2   5   32  9.8 KiB        3   14 KiB      0     25 GiB
>
> After Patch
>
> root@test-devstack-repl:/# ceph df
>
> --- POOLS ---
> POOL      ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>
> volumes    3   32  948 MiB      669  757 MiB   2.64     27 GiB
> volumes2   5   32   22 MiB       20   22 MiB   0.08     27 GiB
>
> root@test-devstack-repl:/# ceph df
>
> --- POOLS ---
> POOL      ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>
> volumes    3   32  1.0 GiB      691  869 MiB   3.05     27 GiB
> volumes2   5   32     19 B        3    4 KiB      0     27 GiB
>
> We can see that before the patch, the retype increased the space used from
> 757 MiB to 2.7 GiB, and after applying the patch it went from 757 MiB to
> 869 MiB, showing a sparse volume copy.
> Note that I haven't conducted any further testing to validate data
> integrity, such as retyping a bootable volume and launching an instance
> from it.
> I would like to hear your feedback on whether this patch works for you
> (note that both patches [1][2] need to be applied).
>
> [1] https://review.opendev.org/c/openstack/cinder/+/954523
> [2] https://review.opendev.org/c/openstack/cinder/+/954217
>
> Thanks
> Rajat Dhasmana
>
> On Wed, Jul 9, 2025 at 8:10 AM Yuta Kambe (Fujitsu) <yuta.kambe@fujitsu.com>
> wrote:
> > Hi Rajat,
> > Thank you for creating the patch. I have tested it in our environment, but
> > I couldn't confirm that the issue has been resolved.
> > I retyped an empty 10 GB volume from HDD to SSD; however, the SSD disk
> > usage increased by 20 GB (due to a replica count of 2, doubling the size).
> > ```
> > $ openstack volume list
> > +--------------------------------------+-------------+-----------+------+-------------+
> > | ID                                   | Name        | Status    | Size | Attached to |
> > +--------------------------------------+-------------+-----------+------+-------------+
> > | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | test-volume | available |   10 |             |
> > +--------------------------------------+-------------+-----------+------+-------------+
> > $ sudo ceph osd df ssd
> > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL    %USE  VAR   PGS  STATUS
> >  0    ssd  0.19530   1.00000  200 GiB  383 MiB  11 MiB   15 KiB  372 MiB  200 GiB  0.19  0.19    4      up
> >  5    ssd  1.74660   1.00000  1.7 TiB   28 GiB  26 GiB   21 KiB  1.2 GiB  1.7 TiB  1.54  1.53   17      up
> >  7    ssd  1.74660   1.00000  1.7 TiB   28 GiB  26 GiB   32 KiB  1.2 GiB  1.7 TiB  1.55  1.54   14      up
> >  4    ssd  1.74660   1.00000  1.7 TiB  420 MiB  11 MiB   37 KiB  409 MiB  1.7 TiB  0.02  0.02   28      up
> >                        TOTAL  5.4 TiB   56 GiB  53 GiB  108 KiB  3.2 GiB  5.4 TiB  1.01
> > MIN/MAX VAR: 0.02/1.54  STDDEV: 0.75
> > $ openstack volume set --type ceph-ssd --retype-policy on-demand xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
> > $ sudo ceph osd df ssd
> > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL    %USE  VAR   PGS  STATUS
> >  0    ssd  0.19530   1.00000  200 GiB  383 MiB  11 MiB   15 KiB  372 MiB  200 GiB  0.19  0.14    4      up
> >  5    ssd  1.74660   1.00000  1.7 TiB   38 GiB  36 GiB   21 KiB  1.2 GiB  1.7 TiB  2.10  1.54   17      up
> >  7    ssd  1.74660   1.00000  1.7 TiB   38 GiB  36 GiB   32 KiB  1.2 GiB  1.7 TiB  2.11  1.54   14      up
> >  4    ssd  1.74660   1.00000  1.7 TiB  420 MiB  11 MiB   37 KiB  409 MiB  1.7 TiB  0.02  0.02   28      up
> >                        TOTAL  5.4 TiB   76 GiB  73 GiB  108 KiB  3.2 GiB  5.4 TiB  1.37
> > MIN/MAX VAR: 0.02/1.54  STDDEV: 1.04
> > ```
> > I checked the `get_capabilities` output in the debug logs and confirmed
> > that sparse_copy_volume is True.
> > ```
> > 2025-07-08 15:58:13.960 315503 DEBUG cinder.volume.manager [None
> > req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -] Obtained capabilities
> > list: {'vendor_name': 'Open Source', 'driver_version': '1.3.0',
> > 'storage_protocol': 'ceph', 'total_capacity_gb': 2615.73,
> > 'free_capacity_gb': 2568.31, 'reserved_percentage': 0, 'multiattach': True,
> > 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0',
> > 'location_info':
> > 'ceph:/etc/ceph/ceph.conf:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:cinder-ssd:volumes_data_ssd',
> > 'backend_state': 'up', 'qos_support': True, 'sparse_copy_volume': True,
> > 'volume_backend_name': 'ceph-ssd', 'replication_enabled': False,
> > 'properties': {'thin_provisioning': {'title': 'Thin Provisioning',
> > 'description': 'Sets thin provisioning.', 'type': 'boolean'},
> > 'compression': {'title': 'Compression', 'description': 'Enables
> > compression.', 'type': 'boolean'}, 'qos': {'title': 'QoS', 'description':
> > 'Enables QoS.', 'type': 'boolean'}, 'replication_enabled': {'title':
> > 'Replication', 'description': 'Enables replication.', 'type': 'boolean'}}}.
> > get_capabilities
> > /usr/lib/python3.9/site-packages/cinder/volume/manager.py:4751
> > ```
> > I have looked at the source code[1] and believe that `sparse` is available
> > when `_copy_volume_with_path` is called.
> > [1]
> > https://opendev.org/openstack/cinder/src/commit/27373d61fe54e55afa91f1e93cc6...
> > However, it appears that `_copy_volume_with_path` was not called and
> > `_copy_volume_with_file` was called instead.
> > To investigate further, I added debug logs to the source code to see the
> > output.
> > ```
> >     if (isinstance(src, str) and
> >             isinstance(dest, str)):
> >         if not throttle:
> >             throttle = throttling.Throttle.get_default()
> >         with throttle.subcommand(src, dest) as throttle_cmd:
> >             _copy_volume_with_path(throttle_cmd['prefix'], src, dest,
> >                                    size_in_m, blocksize, sync=sync,
> >                                    execute=execute, ionice=ionice,
> >                                    sparse=sparse)
> >     else:
> >         LOG.debug("called _copy_volume_with_file")      # ★ added debug log
> >         LOG.debug("src=%s, dest=%s", src, dest)         # ★ added debug log
> >         _copy_volume_with_file(src, dest, size_in_m)
> > ```
> > The output of debug logs is as follows:
> > ```
> > 2025-07-08 17:38:03.068 426268 DEBUG cinder.volume.volume_utils [None
> > req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -] called
> > _copy_volume_with_file copy_volume
> > /usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py:634
> > 2025-07-08 17:38:03.068 426268 DEBUG cinder.volume.volume_utils [None
> > req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - - -]
> > src=<os_brick.initiator.linuxrbd.RBDVolumeIOWrapper object at
> > 0x7fe277646970>, dest=<os_brick.initiator.linuxrbd.RBDVolumeIOWrapper
> > object at 0x7fe2775dfbb0> copy_volume
> > /usr/lib/python3.9/site-packages/cinder/volume/volume_utils.py:635
> > ```
> > I would appreciate it if you could confirm.
> > Best regards,
> > ------------------------------
> > *From:* Rajat Dhasmana <rdhasman@redhat.com>
> > *Sent:* Monday, July 7, 2025 18:44
> > *To:* Kambe, Yuta/神戸 雄太 <yuta.kambe@fujitsu.com>
> > *Cc:* openstack-discuss <openstack-discuss@lists.openstack.org>
> > *Subject:* Re: [cinder] How to retype while maintaining volume usage
> > Hi Yuta,
> > Thanks for starting this thread.
> > I started looking into this issue and found that retype+migration
> > already provides a way to copy the data sparsely (with dd).
> > The only issue was that it wasn't enabled in the RBD driver, which this
> > patch[1] addresses (more details in the commit message).
> > If possible, can you try out the patch in your deployment and report
> > whether it fixes the issue you are experiencing?
> > [1] https://review.opendev.org/c/openstack/cinder/+/954217
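
For anyone following along, the mechanism here (a minimal illustrative sketch,
not the actual patch): the generic migration path asks the destination backend
for its capabilities and only passes sparse=True to the copy when
'sparse_copy_volume' is advertised, so the driver-side change essentially
amounts to reporting that capability in the driver stats:

```
class IllustrativeRBDDriver(object):
    """Hypothetical driver snippet; the class and values are illustrative."""

    def _update_volume_stats(self):
        self._stats = {
            'volume_backend_name': 'ceph-ssd',
            'storage_protocol': 'ceph',
            'thin_provisioning_support': True,
            # Advertising this capability lets generic volume migration use a
            # sparse dd copy instead of writing every zero block.
            'sparse_copy_volume': True,
        }
```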
> > Thanks
> > Rajat Dhasmana
> > On Mon, Jul 7, 2025 at 2:19 PM Yuta Kambe (Fujitsu) <
> > yuta.kambe@fujitsu.com> wrote:
> > Hi Eugen,
> > Thank you for your reply.
> > I've confirmed that 'rbd sparsify' can increase available space.
> > However, I believe there's room for improvement in the retype
> > implementation.
> > Currently, retyping between Ceph backends causes zero-filling, which
> > unnecessarily consumes time and storage space.
> > This is likely to happen often, for example with retypes between Ceph HDD
> > and SSD backends, and the Ceph administrator would need to run
> > 'rbd sparsify' frequently.
> > Are there any improvements to the retype implementation being considered
> > in the community?
> > I would also appreciate hearing your opinion on the need for improvement.
> > Best regards,
> >