Issue resizing volumes attached to running volume backed instances

Gorka Eguileor geguileo at redhat.com
Thu Jan 12 09:59:10 UTC 2023


On 08/12, Jérôme BECOT wrote:
> Hello Openstack,
>
> We have Ussuri deployed on a few clouds, and they're all plugged to
> PureStorage Arrays. We allow users to only use volumes for their servers. It
> means that each server disk is a LUN attached over ISCSI (with multipath) on
> the compute node hosting the server. Everything works quite fine, but we
> have a weird issue when extending volumes attached to running instances. The
> guests do see a new disk size, but it is the size from the previous extend.
>
> Say I have a server with a 10gb disk. I add 5gb. On the guest, still 10gb. I
> add another 5gb, and on the guest I get 15, and so on. I've turned the debug
> mode on and I could see no error in the log. Looking closer at the log I
> could catch the culprit:
>
> 2022-12-08 17:35:13.998 46195 DEBUG os_brick.initiator.linuxscsi [] Starting
> size: *76235669504*
> 2022-12-08 17:35:14.028 46195 DEBUG os_brick.initiator.linuxscsi [] volume
> size after scsi device rescan *80530636800* extend_volume
> 2022-12-08 17:35:14.035 46195 DEBUG os_brick.initiator.linuxscsi [] Volume
> device info = {'device': '/dev/disk/by-path/ip-1...1:3260-iscsi-iqn.2010-06.com.purestorage:flasharray.x-lun-10',
> 'host': '5', 'channel': '0', 'id': '0', 'lun': '10'} extend_volume
> 2022-12-08 17:35:14.348 46195 INFO os_brick.initiator.linuxscsi [] Find
> Multipath device file for volume WWN 3624...
> 2022-12-08 17:35:14.349 46195 DEBUG os_brick.initiator.linuxscsi [] Checking
> to see if /dev/disk/by-id/dm-uuid-mpath-3624.. exists yet. wait_for_path
> 2022-12-08 17:35:14.349 46195 DEBUG os_brick.initiator.linuxscsi []
> /dev/disk/by-id/dm-uuid-mpath-3624... has shown up. wait_for_path
> 2022-12-08 17:35:14.382 46195 INFO os_brick.initiator.linuxscsi []
> mpath(/dev/disk/by-id/dm-uuid-mpath-3624) *current size 76235669504*
> 2022-12-08 17:35:14.412 46195 INFO os_brick.initiator.linuxscsi []
> mpath(/dev/disk/by-id/dm-uuid-mpath-3624) *new size 76235669504*
> 2022-12-08 17:35:14.413 46195 DEBUG oslo_concurrency.lockutils [] Lock
> "extend_volume" released by
> "os_brick.initiator.connectors.iscsi.ISCSIConnector.extend_volume" :: held
> 2.062s inner
> 2022-12-08 17:35:14.459 46195 DEBUG os_brick.initiator.connectors.iscsi []
> <== extend_volume: return (2217ms) *76235669504* trace_logging_wrapper
> 2022-12-08 17:35:14.461 46195 DEBUG nova.virt.libvirt.volume.iscsi [] Extend
> iSCSI Volume /dev/dm-28; new_size=*76235669504* extend_volume
> 2022-12-08 17:35:14.462 46195 DEBUG nova.virt.libvirt.driver [] Resizing
> target device /dev/dm-28 to *76235669504* _resize_attached_volume
>
> The logs clearly show that the rescan confirms the new size, but multipath
> does not report it when queried. Querying multipath from the command line a
> few seconds later does show the new size, which explains the behaviour.
>
> I'm running Ubuntu 18.04 with multipath 0.7.4-2ubuntu3.2. The os-brick code
> for multipath is far more basic than the one in master branch. Maybe the
> multipath version installed is too recent for os-brick.
>
> Thanks for the help
>
> Jerome
>

Hi Jérôme,

As far as I can see this is a timing problem.  The extend takes longer
to happen in the backend and become visible on the compute node than it
takes Nova to ask os-brick to check the new size.

There are multiple reasons why this could be happening:

- Pure extend is not synchronous, so Cinder tells Nova that it has
  extended the volume before it has actually happened in the backend. I
  doubt that is the case.

- The iSCSI notification of the new size to the compute node is slow.

- Nova runs too fast for the compute node to notice the change in the
  volume's size.  This is similar to the previous one, but here the
  problem is not slow iSCSI caused by the network or the storage array;
  the compute node is simply too fast.
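
To put numbers on the window, here is a quick sketch that only uses the
values copied from the log quoted above (the variable names are mine, for
illustration). It compares what the SCSI rescan reported with what the
multipath device reported shortly afterwards:

```python
from datetime import datetime, timedelta

GIB = 1024 ** 3
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied from the quoted os-brick log lines.
rescan_size = 80530636800   # "volume size after scsi device rescan"
mpath_size = 76235669504    # mpath "new size"
t_rescan = datetime.strptime("2022-12-08 17:35:14.028", FMT)
t_mpath = datetime.strptime("2022-12-08 17:35:14.412", FMT)

# Integer milliseconds between the two log entries.
gap_ms = (t_mpath - t_rescan) // timedelta(milliseconds=1)
missing_gib = (rescan_size - mpath_size) // GIB

print(f"rescan saw {rescan_size // GIB} GiB, "
      f"multipath saw {mpath_size // GIB} GiB "
      f"({missing_gib} GiB missing) {gap_ms} ms later")
```

So the individual path already showed the new size, while the multipath
device still reported the old one a few hundred milliseconds later.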

Regardless, in my opinion there is a Cinder-Nova-OS-brick change that
could be implemented to improve these situations.  The extend_volume
method in os-brick could receive the expected new size, so that it can
wait a bit for the system to reflect it when it notices that the size
hasn't increased yet.
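
A minimal sketch of that idea, just to illustrate it.  The helper name,
its parameters and the get_size callable are hypothetical, they are not
existing os-brick code; get_size stands in for whatever reads the
multipath device size:

```python
import time
from typing import Callable


def wait_for_size(get_size: Callable[[], int],
                  expected_size: int,
                  timeout: float = 10.0,
                  interval: float = 0.5,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_size() until it reports at least expected_size bytes.

    Hypothetical helper, not part of os-brick.  Returns the last size
    seen, even if the timeout expired before the size increased.
    """
    deadline = time.monotonic() + timeout
    size = get_size()
    while size < expected_size and time.monotonic() < deadline:
        sleep(interval)
        size = get_size()
    return size
```

With something like this, the caller would get back the new size as soon
as the device reflects it, and would still get the old size (as happens
today) if the device never catches up within the timeout.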

Cheers,
Gorka.



