On 08/12, Jérôme BECOT wrote:
Hello Openstack,
We have Ussuri deployed on a few clouds, all of them connected to Pure Storage arrays. We only allow users to use volumes for their servers, which means each server disk is a LUN attached over iSCSI (with multipath) on the compute node hosting the server. Everything works fine, except for a weird issue when extending volumes attached to running instances: the guests only see the disk size from the previous extend.
Say I have a server with a 10 GB disk. I add 5 GB, and the guest still shows 10 GB. I add another 5 GB, and the guest shows 15 GB, and so on. I turned on debug mode and saw no errors in the log, but looking closer I could catch the culprit:
2022-12-08 17:35:13.998 46195 DEBUG os_brick.initiator.linuxscsi [] Starting size: *76235669504*
2022-12-08 17:35:14.028 46195 DEBUG os_brick.initiator.linuxscsi [] volume size after scsi device rescan *80530636800* extend_volume
2022-12-08 17:35:14.035 46195 DEBUG os_brick.initiator.linuxscsi [] Volume device info = {'device': '/dev/disk/by-path/ip-1...1:3260-iscsi-iqn.2010-06.com.purestorage:flasharray.x-lun-10', 'host': '5', 'channel': '0', 'id': '0', 'lun': '10'} extend_volume
2022-12-08 17:35:14.348 46195 INFO os_brick.initiator.linuxscsi [] Find Multipath device file for volume WWN 3624...
2022-12-08 17:35:14.349 46195 DEBUG os_brick.initiator.linuxscsi [] Checking to see if /dev/disk/by-id/dm-uuid-mpath-3624.. exists yet. wait_for_path
2022-12-08 17:35:14.349 46195 DEBUG os_brick.initiator.linuxscsi [] /dev/disk/by-id/dm-uuid-mpath-3624... has shown up. wait_for_path
2022-12-08 17:35:14.382 46195 INFO os_brick.initiator.linuxscsi [] mpath(/dev/disk/by-id/dm-uuid-mpath-3624) *current size 76235669504*
2022-12-08 17:35:14.412 46195 INFO os_brick.initiator.linuxscsi [] mpath(/dev/disk/by-id/dm-uuid-mpath-3624) *new size 76235669504*
2022-12-08 17:35:14.413 46195 DEBUG oslo_concurrency.lockutils [] Lock "extend_volume" released by "os_brick.initiator.connectors.iscsi.ISCSIConnector.extend_volume" :: held 2.062s inner
2022-12-08 17:35:14.459 46195 DEBUG os_brick.initiator.connectors.iscsi [] <== extend_volume: return (2217ms) *76235669504* trace_logging_wrapper
2022-12-08 17:35:14.461 46195 DEBUG nova.virt.libvirt.volume.iscsi [] Extend iSCSI Volume /dev/dm-28; new_size=*76235669504* extend_volume
2022-12-08 17:35:14.462 46195 DEBUG nova.virt.libvirt.driver [] Resizing target device /dev/dm-28 to *76235669504* _resize_attached_volume
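Converting the logged byte counts to GiB makes the gap easier to read: the SCSI rescan already reports the larger size, while the multipath device (and therefore the size Nova resizes to) is still on the old one. A small Python check; the constant and function names are illustrative only:

```python
# Byte counts copied from the os-brick log above.
STARTING_SIZE = 76235669504   # "Starting size", also the multipath "new size"
RESCANNED_SIZE = 80530636800  # "volume size after scsi device rescan"

GIB = 1024 ** 3  # binary gigabyte


def to_gib(size_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return size_bytes / GIB


if __name__ == "__main__":
    print(f"multipath reported: {to_gib(STARTING_SIZE):.0f} GiB")
    print(f"SCSI rescan saw:    {to_gib(RESCANNED_SIZE):.0f} GiB")
    print(f"size not propagated: {to_gib(RESCANNED_SIZE - STARTING_SIZE):.0f} GiB")
```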
The logs clearly show that the SCSI device rescan picks up the new size, but multipath still reports the old one. Yet querying multipath from the command line a few seconds later shows the new size as well. That explains the behaviour: Nova resizes the guest device to the size multipath reported at that moment, which is the size from before the current extend.
I'm running Ubuntu 18.04 with multipath 0.7.4-2ubuntu3.2. The multipath code in the os-brick release we run is far more basic than the one in the master branch, so maybe the installed multipath version is too recent for it.
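To confirm on the compute node that the multipath map lags behind its paths, one option is to compare sizes in sysfs: /sys/block/<dev>/size reports a block device's size in 512-byte sectors, and /sys/block/<dm>/slaves/ lists the path devices under a device-mapper map. The sketch below assumes that layout; `mpath_sizes` and `map_is_stale` are hypothetical helpers, not part of os-brick, and `dm-28` is only the device name seen in the log.

```python
from pathlib import Path
from typing import Dict

SECTOR_SIZE = 512  # sysfs "size" files count 512-byte sectors


def _device_bytes(sys_block: Path, name: str) -> int:
    """Read <sys_block>/<name>/size and convert it to bytes."""
    return int((sys_block / name / "size").read_text().strip()) * SECTOR_SIZE


def mpath_sizes(dm_name: str, sys_block: str = "/sys/block") -> Dict[str, int]:
    """Return the size in bytes of a dm multipath map and of each of its paths."""
    root = Path(sys_block)
    sizes = {dm_name: _device_bytes(root, dm_name)}
    # Each entry under slaves/ is named after an underlying path device (sdX).
    for slave in sorted((root / dm_name / "slaves").iterdir()):
        sizes[slave.name] = _device_bytes(root, slave.name)
    return sizes


def map_is_stale(sizes: Dict[str, int], dm_name: str) -> bool:
    """True if any path device is larger than the multipath map on top of it."""
    map_size = sizes[dm_name]
    return any(size > map_size for name, size in sizes.items() if name != dm_name)
```

Calling `mpath_sizes("dm-28")` right after an extend and again a few seconds later would show whether the map catches up with its paths. The `sys_block` parameter exists only so the function can be pointed at a fake directory tree for testing.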
Thanks for the help
Jerome
Hi Jérôme,

As far as I can see, this is a timing problem: the extend happens in the backend and becomes visible on the compute node more slowly than Nova asks os-brick to check the new size.

There are multiple reasons why this could be happening:

- Pure's extend is not synchronous, so Cinder tells Nova that it has extended the volume before it has actually happened in the backend. I doubt that is the case.
- The iSCSI notification of the new size to the compute node is slow.
- The Nova execution is too fast for the compute node to notice the change in the volume's size. This is similar to the previous point, but here the cause is not slow iSCSI due to the network or the storage array; it is that the compute node is too fast.

Regardless, in my opinion there is a Cinder/Nova/os-brick change that could be implemented to improve these situations: the extend_volume method in os-brick could receive the expected new size, so that it can wait a bit for the system to reflect it if it notices that the size hasn't increased yet.

Cheers,
Gorka.
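The proposal of passing the expected new size to extend_volume could take the shape of a bounded polling loop. This is a minimal sketch, not os-brick's actual API: `wait_for_size`, its parameters, and the default timeout and interval are hypothetical. In a real connector, `get_size` would be whatever reads the multipath device's current size.

```python
import time
from typing import Callable


def wait_for_size(get_size: Callable[[], int], expected_size: int,
                  timeout: float = 10.0, interval: float = 0.5,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> int:
    """Poll get_size() until it reports at least expected_size bytes.

    Returns the last size observed. Raises TimeoutError if the device does
    not reach expected_size within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        size = get_size()
        if size >= expected_size:
            return size
        if clock() >= deadline:
            raise TimeoutError(
                f"size {size} still below expected {expected_size} after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters are injected only so the loop can be tested without real delays. Raising on timeout, rather than silently returning the old size, would let the caller report the failure instead of resizing the guest to a stale value.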