Hello everyone.

I've discovered a bug in the nova/glanceclient interaction that leads to image corruption during shelving when the connection is disrupted mid-upload. I was able to reproduce it in two different production environments, running Xena and 2024.2 respectively. I can also reproduce it in devstack, though without image corruption: there it just ends in a shelve failure because the corruption is detected.

My setup uses LVM-backed QEMU VMs with Ceph as the Glance backend, but I believe this applies to a variety of nova/glance backends.

When shelve is triggered, nova-compute creates an image file locally and then initiates the upload of that file into Glance[0]. If something happens to the connection during the upload ("broken pipe", "connection timeout"), nova-compute retries the upload operation[1] while the image object is removed from the Ceph backend by Glance[2].

The problem is that the image_file object that nova-compute passes to glanceclient is a byte stream created with an open()[3] call, and on retry it resumes the upload from the point where it was interrupted. This is easily confirmed by calling image_data.tell() in the glanceclient.v2.images.Controller.upload() function: it is zero initially and non-zero on retry.

I haven't thoroughly checked the protection offered in Glance master, and I believe it may have been significantly improved, but in older releases such as 2024.2 this leads to a corrupted shelved image: on retry only part of the image is uploaded, after which the original VM is offloaded (removed) and the data is lost.

Now, my question is where to submit this bug: glanceclient or nova? The problem is easily fixable in glanceclient by calling image_data.seek(0) in glanceclient.v2.images.Controller.upload(). It makes sense to always point the byte stream at the beginning before initiating an upload, but should it really be the client's responsibility to perform such a sanity check?
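To illustrate the mechanism outside of nova and glanceclient, here is a minimal self-contained sketch. Everything in it is hypothetical: FlakyBackend stands in for the Glance side and fails the first upload midway, and upload_with_retry stands in for the retry loop. It is not the actual nova or glanceclient code; it only shows how a stream that is not rewound resumes from its current position, and how a seek(0) guarded by seekable() avoids that.

```python
import io

DATA = b"0123456789abcdef"


class FlakyBackend:
    """Hypothetical stand-in for the Glance side; drops the first upload midway."""

    def __init__(self, fail_after_bytes):
        self.fail_after_bytes = fail_after_bytes
        self.failed_once = False
        self.stored = b""

    def upload(self, stream, chunk_size=4):
        # Every attempt starts from an empty object, mirroring the backend
        # removing the partially written image after a failure.
        self.stored = b""
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            self.stored += chunk
            if not self.failed_once and len(self.stored) >= self.fail_after_bytes:
                self.failed_once = True
                raise ConnectionError("broken pipe")


def upload_with_retry(backend, stream, retries=1, rewind=False):
    """Hypothetical retry loop; with rewind=True the stream is reset first."""
    for attempt in range(retries + 1):
        # Rewind only if the object reports itself as seekable.
        if rewind and getattr(stream, "seekable", lambda: False)():
            stream.seek(0)
        try:
            backend.upload(stream)
            return backend.stored
        except ConnectionError:
            if attempt == retries:
                raise


truncated = upload_with_retry(FlakyBackend(8), io.BytesIO(DATA))
complete = upload_with_retry(FlakyBackend(8), io.BytesIO(DATA), rewind=True)

print(truncated)         # b'89abcdef': only the tail after the failure point
print(complete == DATA)  # True
```

One design question this sketch leaves open: seek(0) assumes the caller always wants the upload to start at the beginning of the stream. A variant that records stream.tell() before the first attempt and seeks back to that offset on retry would preserve a caller-chosen starting position, at the cost of a little more state.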
I'm also not sure whether an unconditional seek(0) would break anything, for example if a non-seekable object is passed into glanceclient. Is that even possible?

[0] https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/drive...
[1] https://opendev.org/openstack/nova/src/branch/master/nova/image/glance.py#L1...
[2] https://opendev.org/openstack/glance_store/src/branch/master/glance_store/_d...
[3] https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/drive...