Hi, thanks for the report. Can we get the details into a Launchpad bug?
You can file a single bug against nova and we can triage it fully, but
I'll reply inline as well.
On 31/10/2025 08:51, Vladimir Prokofev wrote:
> Hello everyone.
>
> I've discovered a bug in nova/glanceclient interoperation that leads
> to image corruption during shelving under some circumstances involving
> connection disruption.
>
> I was able to reproduce it in two different production environments
> running on xena and 2024.2 respectively, I'm also able to reproduce it
> in devstack though without image corruption - there it just ends up in
> a shelve failure due to detected corruption.
>
> My setup assumes LVM-backed QEMU VMs with backend in CEPH for Glance,
> but I believe this is applicable to a variety of nova/glance backends.
>
> When shelve is triggered, nova-compute creates an image file locally,
> and then initiates upload of said image file into Glance[0]. If
> something happens to the connection during the upload("broken pipe",
> "connection timeout") - nova-compute retries upload operation[1] while
> image object is removed from CEPH backend by Glance[2]
> Problem here is that the image_file object that is passed to
> glanceclient by nova-compute is a byte-stream created with an
> open()[3] call, and upon retry it resumes the upload from the point
> where it was interrupted. This is easily confirmed by calling
> image_data.tell() in glanceclient.v2.images.Controller.upload()
> function - it will be at zero initially and non-zero on retry.
Looking at the wrapper object, fixing this in nova will be kind of annoying.
The fix would be to just seek the byte stream back to the start of the file.
However, the retry is done dynamically today via the overloaded call
function, which does not currently have awareness of which method is
being invoked and as a result does not reset the stream.
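To make the failure mode concrete, here is a minimal sketch of it outside
nova. FlakyStore and upload_with_retry are hypothetical stand-ins, not nova
or glanceclient code. The store is assumed to discard partial data when an
upload fails, and the retry loop reuses the same stream without rewinding it.

```python
import io


class FlakyStore:
    """Hypothetical image backend that fails once, mid-upload."""

    def __init__(self):
        self.fail_next = True
        self.stored = b""

    def upload(self, data, chunk_size=4):
        received = b""  # partial data from a failed attempt is thrown away
        while True:
            chunk = data.read(chunk_size)
            if not chunk:
                break
            received += chunk
            if self.fail_next:
                self.fail_next = False
                raise ConnectionError("broken pipe")
        self.stored = received


def upload_with_retry(store, data, attempts=2):
    """Naive retry: the same stream object is passed again, not rewound."""
    for attempt in range(attempts):
        try:
            return store.upload(data)
        except ConnectionError:
            if attempt == attempts - 1:
                raise


payload = b"0123456789"
stream = io.BytesIO(payload)
store = FlakyStore()
upload_with_retry(store, stream)
print(store.stored)  # b'456789': the first four bytes are silently lost
```

The second attempt succeeds from the store's point of view, which is why
the truncated image is not noticed at upload time.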
If the retry logic were scoped to the upload function, say here
https://opendev.org/openstack/nova/src/branch/master/nova/image/glance.py#L587-L594
it would be cleaner to fix.
That does not mean we can't do something like check the method that is
passed here
https://opendev.org/openstack/nova/src/branch/master/nova/image/glance.py#L181
and do some processing on the args
or modify
https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L3268-L3274 or
https://opendev.org/openstack/nova/src/branch/master/nova/image/glance.py#L682-L706
to be a closure, or wrapped in a decorator, that would reset the stream.
In fact, the simplest fix might be to add a finally block here
https://opendev.org/openstack/nova/src/branch/master/nova/image/glance.py#L705
that just seeks the data back to 0.
That way it undoes the internal modification to the stream position.
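Roughly what I mean, as a standalone sketch rather than a patch against
glance.py. send_and_rewind and make_flaky_send are hypothetical names, and
the sketch assumes the retry happens around the call that owns the finally
block, which would need checking against the real code.

```python
import io


def send_and_rewind(send, data):
    """Run one upload attempt, then restore the stream position.

    `send` is a hypothetical callable that consumes `data` via read().
    """
    try:
        return send(data)
    finally:
        # Undo whatever read() did to the position, on success or failure.
        data.seek(0)


def make_flaky_send(fail_times=1):
    """Build a hypothetical sender that fails mid-stream `fail_times` times."""
    state = {"failures_left": fail_times}

    def send(data):
        first = data.read(4)
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            raise ConnectionError("broken pipe")
        return first + data.read()

    return send


stream = io.BytesIO(b"0123456789")
send = make_flaky_send()
result = None
for attempt in range(2):
    try:
        result = send_and_rewind(send, stream)
        break
    except ConnectionError:
        continue

print(result)  # b'0123456789'
```

Because the rewind is in a finally block, the caller gets the stream back
at offset 0 whether the attempt failed or succeeded.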
An alternative would be to fix this in glanceclient, but not in Glance itself.
https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v2/images.py#L291-L314
We could modify the upload function to first seek the image data to
location 0 and then proceed with the upload.
The API contract of the method says that the image data is a file-like object
https://docs.python.org/3/glossary.html#term-file-like-object that just
says a file-like object is a synonym for
https://docs.python.org/3/glossary.html#term-file-object
```
An object exposing a file-oriented API (with methods such as
read() or write()) to an underlying resource. Depending on the
way it was created, a file object can mediate access to a real
on-disk file or to another type of storage or communication device
(for example standard input/output, in-memory buffers, sockets,
pipes, etc.). File objects are also called file-like objects or
streams.
There are actually three categories of file objects: raw binary
files <https://docs.python.org/3/glossary.html#term-binary-file>,
buffered binary files
<https://docs.python.org/3/glossary.html#term-binary-file> and text
files <https://docs.python.org/3/glossary.html#term-text-file>.
Their interfaces are defined in the io
<https://docs.python.org/3/library/io.html#module-io> module. The
canonical way to create a file object is by using the open()
<https://docs.python.org/3/library/functions.html#open> function.
```
OK, so if we look at
https://docs.python.org/3/library/io.html#class-hierarchy what does the
interface require?
The base class of the interface is
https://docs.python.org/3/library/io.html#io.IOBase
It provides "fileno, seek, and truncate" as stubs, which later
classes must implement, and "close, closed, __enter__, __exit__,
flush, isatty, __iter__, __next__, readable, readline,
readlines, seekable, tell, writable, and writelines" as mixin
methods.
So since https://docs.python.org/3/library/io.html#io.IOBase.seek is
required for seekable streams, we can use it to reset the stream we pass
to the beginning.
Now, does the glanceclient require that you pass a file-like object that
supports random access in its API contract? Technically, no.
So it would be more correct for nova to do the resetting than the glance
client, as we, the application, can enforce the slightly stricter
requirement
without narrowing the API contract.
glanceclient could in the future narrow the contract, use
https://docs.python.org/3/library/io.html#io.IOBase.seekable
and implement the resetting of the stream if it's not at position 0,
raising an exception if it's not seekable, but I don't think that is correct.
I think this should be fixed as a nova bug.
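For reference, the narrower client-side contract described above might look
something like the sketch below. rewind_or_reject and ForwardOnlyStream are
hypothetical names, not glanceclient code, and the choice of ValueError is
an assumption made purely for illustration.

```python
import io


def rewind_or_reject(image_data):
    """Hypothetical pre-upload check: require a seekable stream at offset 0."""
    if not image_data.seekable():
        raise ValueError("image_data must support seek() to be retried safely")
    if image_data.tell() != 0:
        image_data.seek(0)
    return image_data


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical non-seekable source, such as data read from a pipe."""

    def __init__(self, payload):
        super().__init__()
        self._source = io.BytesIO(payload)

    def readable(self):
        return True

    def readinto(self, buffer):
        chunk = self._source.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


# A partially consumed but seekable stream is rewound.
partially_read = io.BytesIO(b"0123456789")
partially_read.read(4)
rewound = rewind_or_reject(partially_read)

# A forward-only stream is rejected outright.
try:
    rewind_or_reject(ForwardOnlyStream(b"0123456789"))
    rejected = False
except ValueError:
    rejected = True
```

The second case is the reason I would not do this in the client: any
existing caller that passes a forward-only stream would start failing.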
>
> I haven't thoroughly checked protection offered in Glance master, I
> believe it may have been significantly improved, but in older releases
> such as 2024.2, this leads to a corrupted shelved image, because upon
> retry only part of an image object is uploaded, after which original
> VM is offloaded(removed) and you end up with lost data.
>
> Now, my issue here is where to submit this bug: glanceclient or nova?
> This problem is easily fixable in glanceclient by calling
> image_data.seek(0) in glanceclient.v2.images.Controller.upload(): it
> makes sense to always point byte-stream to the beginning before
> initiating upload, but should it really be responsibility of a client
> to perform such a sanity check? I'm also not sure if there're cases
> that this will break, for example if a non-seekable object is passed
> into glanceclient, but I'm not sure if this is even possible?
>
> [0]
> https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L3268-L3274
> [1]
> https://opendev.org/openstack/nova/src/branch/master/nova/image/glance.py#L187-L201
> [2]
> https://opendev.org/openstack/glance_store/src/branch/master/glance_store/_drivers/rbd.py#L663-L678
> [3]
> https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L3268