Hi Shrishail,

On Wed, 31 Jan 2024 at 8:49 PM, Shrishail K <nkshrishail@gmail.com> wrote:
Thanks folks for your responses.

I have a couple of follow-up questions/clarifications so I can quickly fix/hack this issue in my local environment.

1. Can I manually copy any images larger than 5GB to the S3 bucket before running the image-import command?
    Would image-import see the file in S3 and use it instead of trying to copy it all over again?

I think (though I am not sure whether it will work for S3) that after manually copying the image to S3 you can use the add-location command to add the location to the existing image. Image import will not work in this case.


2. Following the discussion, I got the sense that the code changes may not be large. In that case, if it's possible to provide a patch,
    I can patch my OpenStack and see how it goes.

Even though the changes are trivial, glance uses an onion architecture, so changes are needed in a lot of places, which will take some time. Realistically, it's not possible to provide a quick fix for this.

Thanks,

Abhishek 

Thanks,
Shrishail

On Wed, 31 Jan 2024 at 05:28, Abhishek Kekane <akekane@redhat.com> wrote:



On Wed, 31 Jan 2024 at 3:49 PM, Christian Rohmann <christian.rohmann@inovex.de> wrote:
Hey Abhishek!

On 31.01.24 10:35, Abhishek Kekane wrote:
On 31.01.24 09:13, Abhishek Kekane wrote:
By design, the copy-image import workflow uses a common upload mechanism for all stores, so yes, if it is not using multipart upload for the S3 backend, that is a known limitation. Feel free to propose an enhancement for this, or participate in the upcoming PTG (April 8-12, 2024) to discuss improvements to this behavior.
Abhishek, I suppose copy-image is done via this helper, which is what you were referring to?
https://github.com/openstack/glance/blob/master/glance/async_/flows/_internal_plugins/copy_image.py

Hi Christian,

The helper you mention above is responsible for downloading the existing data to common storage known as the staging area (configured using os_glance_staging_store in glance-api.conf); from there it is imported to the destination/target store. However, debugging further, I found that it internally calls the store.add method, which means it is in fact using a driver-specific call.

I suspect [1] is where it uses single_part as the upload method for S3 while copying the image, because we are not passing the size of the existing image to the import call.

I think this is a driver-specific improvement, and it requires additional effort to make it work.
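
To illustrate the suspicion above, here is a minimal sketch of how a size-based choice between single-part and multipart upload could go wrong when the size is not passed along. This is not the glance_store code: the function names and the 100 MiB threshold are assumptions made up for the example, and a size of 0 is assumed to stand for "unknown". Only the 5GB single-upload limit comes from this thread.

```python
# Hypothetical sketch; none of these names come from glance or glance_store.

# Limit for a single (non-multipart) S3 upload, as discussed in this thread.
S3_SINGLE_PUT_LIMIT = 5 * 1024 ** 3

# Assumed threshold above which a driver would switch to multipart upload.
LARGE_OBJECT_THRESHOLD = 100 * 1024 ** 2


def choose_upload_strategy(declared_size: int) -> str:
    """Pick an upload strategy from the size the caller declares.

    A declared size of 0 stands for "unknown", which is what a driver
    would see if the import call does not pass the image size.
    """
    if declared_size >= LARGE_OBJECT_THRESHOLD:
        return "multipart"
    return "single_part"


def single_put_would_fail(actual_size: int, declared_size: int) -> bool:
    """True if single-part is chosen although the data exceeds the limit."""
    strategy = choose_upload_strategy(declared_size)
    return strategy == "single_part" and actual_size > S3_SINGLE_PUT_LIMIT
```

With an unknown size the sketch always picks single_part, so a 6GB image would hit the single-upload limit, while the same image with its size declared would go through multipart.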

I cannot (quickly) follow your debugging / the calls you mentioned.
Could you please raise a bug with your findings to "fix" this? It seems this is not intended behavior?

Here the image size is actually provided when the image is fetched to the staging store: https://github.com/openstack/glance/blob/b6b9f043ffe664c643456912148648ecc0d6c9b4/glance/async_/flows/_internal_plugins/copy_image.py#L122


Hey Christian,

The store you mentioned here is the staging store, which is a filesystem store and not the intended (S3) store. From there, the image import flow is called, which makes the call to upload the data from the file (staging) store to the actual store. You will find it in the set_image_data method in glance/async_/flows/api_image_import.py.

Abhishek 


But what is the next step then to upload the "staged" image into the new target store?

In any case, I also tend to disagree that, if the missing image_size is the issue, providing it to the add call is an S3-driver-specific thing.
Other object storages (GCS, Azure Blob, ...) might "like" to know the size as well to adjust their upload strategy.
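
To make the point concrete, here is a rough sketch of passing the size of the staged data to a store's add call regardless of which backend sits behind it. Since the staging store is a filesystem store, the size can be read from the staged file itself. The class, the function and the add signature below are hypothetical and only stand in for the real glance code paths.

```python
import os
import tempfile


class RecordingStore:
    """Hypothetical stand-in for any target store (S3, GCS, Azure Blob, ...)."""

    def __init__(self):
        self.seen_size = None

    def add(self, image_id, data, size):
        # A real driver could use `size` here to adjust its upload strategy.
        self.seen_size = size
        # Consume the stream in chunks and return the number of bytes read.
        return sum(len(chunk) for chunk in iter(lambda: data.read(65536), b""))


def import_from_staging(store, image_id, staged_path):
    """Upload a staged file, passing its on-disk size to the store."""
    size = os.path.getsize(staged_path)
    with open(staged_path, "rb") as data:
        return store.add(image_id, data, size)
```

The store in this sketch only records the size it receives; the point is that the caller can supply it without knowing anything about the backend.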



Regards


Christian