[glance] Slow image download when using glanceclient
Hi glance experts,

I'm using the following code to download a glance image:

```
from glanceclient import client
...
glance = client.Client(GLANCE_API_VERSION, session=sess)
...
with open(path, 'wb') as image_file:
    data = glance.images.data(image_id)
    for chunk in tqdm(data, unit='B', unit_scale=True, unit_divisor=1024):
        image_file.write(chunk)
```

And I get a speed of around 3kB/s. It would take months to download an image. I'm using python3-glanceclient==3.6.0.

I even tried:

```
for chunk in tqdm(data, unit='B', unit_scale=True, unit_divisor=1024):
    pass
```

to see if the bottleneck was the disk I/O, but it didn't get any faster.

In the same environment, when I use the glance CLI instead:

```
glance image-download --file $path $image_id
```

I get hundreds of MB/s download speed, and it finishes in a few minutes.

Is there anything I can do to improve the glanceclient performance? I'm considering using subprocess.Popen(['glance', 'image-download', ...]) if nothing helps...

Regards,
Lucio
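For reference, the subprocess fallback mentioned at the end of the message could look roughly like the sketch below. It assumes the `glance` CLI is on the PATH and already configured through the usual environment variables; the helper names `glance_download_command` and `download_with_cli` are hypothetical, and `subprocess.run` is swapped in for `subprocess.Popen` because the call only needs to wait for completion.

```python
import subprocess


def glance_download_command(image_id, path):
    """Build the argv for `glance image-download --file PATH IMAGE_ID`."""
    return ["glance", "image-download", "--file", str(path), str(image_id)]


def download_with_cli(image_id, path):
    """Run the glance CLI and raise CalledProcessError if it fails.

    Assumption: the CLI is installed and authenticated in this environment.
    """
    subprocess.run(glance_download_command(image_id, path), check=True)
```

Passing the arguments as a list avoids shell quoting issues with unusual file names.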
On Thu, 2022-10-13 at 13:30 -0300, Lucio Seki wrote:
Is there anything I can do to improve the glanceclient performance? I'm considering using subprocess.Popen(['glance', 'image-download', ...]) if nothing helps...

Have you considered using the openstacksdk instead?

The glanceclient is really only intended for use by other OpenStack services, like nova or ironic. It's not really meant to be used to write your own code anymore. In the past it provided a programmatic interface for interacting with glance, but now you should prefer the OpenStack SDK instead. https://github.com/openstack/openstacksdk
Thanks Sean, that makes it much easier to code!

```
...
conn = openstack.connect(cloud_name)
with open(path, 'wb') as image_file:
    response = conn.image.download_image(image_name)
    for chunk in tqdm(response.iter_content(), **tqdm_params):
        image_file.write(chunk)
```

And it gave me some performance improvement (3kB/s -> 120kB/s)... though it would still take several days to download an image.

Is there some tuning that I could apply?

On Thu, Oct 13, 2022, 14:18 Sean Mooney <smooney@redhat.com> wrote:
Have you considered using the openstacksdk instead?
The glanceclient is really only intended for use by other OpenStack services, like nova or ironic. It's not really meant to be used to write your own code anymore. In the past it provided a programmatic interface for interacting with glance, but now you should prefer the OpenStack SDK instead. https://github.com/openstack/openstacksdk
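One thing worth checking in the loop above is the chunk size. Assuming the object returned by `download_image` behaves like a `requests.Response`, `iter_content()` with no argument yields one byte per iteration, and tqdm wrapped around an iterable counts iterations rather than bytes. The sketch below passes an explicit `chunk_size` and reports real byte counts; the helper name `save_stream`, the `on_bytes` callback, and the 1 MiB value are illustrative assumptions, not tuned recommendations.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB; an illustrative value, not a tuned one


def save_stream(response, image_file, chunk_size=CHUNK_SIZE, on_bytes=None):
    """Write a streamed response to image_file and return the bytes written.

    Assumption: `response` offers iter_content(chunk_size=...) in the same
    way a requests.Response does.
    """
    total = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        image_file.write(chunk)
        total += len(chunk)
        if on_bytes is not None:
            # Report bytes, not iterations, to whatever tracks progress.
            on_bytes(len(chunk))
    return total
```

With tqdm, the progress bar's `update` method can serve as the callback, for example `save_stream(response, image_file, on_bytes=bar.update)` inside `with tqdm(unit='B', unit_scale=True, unit_divisor=1024) as bar:`, so the displayed rate reflects bytes per second.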
On Thu, 2022-10-13 at 16:21 -0300, Lucio Seki wrote:
Is there some tuning that I could apply?

This is what nova does:
https://github.com/openstack/nova/blob/master/nova/image/glance.py#L344

We get the image chunks by calling the data method on the glance client:
https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L373-L377

Then we basically just loop over the chunks and write them to a file, like you are:
https://github.com/openstack/nova/blob/03d2715ed492350fa11908aea0fdd0265993e284/nova/image/glance.py#L413-L437

We have some extra code for doing image verification, but it's basically the same as what you are doing. We use eventlet to monkeypatch Python I/O, which can improve performance, but I would not expect it to be that dramatic, and I don't think the glance client or openstack client use eventlet, so it sounds like something else is limiting the transfer speed.

This is the glance client method we are invoking:
https://github.com/openstack/python-glanceclient/blob/56186d6d5aa1a0c8fde99eeb535a650b0495925d/glanceclient/v2/images.py#L201-L271

I'm not sure what tqdm is, by the way. Is it measuring the transfer speed or something like that? Does the speed increase if you remove it? I.e., can you test this with a simple timed script and see how much it downloads in, say, up to 60 seconds by looking at the file size?

Assuming it's https://github.com/tqdm/tqdm, perhaps the additional I/O it would be doing to standard out is slowing it down?
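The timed test suggested above could be sketched along these lines. This is a minimal version that counts the bytes it receives instead of inspecting the file size afterwards; the function name `measure_throughput` and the injectable `clock` parameter are assumptions made for illustration, and `chunks` stands for whatever iterable of byte strings the client returns.

```python
import time


def measure_throughput(chunks, max_seconds=60.0, clock=time.monotonic):
    """Consume chunks for up to max_seconds; return (bytes_read, elapsed).

    The data is discarded, so disk I/O and progress output are both
    excluded from the measurement.
    """
    start = clock()
    bytes_read = 0
    elapsed = 0.0
    for chunk in chunks:
        bytes_read += len(chunk)
        elapsed = clock() - start
        if elapsed >= max_seconds:
            break
    return bytes_read, elapsed
```

Dividing `bytes_read` by `elapsed` gives an average rate that does not depend on how a progress bar counts its units.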
Yes, I'm using tqdm to monitor the progress and speed. I removed it, and the speed improved slightly (120kB/s -> 131kB/s), but not significantly :-/

On Thu, Oct 13, 2022, 16:54 Sean Mooney <smooney@redhat.com> wrote:
I'm not sure what tqdm is, by the way. Is it measuring the transfer speed or something like that? Does the speed increase if you remove it? I.e., can you test this with a simple timed script and see how much it downloads in, say, up to 60 seconds by looking at the file size?
Assuming it's https://github.com/tqdm/tqdm, perhaps the additional I/O it would be doing to standard out is slowing it down?
```
import openstack

conn = openstack.connect()
conn.image.download_image(image_name, stream=True, output="data.iso")
```

This gives me the maximum performance of the network. Using stream=True may actually be slower (by around 40%), but it may be crucially necessary when dealing with huge images. Additionally, you can pass chunk_size as a parameter to the download_image function, which brings the streaming performance in line with non-streaming (for me, stream=True with chunk_size=8192 downloaded a 2.3G image in 14 seconds).
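A slightly fuller sketch of the `output` approach follows, under a few assumptions: `conn` is an openstacksdk connection (for example from `openstack.connect()`), the image is looked up first with `find_image` so that either a name or an ID can be given, and the helper name `download_image_to_path` and its 1 MiB default chunk size are illustrative choices rather than recommendations.

```python
def download_image_to_path(conn, name_or_id, path, chunk_size=1024 * 1024):
    """Have openstacksdk stream the image straight into the file at path.

    Assumption: conn.image is the openstacksdk image proxy, which provides
    find_image() and download_image(stream=, output=, chunk_size=).
    """
    # ignore_missing=False makes a missing image raise instead of returning None.
    image = conn.image.find_image(name_or_id, ignore_missing=False)
    conn.image.download_image(
        image, stream=True, output=path, chunk_size=chunk_size
    )
    return path
```

Because the SDK does the writing itself, no per-chunk loop runs in the calling code, and `chunk_size` stays the one knob to experiment with.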
Thanks Artem,

Indeed, using the `output` parameter increased the download speed from 120KB/s to >120MB/s (the max network performance I have). That's great!

I'll look into the method definition and see what the secret is.

Regards,
Lucio

On Fri, Oct 14, 2022, 12:07 Artem Goncharov <artem.goncharov@gmail.com> wrote:
```
import openstack

conn = openstack.connect()
conn.image.download_image(image_name, stream=True, output="data.iso")
```

This gives me the maximum performance of the network.
participants (3)
- Artem Goncharov
- Lucio Seki
- Sean Mooney