[ironic] [tripleo] RFC: lzma vs gzip for compressing IPA initramfs
Hi folks,
I've been playing with ways to reduce the size of our IPA images. While package removals can only save us tens of megabytes, switching from gzip to lzma reduces the size by around a third (from 373M to 217M in my testing).
What's the caveat? The unpacking time increases VERY substantially. On my nested virt lab the 217M image took around 5 minutes to unpack. I'm not sure how much it will impact real bare metal; please feel free to test https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/764371 and tell me.
So, what do you think? Switching to lzma by default will likely affect CI run time (assuming we still have DIB jobs somewhere...) and development environments, but it will also provide a visible reduction in the image size (which benefits all environments). Large TripleO images may particularly benefit from this (but will also be particularly affected by the unpacking time).
Feedback is very welcome.
Dmitry
-- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
Hi,
LZMA causes very high CPU and memory usage when creating images, leaving fewer resources for other processes. If Ironic is running alongside other services, that may have a significant impact on them. I would leave gzip as the default, introduce --lzma as well as --gzip options, and use lzma on 5-10% of our CI resources to see how it goes. Then, after a significant amount of testing, we could turn it on as the default. Proper deprecation should be applied here as well, IMHO.
-- Sergii Golovatiuk
Senior Software Developer
Red Hat <https://www.redhat.com/>
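The opt-in scheme proposed above (gzip stays the default, with explicit --gzip and --lzma switches) could be sketched roughly as follows. This is only an illustration: the program name and the `chosen_compression` helper are hypothetical, and the real ironic-python-agent-builder command line may be structured differently.

```python
import argparse


def build_parser():
    # Hypothetical CLI sketch; not the actual ironic-python-agent-builder parser.
    parser = argparse.ArgumentParser(prog="ipa-image-build-sketch")
    # Only one compression switch may be given at a time.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--gzip", dest="compression", action="store_const",
                       const="gzip", help="compress the initramfs with gzip (default)")
    group.add_argument("--lzma", dest="compression", action="store_const",
                       const="lzma", help="compress the initramfs with lzma (opt-in)")
    # gzip remains the default when neither switch is passed.
    parser.set_defaults(compression="gzip")
    return parser


def chosen_compression(argv):
    """Return the compression name selected by the given argument list."""
    return build_parser().parse_args(argv).compression


if __name__ == "__main__":
    print(chosen_compression([]))          # no switch: default
    print(chosen_compression(["--lzma"]))  # explicit opt-in
```

Keeping both switches explicit means a later change of the default would not break callers that already pass --gzip.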
Hi, On Fri, Nov 27, 2020 at 2:00 PM Sergii Golovatiuk <sgolovat@redhat.com> wrote:
Hi,
LZMA causes very high CPU and memory usage when creating images, leaving fewer resources for other processes. If Ironic is running alongside other services, that may have a significant impact on them. I would leave gzip as the default, introduce --lzma as well as --gzip options, and use lzma on 5-10% of our CI resources to see how it goes. Then, after a significant amount of testing, we could turn it on as the default. Proper deprecation should be applied here as well, IMHO.
LZMA packing won't happen on ironic nodes. The images are pre-built and only unpacked on the target machines. Or does TripleO build images every time?
Dmitry
On Fri, Nov 27, 2020 at 3:16 PM Dmitry Tantsur <dtantsur@redhat.com> wrote:
Hi,
On Fri, Nov 27, 2020 at 2:00 PM Sergii Golovatiuk <sgolovat@redhat.com> wrote:
Hi,
LZMA causes very high CPU and memory usage when creating images, leaving fewer resources for other processes. If Ironic is running alongside other services, that may have a significant impact on them. I would leave gzip as the default, introduce --lzma as well as --gzip options, and use lzma on 5-10% of our CI resources to see how it goes. Then, after a significant amount of testing, we could turn it on as the default. Proper deprecation should be applied here as well, IMHO.
+1 to make it optional. Five or even ten minutes (per the testing by Arne++ earlier in this thread) is a long time for some of the TripleO jobs that are running close to the 3-hour limit; for example, we recently had to make some adjustments [1] to the standalone-upgrade job because of timeouts.
LZMA packing won't happen on ironic nodes. The images are pre-built and only unpacked on the target machines.
Or does TripleO build images every time?
No, the CI jobs use images pulled from the ipa_image_url defined in [2] (that is for centos8 master; there are equivalent release files and image_url values defined for stable/*). Those are put there by the periodic buildimage-* jobs, e.g. [3].
thanks, marios
[1] https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/762674
[2] https://opendev.org/openstack/tripleo-quickstart/src/commit/fd092aa4b6a90238...
[3] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-centos-8-buildimage-ironic-python-agent-master&job_name=periodic-tripleo-centos-8-buildimage-overcloud-full-master
Hi,
I did a quick test (one data point):
- the image build time increased by 10 mins (on a VM, this is more than double compared to gzip)
- but: the resulting image size is ~30% smaller (421 vs 297 MB)
- the cleaning time (unpacking on bare metal!) increased by ~30 seconds
So, lzma looks like a good option to reduce the image size, which we had to do in our deployment already to address boot issues (we removed some packages).
Keeping gzip as the default and offering lzma as an option to start with, as suggested by Sergii, seems like a good way forward.
I also think it would be good to have someone else test as well to have another data point :-)
Cheers, Arne
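For anyone who wants to collect another data point without building a full image, a rough comparison can be run with Python's standard gzip and lzma modules. This is a sketch under assumptions: the `compare` helper is hypothetical, it runs in user space rather than in the kernel's decompressor, and synthetic input will not behave like a real initramfs, so the numbers are only indicative.

```python
import gzip
import lzma
import time


def compare(data):
    """Return compressed size and pack/unpack timings for gzip and lzma."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        # FORMAT_ALONE is the legacy .lzma container discussed in this thread.
        "lzma": (lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE),
                 lzma.decompress),
    }
    results = {}
    for name, (pack, unpack) in codecs.items():
        start = time.perf_counter()
        packed = pack(data)
        packed_at = time.perf_counter()
        restored = unpack(packed)
        done = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "size": len(packed),
            "pack_s": packed_at - start,
            "unpack_s": done - packed_at,
        }
    return results


if __name__ == "__main__":
    # Placeholder input; pass the bytes of an uncompressed cpio archive instead
    # to get figures closer to a real image.
    sample = b"ironic-python-agent initramfs payload\n" * 20000
    for name, stats in compare(sample).items():
        print(name, stats)
```

Running it against the uncompressed cpio archive of an actual IPA image would give figures more comparable to the ones reported above.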
On Fri, 2020-11-27 at 18:35 +0100, Arne Wiebalck wrote:
Hi,
I did a quick test (one data point):
- the image build time increased by 10 mins (on a VM, this is more than double compared to gzip)
- but: the resulting image size is ~30% smaller (421 vs 297 MB)
- the cleaning time (unpacking on bare metal!) increased by ~30 seconds
That is one of the main trade-offs lzma compression makes: it front-loads the computational load onto the compression step, requiring both more RAM and more time for the initial compression, while achieving a better compression ratio and allowing lightweight clients to decompress without a similar increase in time or RAM requirements.
You may also want to consider the .xz format. lzma and lzma2 have now been superseded by .xz: https://en.wikipedia.org/wiki/XZ_Utils
The .xz format is becoming more and more common for things like compressed kernel modules and packages.
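Since gzip, legacy .lzma and .xz are all under discussion, it can help to check which container an existing image actually uses. The sketch below inspects the leading bytes; `detect_compression` is a hypothetical helper. The gzip and .xz signatures are fixed by their formats, while legacy .lzma has no real magic number, so the 0x5d check (the properties byte produced by default encoder settings) is only a heuristic.

```python
import gzip
import lzma

GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"


def detect_compression(header):
    """Guess the compression container from the first bytes of a file."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(XZ_MAGIC):
        return "xz"
    # Heuristic only: 0x5d encodes the default lc=3, lp=0, pb=2 properties
    # of the legacy .lzma ("alone") format.
    if header[:1] == b"\x5d":
        return "lzma"
    return "unknown"


if __name__ == "__main__":
    payload = b"example payload"
    print(detect_compression(gzip.compress(payload)))
    print(detect_compression(lzma.compress(payload, format=lzma.FORMAT_XZ)))
    print(detect_compression(lzma.compress(payload, format=lzma.FORMAT_ALONE)))
```

For a file on disk, reading the first six bytes is enough input for the function.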
On Fri, Nov 27, 2020 at 7:46 PM Sean Mooney <smooney@redhat.com> wrote:
That is one of the main trade-offs lzma compression makes: it front-loads the computational load onto the compression step, requiring both more RAM and more time for the initial compression, while achieving a better compression ratio and allowing lightweight clients to decompress without a similar increase in time or RAM requirements. You may also want to consider the .xz format.
Is it widely supported for initramfs compression? I can, of course, try, but maybe you know.
Dmitry
On 2020-11-28 13:32:09 +0100 (+0100), Dmitry Tantsur wrote:
Is it widely supported for initramfs compression? I can, of course, try, but maybe you know. [...]
It looks like you need your kernel built with:
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_KERNEL_XZ=y
CONFIG_RD_XZ=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y (or relevant CPU architecture)
CONFIG_DECOMPRESS_XZ=y
The standard Linux 5.9 kernel on my Debian systems has the above enabled by default; I'm not sure about other distros or older kernels, though CONFIG_KERNEL_XZ has been included as far back as Linux 2.6.38: https://cateee.net/lkddb/web-lkddb/KERNEL_XZ.html
Apparently you may need to pass --check=crc32 (and possibly --lzma2=dict=512KiB as well) to xz when compressing your initrd, to accommodate the kernel's decompressor implementation:
https://github.com/linuxboot/book/blob/master/coreboot.u-root.systemboot/REA...
https://www.linuxquestions.org/questions/slackware-installation-40/booting-w...
-- Jeremy Stanley
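The two checks described above can be sketched in Python. Both helpers are hypothetical. `missing_xz_options` scans kernel config text (for example the contents of a /boot/config-* file, where the distribution ships one) for a subset of the listed options; choosing only the initramfs-related ones is an assumption made here, not something stated in the thread. `compress_initrd_xz` uses the standard lzma module with settings that mirror `xz --check=crc32 --lzma2=dict=512KiB`.

```python
import lzma

# Assumption: these are the options that matter for unpacking an xz initramfs;
# the full list quoted above also covers xz-compressed kernel images.
REQUIRED = ("CONFIG_RD_XZ", "CONFIG_XZ_DEC", "CONFIG_DECOMPRESS_XZ")


def missing_xz_options(config_text):
    """Return the REQUIRED options that are not set to 'y' in the config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as "# CONFIG_FOO is not set"
        key, value = line.split("=", 1)
        if value == "y":
            enabled.add(key)
    return [opt for opt in REQUIRED if opt not in enabled]


def compress_initrd_xz(cpio_bytes):
    """Compress a cpio archive as .xz with CRC32 and a 512 KiB dictionary."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 512 * 1024}]
    return lzma.compress(cpio_bytes, format=lzma.FORMAT_XZ,
                         check=lzma.CHECK_CRC32, filters=filters)


if __name__ == "__main__":
    sample_config = "CONFIG_RD_XZ=y\nCONFIG_XZ_DEC=y\n# CONFIG_DECOMPRESS_XZ is not set\n"
    print(missing_xz_options(sample_config))
    print(len(compress_initrd_xz(b"placeholder cpio bytes" * 100)))
```

Whether a given kernel actually boots the result still has to be verified on real hardware or a VM.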
On Fri, 27 Nov 2020 at 17:37, Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi,
I did a quick test (one data point):
- the image build time increased by 10 mins (on a VM, this is more than double compared to gzip)
- but: the resulting image size is ~30% smaller (421 vs 297 MB)
- the cleaning time (unpacking on bare metal!) increased by ~30 seconds
30 seconds is a long time, even for bare metal. 134 MB is roughly equivalent to a gigabit of network traffic. So really this comes down to how much spare capacity you have in your network to handle bursts of these downloads.
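The network side of this trade-off can be worked through with the image sizes Arne reported (421 MB and 297 MB). The sketch below is back-of-the-envelope arithmetic under stated assumptions: 1 MB is taken as 10^6 bytes, the link is assumed to deliver its full nominal rate, and concurrent downloads are assumed to share one link evenly. The function names are hypothetical.

```python
import math


def transfer_savings(gzip_mb, lzma_mb, link_gbit_per_s=1.0):
    """Return (MB saved, megabits saved, seconds saved on the given link)."""
    saved_mb = gzip_mb - lzma_mb
    saved_megabits = saved_mb * 8
    seconds = saved_megabits / (link_gbit_per_s * 1000)
    return saved_mb, saved_megabits, seconds


def nodes_to_break_even(extra_unpack_s, seconds_saved_per_node):
    """Concurrent downloads on one shared link needed to offset the unpack cost."""
    return math.ceil(extra_unpack_s / seconds_saved_per_node)


if __name__ == "__main__":
    saved_mb, saved_megabits, seconds = transfer_savings(421, 297)
    print(f"{saved_mb} MB saved = {saved_megabits} Mbit, "
          f"about {seconds:.2f} s on a 1 Gbit/s link")
    print("break-even at", nodes_to_break_even(30, seconds), "concurrent nodes")
```

Under these assumptions a single node saves about a second of transfer time, so the extra ~30 seconds of unpacking only pays off in wall-clock terms when a few dozen nodes pull the image over the same link at once.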
On 30.11.20 10:10, Mark Goddard wrote:
On Fri, 27 Nov 2020 at 17:37, Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi,
I did a quick test (one data point):
- the image build time increased by 10 mins (on a VM, this is more than double compared to gzip)
- but: the resulting image size is ~30% smaller (421 vs 297 MB)
- the cleaning time (unpacking on bare metal!) increased by ~30 seconds
30 seconds is a long time, even for bare metal. 134 MB is roughly equivalent to a gigabit of network traffic. So really this comes down to how much spare capacity you have in your network to handle bursts of these downloads.
From what I understand the main issue Dmitry is trying to address with this proposal is to reduce the risk of (UEFI) boot issues due to memory constraints on the target host.
participants (7)
- Arne Wiebalck
- Dmitry Tantsur
- Jeremy Stanley
- Marios Andreou
- Mark Goddard
- Sean Mooney
- Sergii Golovatiuk