[nova][ops] Deprecate AMI/direct kernel boot support
Hi all,
tl;dr: We really need ops feedback about any direct need to retain this feature.
Since the early days, Nova has supported the notion of "direct linux kernel boot" (with libvirt/kvm at least). This means each bootable image in Glance is actually three parts: a disk image, a ramdisk image, and a kernel. Booting the instance involves putting the kernel and ramdisk into memory in a special way that the kernel can immediately start booting instead of doing things like running the bootloader like a normal system does. This is present in Nova because of long-ago Amazon compatibility, as this is how Xen paravirt guests used to work for them. I think that in the early days of Nova, compatibility with the _actual_ images used on Amazon was important, but I think that is no longer something we need to care about today, just as we have deprecated EC2 API compatibility. It should also be noted that some of it is broken, and the recent CVE fixes have broken it like three times in the last 30 days. Continuing to support it makes the code more complex and fragile, and arguably less secure.
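For readers less familiar with the mechanism, here is a minimal sketch of how direct kernel boot is typically expressed to libvirt: the `<os>` element of the domain XML names a kernel, an initrd and a kernel command line instead of relying on a bootloader inside the disk. The helper name, file paths and command line below are illustrative assumptions, not what nova actually generates.

```python
import xml.etree.ElementTree as ET


def build_os_element(kernel_path, initrd_path, cmdline):
    """Build a libvirt <os> element that boots a kernel directly (hypothetical helper)."""
    os_el = ET.Element("os")
    # "hvm" is the fully virtualized guest type used with KVM.
    type_el = ET.SubElement(os_el, "type", {"arch": "x86_64"})
    type_el.text = "hvm"
    # With these three children the hypervisor loads the kernel and ramdisk
    # into guest memory itself; no bootloader on the disk is involved.
    ET.SubElement(os_el, "kernel").text = kernel_path
    ET.SubElement(os_el, "initrd").text = initrd_path
    ET.SubElement(os_el, "cmdline").text = cmdline
    return os_el


# Example paths and command line are placeholders for illustration only.
os_xml = ET.tostring(
    build_os_element(
        "/var/lib/example/kernel",
        "/var/lib/example/ramdisk",
        "root=/dev/vda console=ttyS0",
    ),
    encoding="unicode",
)
print(os_xml)
```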
So, I'm asking for operator feedback on this question:
Do you need this feature, and if so, why?
Speak now or expect to see it deprecated for removal very soon.
Thanks!
--Dan
Hi,
TL;DR: I don't need AMI support, but I do need direct kernel boot support for the confidential computing use case.
Direct kernel boot is currently one of the popular solutions in the confidential computing use case for measuring boot elements such as the kernel. We actually used it internally in our initial PoC to support SEV-SNP in nova, which I plan to contribute upstream next cycle (after I complete the SEV-ES support work during this cycle).
There is some active deployment activity in this area to implement a secured vTPM, which allows measured boot without direct kernel boot, but this work is still under very active development.
I don't fully understand the current issue, but I'm hoping that we can keep direct kernel boot for some time until we get a better method acceptable for the confidential computing use case.
Note that we do not use AMI images, just a raw (or qcow2) image with a raw kernel image and a raw ramdisk image associated (by the kernel_id and ramdisk_id properties) to use direct kernel boot.
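As a rough illustration of the linkage described above, the sketch below decides from an image's properties whether direct kernel boot is requested. The function name and the rule that a ramdisk is optional while a kernel is required are assumptions made for this example, not a description of nova's actual code.

```python
def direct_boot_ids(image_properties):
    """Return (kernel_id, ramdisk_id) if the image asks for direct kernel boot, else None.

    Hypothetical helper: assumes the linkage is carried by the "kernel_id"
    and "ramdisk_id" image properties, and that the ramdisk is optional.
    """
    kernel_id = image_properties.get("kernel_id")
    if not kernel_id:
        # No kernel linked: boot normally from the disk image's bootloader.
        return None
    return kernel_id, image_properties.get("ramdisk_id")


# A raw or qcow2 disk image with linked kernel and ramdisk images
# (the IDs here are made-up placeholders).
linked = {"disk_format": "qcow2", "kernel_id": "kernel-uuid", "ramdisk_id": "ramdisk-uuid"}
plain = {"disk_format": "qcow2"}
print(direct_boot_ids(linked))
print(direct_boot_ids(plain))
```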
Thank you, Takashi Kajinami
On Fri, 2024-07-26 at 18:01 +0900, Takashi Kajinami wrote:
Hi,
TL;DR: I don't need AMI support, but I do need direct kernel boot support for the confidential computing use case.
Direct kernel boot is currently one of the popular solutions in the confidential computing use case for measuring boot elements such as the kernel. We actually used it internally in our initial PoC to support SEV-SNP in nova, which I plan to contribute upstream next cycle (after I complete the SEV-ES support work during this cycle).
There is some active deployment activity in this area to implement a secured vTPM, which allows measured boot without direct kernel boot, but this work is still under very active development.
Hmm, we should discuss this further, but I'm not sure we should proceed with SEV-SNP enablement before that is completed.
With that said, if the request for SEV-SNP is made via a trait on the image, combined with the existing image property for memory encryption, it may be workable, given that the direct kernel boot functionality is also expressed on the image.
That would be a pretty big limitation for that feature, however.
Direct kernel boot obviously has lifecycle implications for the guest, i.e. you can't just do a dnf or apt update in existing instances. It forces you to rebuild to update the kernel, for security fixes for example. That is fine if your applications are stateless and cloud native, where all data is stored on Cinder volumes and the root image is just packaging the application, but for any enterprise use cases that are less cloud aware it would basically make the SEV-SNP feature unusable if it were a hard requirement.
Let's take that discussion elsewhere, however, and keep this thread focused on Dan's original question.
I don't fully understand the current issue, but I'm hoping that we can keep direct kernel boot for some time until we get a better method acceptable for the confidential computing use case.
Note that we do not use AMI images, just a raw (or qcow2) image with a raw kernel image and a raw ramdisk image associated (by the kernel_id and ramdisk_id properties) to use direct kernel boot.
Direct kernel boot is slightly easier to support than AMI.
The short version is that AMI is not really a well-defined format, and how it's expressed in Glance is problematic. None of the tools we use (qemu-img, file, the format inspector) actually recognise ami/aki/ari, as they are not really file formats; they are just labels for the types of content contained in a uec/ami "image". That makes validating them very complex.
The long version is that uec/ami images are just a tar with files containing a root file system, a kernel and a ramdisk. The issue is that each of those 3 embedded files can be in any format.
Typically the root file system image is a copy of a block device with a partition table and a file system, like what you would get from "dd if=/dev/sda of=root.img".
[10:48:51]➜ file -z *
cirros-0.6.2-x86_64-blank.img: Linux rev 1.0 ext3 filesystem data, UUID=f1511162-06fb-4482-9dab-9a0c76633fb2, volume name "cirros-rootfs" (large files)
cirros-0.6.2-x86_64-initrd: ASCII cpio archive (SVR4 with no CRC) (gzip compressed data, max compression, from Unix)
cirros-0.6.2-x86_64-vmlinuz: Linux kernel x86 boot executable bzImage, version 5.15.0-71-generic (buildd@lcy02-amd64-044) #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023, RO-rootFS, swap_dev 0XB, Normal VGA
The problem that creates for us is that we can't really validate the content of these images for safety, at least not without coming to an agreement about which formats are valid.
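To make the validation problem concrete, here is a minimal sketch of content sniffing by magic bytes, similar in spirit to what "file" reports above. It is not the Glance/nova format inspector; the function name and returned labels are made up for illustration, and it assumes 512-byte sectors for the GPT check.

```python
import struct


def sniff(data: bytes) -> str:
    """Guess what a blob contains from well-known magic numbers (illustrative only)."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"  # e.g. a compressed initrd; the payload needs a second look
    if data[:6] in (b"070701", b"070702"):
        return "cpio-newc"  # "new ASCII" cpio, without or with CRC
    if data[0x202:0x206] == b"HdrS":
        return "linux-bzimage"  # x86 Linux boot protocol setup header
    if data[512:520] == b"EFI PART":
        return "gpt"  # GPT header in LBA 1, assuming 512-byte sectors
    if len(data) >= 1082 and struct.unpack_from("<H", data, 1080)[0] == 0xEF53:
        return "ext-filesystem"  # ext2/3/4 superblock magic at offset 1024 + 0x38
    return "unknown"


# A bare ext filesystem without a partition table, like the cirros "blank.img" above.
fake_ext = b"\x00" * 1080 + b"\x53\xef" + b"\x00" * 16
print(sniff(fake_ext))
print(sniff(b"\x1f\x8b\x08\x00"))
```

Anything that falls through to "unknown" is exactly the kind of content that cannot be validated without first agreeing on which formats are acceptable.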
To that end, in light of the recent CVEs, we would like to start moving away from ami/ari/aki and towards Glance having explicit disk formats that match reality.
I'm proxying some of the conversation I have had with Dan on this topic, but for example the root disk image cirros-0.6.2-x86_64-blank.img should really be disk-format=GPT, i.e. declaring to Glance that this image contains a GPT partition table.
To be bootable by QEMU the initrd needs to be in cpio format and can optionally be compressed cirros-0.6.2-x86_64-blank.img
So instead of ari we should be uploading that as disk-format=cpio. For the kernel we can also have a disk-format=linux-kernel instead of aki.
Before doing that, however, we wanted to know whether we need to maintain support for direct kernel boot or whether this is legacy functionality we could remove from our codebase.
What we want to move away from is treating anything we don't recognise as raw, as that makes it very hard to properly harden ourselves against any future CVEs. The more strict we can be with the content of images in Glance, the more secure OpenStack will be going forward.
There is some active deployment activity in this area to implement a secured vTPM, which allows measured boot without direct kernel boot, but this work is still under very active development.
Hmm, we should discuss this further, but I'm not sure we should proceed with SEV-SNP enablement before that is completed.
With that said, if the request for SEV-SNP is made via a trait on the image, combined with the existing image property for memory encryption, it may be workable, given that the direct kernel boot functionality is also expressed on the image.
That would be a pretty big limitation for that feature, however.
Agreed, and I certainly would be hesitant to add more stuff dependent on something I'm hoping to remove. The UX for a feature where the kernel and ramdisk updated in the guest by the distro aren't actually what gets used is not a very good workflow at all.
That said, I guess honoring the kernel/ramdisk linkage from whatever image is selected is perhaps something we could do with less complication than we currently have. Right now, we have places where booting from an AMI changes various behaviors and that's definitely the primary thing I want to remove. Just honoring the kernel/ramdisk linkage without other special image behaviors is maybe (*maybe*) less concerning although I still think it would be better to eliminate that if we can.
I'm proxying some of the conversation I have had with Dan on this topic, but for example the root disk image cirros-0.6.2-x86_64-blank.img should really be disk-format=GPT, i.e. declaring to Glance that this image contains a GPT partition table.
I definitely don't want to add cpio (and etc) to glance as disk_format options just because the kernel or ramdisk image may be encoded that way. However, if we can get to the point where nova will boot a disk_format=gpt but *not* a disk_format=raw, then raw can become "a non-bootable binary blob used for other purposes" ... which could be a kernel, ramdisk, or anything else.
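A tiny sketch of that idea, purely as an illustration: bootability is decided from the declared disk_format, and "raw" is treated as an opaque blob. The names here are hypothetical, "gpt" is the proposed format rather than an existing Glance one, and including "qcow2" in the bootable set is an assumption for the example.

```python
# Hypothetical allow-list: formats nova would agree to boot as a root disk.
BOOTABLE_DISK_FORMATS = frozenset({"gpt", "qcow2"})


def is_bootable(disk_format):
    """Return True if an image with this disk_format may be used as a root disk."""
    return disk_format in BOOTABLE_DISK_FORMATS


# "raw" stays uploadable but is only ever an opaque blob: a kernel,
# a ramdisk, or anything else that is not booted as a disk.
for fmt in ("gpt", "raw"):
    print(fmt, is_bootable(fmt))
```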
--Dan
On 7/27/24 00:03, Dan Smith wrote:
There is some active deployment activity in this area to implement a secured vTPM, which allows measured boot without direct kernel boot, but this work is still under very active development.
Hmm, we should discuss this further, but I'm not sure we should proceed with SEV-SNP enablement before that is completed.
With that said, if the request for SEV-SNP is made via a trait on the image, combined with the existing image property for memory encryption, it may be workable, given that the direct kernel boot functionality is also expressed on the image.
That would be a pretty big limitation for that feature, however.
Agreed, and I certainly would be hesitant to add more stuff dependent on something I'm hoping to remove. The UX for a feature where the kernel and ramdisk updated in the guest by the distro aren't actually what gets used is not a very good workflow at all.
Direct kernel boot is not mandatory to use SEV-SNP. Measured boot by direct kernel boot is an optional feature for users who want very strict attestation of the software in their VM, so we should probably discuss it separately from the base SEV-SNP support.
However, we've had internal discussions about potential use cases of SEV-SNP and have learned that there are actually some use cases where strict attestation is required, with the tradeoff in UX. I understand the limitations and tricky UX of the feature (I learned these while I did some PoC work), but these would still be acceptable for users who have very strict requirements to protect their data, including their applications or software, on the cloud.
As I said, there is some active development work in this area, but it is still in an early phase and it may take some time (at least one year or even a few years) until it is implemented in the upstream kernel and a few other components. Also, at this stage it's not clear how much additional work we need to actually integrate this work into the cloud use case.
So I really hope that we can start with the not-the-best but working solution first to realize strict attestation, and then consider replacing it with better functionality in the future.
That said, I guess honoring the kernel/ramdisk linkage from whatever image is selected is perhaps something we could do with less complication than we currently have. Right now, we have places where booting from an AMI changes various behaviors and that's definitely the primary thing I want to remove. Just honoring the kernel/ramdisk linkage without other special image behaviors is maybe (*maybe*) less concerning although I still think it would be better to eliminate that if we can.
Disclaimer: I'm not a security expert or a QEMU expert, so it'd be nice to hear an opinion from someone more familiar with these areas.
My understanding is that the image format CVE we are mainly discussing here is caused by the way QEMU (specifically qemu-img) handles image formats.
However, kernel and ramdisk images are treated as pure "raw" images without any conversion or parsing in both the nova layer and the QEMU layer (when they are associated via kernel/ramdisk_id, not via AMI), so keeping the current handling of these images may carry relatively lower risk.
My current view, after a quick dig into the QEMU implementation, is that the direct kernel boot implementation in QEMU extracts the kernel data and ramdisk data into the guest memory area without any format parsing, so I expect the impact of malformed images to be basically limited to the instances that are launched with these images.
Disclaimer: I'm not a security expert or a QEMU expert, so it'd be nice to hear an opinion from someone more familiar with these areas.
My understanding is that the image format CVE we are mainly discussing here is caused by the way QEMU (specifically qemu-img) handles image formats.
However, kernel and ramdisk images are treated as pure "raw" images without any conversion or parsing in both the nova layer and the QEMU layer (when they are associated via kernel/ramdisk_id, not via AMI), so keeping the current handling of these images may carry relatively lower risk.
My current view, after a quick dig into the QEMU implementation, is that the direct kernel boot implementation in QEMU extracts the kernel data and ramdisk data into the guest memory area without any format parsing, so I expect the impact of malformed images to be basically limited to the instances that are launched with these images.
Yeah, there's not any concern (that I know of) around interpreting kernel and ramdisk images with qemu. What I meant was that the special casing we have in *nova* around AMI itself is the biggest problem for us: qemu knows nothing about AMI, so we are required to coerce the disk part of such an image to raw (when it *could* be a different format qemu does support).
So if we decide we need to keep the "allow an image to specify a kernel/ramdisk" for the moment, then so be it. But that's separate from the more pressing concern of just removing nova support for AMI as a format and all the hacks we have in place for it.
--Dan
participants (3): Dan Smith, smooney@redhat.com, Takashi Kajinami