[ptg] Image Encryption Session, Thursday 30th @ 16:00 UTC
Dear OpenStack community,

in preparation for the vPTG discussion slot for the image encryption (Thursday 30th at 16:00 UTC) I collected a summary explaining the key points and state of the contribution and gathered some discussion topics below. Since this spans multiple projects (mainly Glance, Cinder and Nova) and has been ongoing for quite some years, I tried my best to briefly summarize the most important aspects of the current state so that all those who wish to participate have the latest information.

The text below can also be found on the etherpad: https://etherpad.opendev.org/p/2026.1-ptg-glance-planning#L128

Summary:
- historically, only Cinder has been able to create and use encrypted images (as a side effect) by dumping LUKS-encrypted volumes 1:1 into images, for exclusive use in Cinder
- this contribution aims to introduce encrypted image functionality for the end user: images can be encrypted/decrypted outside of OpenStack and resulting (encrypted) image is stored in Glance and the key in Barbican
- goal: standardize encrypted image formats and handling across Glance, Cinder and Nova
- it introduces two formats: raw LUKS and qcow2-wrapped LUKS (native qcow2 feature)
- it introduces new key handling: if Barbican secret is of type 'passphrase' it is used as the LUKS key as-is; if it is of any other secret type, it is assumed to be binary and converted via 'binascii.hexlify()' (preserves Cinder behavior and compatibility)
- it makes use of the new secret consumer API in Barbican to prevent accidental key deletion (since keys may be managed by users, not OpenStack), registering images as consumers to their keys
- note: the image encryption feature focuses on Glance and Cinder for now, Nova has been largely excluded so far due to ephemeral storage encryption not being ready yet

Implementation State (note: aside from os-brick this is not merged yet!):
- os-brick: implemented outsourced utility function to discern encryption between passphrase and binary (hexlify conversion) for shared usage by Nova and Cinder
- Glance: implemented new encryption-specific image formats and properties as well as secret consumer handling
- Cinder: implemented encrypted image usage for volumes: LUKS is handled as raw (like before) and qcow2+LUKS is converted; image creation from volume stays limited to raw LUKS
- Nova: implemented only compatibility with os-brick and Cinder changes so far
- barbican-tempest-plugin: full scenario tests (image > volume > server > SSH) for 4 permutations of the image encryption
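The key handling rule from the summary can be illustrated with a minimal Python sketch. This is not the os-brick utility itself: the function name and signature below are hypothetical, and only the rule (passphrase used as-is, everything else passed through `binascii.hexlify()`) is taken from the text above.

```python
import binascii


def luks_passphrase_from_secret(secret_type: str, payload: bytes) -> bytes:
    """Return the bytes to hand to LUKS as the passphrase.

    Hypothetical helper for illustration; not the actual os-brick function.
    """
    if secret_type == "passphrase":
        # Passphrase secrets are used unchanged.
        return payload
    # Any other secret type is assumed to be binary key material and is
    # hex-encoded, matching the existing Cinder behavior described above.
    return binascii.hexlify(payload)
```

With this rule, a two-byte binary key `b"\x00\xff"` becomes the four-character passphrase `b"00ff"`, while a passphrase secret passes through untouched.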

Patchsets: https://review.opendev.org/q/topic:%22LUKS-image-encryption%22

Discussion Topics:

1) Image inspection: related to CVE-2024-32498 concerns were brought up about encrypted image payload containing malicious formats triggering unintended QEMU behavior; how to screen encrypted image payload and reject undesired formats?

1a) limiting to specific formats has consequences for Cinder: attached (secondary) encrypted volumes might contain arbitrary binary data in VMs (user choice), when uploading as image to Glance it might be rejected as prohibited format; additionally, such image vanishes without trace because there is no error state for images (bad Glance <-> Cinder interaction) - how to address this?

1b) how deep do we want to look into encrypted image payload? decrypting a 1TB image might be very costly (CPU load, disk space)
- problem: "qemu-img convert" cannot stream to stdout (see https://blogs.igalia.com/berto/2025/07/15/converting-qemu-qcow2-images-direc... ); in the worst case we have to wait for the full 1TB to be decrypted, consuming almost twice the amount of disk space in the process, instead of being able to only read the payload header and abort

2) Nova integration: how to approach future interoperability with ephemeral storage encryption?

2a) can we handle this as separate future expansion to this feature, i.e., merging Glance+Cinder functionality short-term with only compatibility changes in Nova for now and addressing full Nova integration as a next step with dedicated patchsets based on the current work?

3) Better organization: would there be any interest in reviving the image encryption popup-team meeting? https://meetings.opendev.org/#Image_Encryption_Popup-Team_Meeting

Best regards,
Markus Hentsch

--
Markus Hentsch
DevOps Engineer
Cloud&Heat Technologies GmbH
Königsbrücker Straße 96 | 01099 Dresden
+49 351 479 367 00
markus.hentsch@cloudandheat.com | www.cloudandheat.com
Green, Open, Efficient. Your cloud service and cloud technology provider from Dresden.
https://www.cloudandheat.com/
Commercial Register: District Court Dresden
Register Number: HRB 30549
VAT ID No.: DE281093504
Managing Director: Nicolas Röhrs
Authorized signatory: Dr. Marius Feldmann

On 28/10/2025 14:10, Markus Hentsch wrote:
1) Image inspection: related to CVE-2024-32498 concerns were brought up about encrypted image payload containing malicious formats triggering unintended QEMU behavior; how to screen encrypted image payload and reject undesired formats?

1a) limiting to specific formats has consequences for Cinder: attached (secondary) encrypted volumes might contain arbitrary binary data in VMs (user choice), when uploading as image to Glance it might be rejected as prohibited format; additionally, such image vanishes without trace because there is no error state for images (bad Glance <-> Cinder interaction) - how to address this?

1b) how deep do we want to look into encrypted image payload? decrypting a 1TB image might be very costly (CPU load, disk space)
- problem: "qemu-img convert" cannot stream to stdout (see https://blogs.igalia.com/berto/2025/07/15/converting-qemu-qcow2-images-direc... ); in the worst case we have to wait for the full 1TB to be decrypted, consuming almost twice the amount of disk space in the process, instead of being able to only read the payload header and abort
Honestly, I think inspecting the content of the image violates one of the primary use cases for this, which is confidential computing, where we do not trust the infra, hence why we are encrypting the image.

So to me the only thing that feels reasonable for the LUKS image type is to assert that the final image has a valid LUKS header, but not to look inside for the presence of a GPT partition table, for example, since that would require decryption.

For LUKS in qcow (a qcow2 with an embedded LUKS payload) we should also assert the qcow headers and ensure that images using any of the problematic features, like a data file or backing file, fail the safety check, but again we should not be decrypting the content of the image.

qcow in LUKS should not be supported. If it was supported, that would only be reasonable if the disk type was LUKS, but then QEMU should really reject it, because we should be explicitly telling QEMU that the file is in raw format in this case, not qcow. I don't think we can really protect from this by default, but we can mitigate it in two ways.

One: Glance has a list of supported image types, and an admin can remove LUKS from that list. Nova could/should have a similar list of image types each compute node will allow to be used. That can similarly allow us to reject LUKS images.

Finally, we could also have a policy rule for LUKS images in Nova/Glance that defaults to member but could be restricted by an operator. That would allow them to restrict it to the service role so that LUKS images could only be created and uploaded by Nova or Cinder, not an end user. This is my preferred way to lock down the ability to create a malicious image: allowing the operator to restrict creating LUKS images to services or admin.

My less preferred option would be to default the new LUKS policy for Nova/Glance to require the manager or the admin role. That would require the admin to use custom policy or grant end users more permissions than they likely should have in order to use this feature. To me this feels like a feature that normal end users should be able to use out of the box, so the default should be `member` IMO.
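As a rough illustration of "assert a valid LUKS header without decrypting", the following Python sketch checks only the magic bytes and the version field at the start of the image. It is a minimal stand-in under stated assumptions, not the check Glance or oslo.utils performs: the function name is hypothetical, and a real inspector would validate more of the header than this.

```python
import struct

# Both LUKS1 and LUKS2 primary headers start with this 6-byte magic,
# followed by a big-endian 16-bit version number.
LUKS_MAGIC = b"LUKS\xba\xbe"


def looks_like_luks(header: bytes) -> bool:
    """Check magic and version only; the payload is never read or decrypted.

    Hypothetical helper for illustration.
    """
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return False
    (version,) = struct.unpack(">H", header[6:8])
    return version in (1, 2)
```

The caller would pass the first few bytes of the uploaded image, so the cost is independent of the image size.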
2) Nova integration: how to approach future interoperability with ephemeral storage encryption? 2a) can we handle this as separate future expansion to this feature, i.e., merging Glance+Cinder functionality short-term with only compatibility changes in Nova for now and addressing full Nova integration as a next step with dedicated patchsets based on the current work?
This work is currently paused, so while I don't think we should do something intentionally incompatible, I don't think you should have to go out of your way to explicitly support it, provided you do not break the existing LVM backend support.
Dear Sean, thanks for sharing your thoughts and solution ideas on this. Sean Mooney wrote:
On 28/10/2025 14:10, Markus Hentsch wrote:
... - problem: "qemu-img convert" cannot stream to stdout (see https://blogs.igalia.com/berto/2025/07/15/converting-qemu-qcow2-images-direc... ); in the worst case we have to wait for the full 1TB to be decrypted, consuming almost twice the amount of disk space in the process, instead of being able to only read the payload header and abort
Honestly, I think inspecting the content of the image violates one of the primary use cases for this, which is confidential computing, where we do not trust the infra, hence why we are encrypting the image.
So to me the only thing that feels reasonable for the LUKS image type is to assert that the final image has a valid LUKS header, but not to look inside for the presence of a GPT partition table, for example, since that would require decryption.
I do agree and this was also one of my thoughts but I did not explicitly mention it due to the following reason:

You have to consider that due to the interaction between Glance and Barbican in terms of secret deletion on image deletion (like already implemented for Cinder-originated LUKS images) and secret consumer registration, Glance already has the Barbican secret ID (part of image metadata) and access to it (by inheriting the user's token RBAC during the requests). Hence, even if Glance decides not to decrypt the image at all, it still has the means to. In other words: compromising a Glance node still gives you all you need.

If we decide not to decrypt it in Glance for image verification, at least it will not be lying around in unencrypted form directly on disk or RAM at any given time but that's just a small comfort given the bigger context. You still have to trust both Glance nodes as well as Nova compute nodes, even if it only effectively gets decrypted on the latter.
For LUKS in qcow (a qcow2 with an embedded LUKS payload) we should also assert the qcow headers and ensure that images using any of the problematic features, like a data file or backing file, fail the safety check, but again we should not be decrypting the content of the image.
qcow in LUKS should not be supported.
If it was supported, that would only be reasonable if the disk type was LUKS, but then QEMU should really reject it, because we should be explicitly telling QEMU that the file is in raw format in this case, not qcow. I don't think we can really protect from this by default, but we can mitigate it in two ways.
One: Glance has a list of supported image types, and an admin can remove LUKS from that list.
Nova could/should have a similar list of image types each compute node will allow to be used. That can similarly allow us to reject LUKS images.
Finally, we could also have a policy rule for LUKS images in Nova/Glance that defaults to member but could be restricted by an operator.
That would allow them to restrict it to the service role so that LUKS images could only be created and uploaded by Nova or Cinder, not an end user.
This is my preferred way to lock down the ability to create a malicious image: allowing the operator to restrict creating LUKS images to services or admin.
My less preferred option would be to default the new LUKS policy for Nova/Glance to require the manager or the admin role.
That would require the admin to use custom policy or grant end users more permissions than they likely should have in order to use this feature. To me this feels like a feature that normal end users should be able to use out of the box, so the default should be `member` IMO.
So, if I understand you correctly, we can sufficiently check the qcow2+LUKS format by using the qemu tools to inspect its metadata but cannot do the same for the raw LUKS format (as it is currently produced by Cinder) because it could contain anything and LUKS isn't able to tell us any details?

Another thought regarding qcow2+LUKS: just theoretically, would it be possible to craft something like placing another (inner) qcow2 as the LUKS payload of the qcow2+LUKS image? I think Dan had something in mind that once the image has been decrypted, we cannot say 100% that the decrypted form won't hit another qemu tooling in some workflow later on, which could again trigger nasty things if the decrypted form is again some qcow2 stuff.

Regarding the allowlist configuration and policy approaches: if the upload or usage of raw LUKS images was restricted to operator/admin users, we would render Cinder's upload-volume-to-image (i.e. "openstack image create --volume ...") unusable for regular users, because this exact format is currently produced by Cinder in such a case and consumed if you create a volume based on such a Cinder-created image.
2) Nova integration: how to approach future interoperability with ephemeral storage encryption? 2a) can we handle this as separate future expansion to this feature, i.e., merging Glance+Cinder functionality short-term with only compatibility changes in Nova for now and addressing full Nova integration as a next step with dedicated patchsets based on the current work?
This work is currently paused, so while I don't think we should do something intentionally incompatible, I don't think you should have to go out of your way to explicitly support it, provided you do not break the existing LVM backend support.
Understood.

Best regards,
Markus Hentsch
On 30/10/2025 10:11, Markus Hentsch wrote:
I do agree and this was also one of my thoughts but I did not explicitly mention it due to the following reason:
You have to consider that due to the interaction between Glance and Barbican in terms of secret deletion on image deletion (like already implemented for Cinder-originated LUKS images) and secret consumer registration, Glance already has the Barbican secret ID (part of image metadata) and access to it (by inheriting the user's token RBAC during the requests). Hence, even if Glance decides not to decrypt the image at all, it still has the means to. In other words: compromising a Glance node still gives you all you need.
That is true, but just because it could use its privileged access to retrieve the secret does not mean it should. That would turn a possible breach of confidentiality into a realized one. The only way to really ensure that the infra can't decrypt your data is to do the encryption in the guest, not the host.
If we decide not to decrypt it in Glance for image verification, at least it will not be lying around in unencrypted form directly on disk or RAM at any given time but that's just a small comfort given the bigger context. You still have to trust both Glance nodes as well as Nova compute nodes, even if it only effectively gets decrypted on the latter.
It is effectively only decrypted by QEMU today, never at rest, so even on the compute node it would still be encrypted. We do have to make a choice here on where we are drawing the line on security. Nova can never trust Glance to have fully vetted the image, even with the Glance-as-a-defender feature, so Nova has to do some level of protection on its end. But if we can force QEMU to treat LUKS-encrypted raw images as raw by hardcoding that in the XML, that mitigates the security impact. If we can't do that, then this does not matter, as all the attacker needs to do is dd a qcow over the disk and reboot.
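To make "hardcoding raw in the XML" concrete, here is a small Python sketch that builds a libvirt disk element with an explicit `raw` driver type and a LUKS encryption element, so the format is stated rather than probed. It is only an illustration and not Nova's code: the function name, the path, the secret UUID and the target device are hypothetical, and placing `<encryption>` under `<source>` is an assumption based on the libvirt domain XML format.

```python
import xml.etree.ElementTree as ET


def encrypted_raw_disk_xml(path: str, secret_uuid: str) -> str:
    """Build a disk definition that pins the decrypted format to raw.

    Hypothetical helper for illustration.
    """
    disk = ET.Element("disk", type="file", device="disk")
    # The driver type is stated explicitly so the payload is never probed.
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    source = ET.SubElement(disk, "source", file=path)
    encryption = ET.SubElement(source, "encryption", format="luks")
    # References a libvirt secret that holds the LUKS passphrase.
    ET.SubElement(encryption, "secret", type="passphrase", uuid=secret_uuid)
    ET.SubElement(disk, "target", dev="vdb", bus="virtio")
    return ET.tostring(disk, encoding="unicode")
```

With the type pinned like this, a guest that writes a qcow2 header into its own disk would still have the payload treated as raw data on the next boot, which is the mitigation described above.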
So, if I understand you correctly, we can sufficiently check the qcow2+LUKS format by using the qemu tools to inspect its metadata but cannot do the same for the raw LUKS format (as it is currently produced by Cinder) because it could contain anything and LUKS isn't able to tell us any details?
No, we can use the oslo.utils image inspector to inspect it. It's unsafe to use qemu-img to inspect the metadata, as the act of running qemu-img info or convert is enough to trigger the data exfiltration and embed the data in the image. The QEMU community has decided that they will not guarantee that qemu-img is safe for untrusted images, and building a strong sandbox is not on their roadmap.
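The kind of header-only check meant here can be sketched in plain Python, without invoking qemu-img at all. This is not the oslo.utils inspector API: the function name is hypothetical, and the field offsets are taken from the qcow2 header layout as documented by QEMU. It flags a backing file, an external data file, and a missing LUKS crypt method.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
CRYPT_LUKS = 2                    # crypt_method value for LUKS
INCOMPAT_EXTERNAL_DATA = 1 << 2   # incompatible_features bit for a data file


def qcow2_luks_header_problems(header: bytes) -> list[str]:
    """Return reasons to reject the image, based on the first 104 bytes only.

    Hypothetical helper for illustration; nothing is decrypted.
    """
    if len(header) < 104 or header[:4] != QCOW2_MAGIC:
        return ["not a qcow2 header"]
    # Offsets 4..19: version, backing_file_offset, backing_file_size.
    version, backing_offset, backing_size = struct.unpack(">IQI", header[4:20])
    (crypt_method,) = struct.unpack(">I", header[32:36])
    problems = []
    if backing_offset or backing_size:
        problems.append("backing file present")
    if crypt_method != CRYPT_LUKS:
        problems.append("crypt_method is not LUKS")
    if version >= 3:
        # The feature bitmaps only exist from version 3 onwards.
        (incompatible,) = struct.unpack(">Q", header[72:80])
        if incompatible & INCOMPAT_EXTERNAL_DATA:
            problems.append("external data file present")
    return problems
```

An empty list means none of these checks objected; a real inspector would also validate header extensions and further fields that this sketch ignores.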
Another thought regarding qcow2+LUKS: just theoretically, would it be possible to craft something like placing another (inner) qcow2 as the LUKS payload of the qcow2+LUKS image? I think Dan had something in mind that once the image has been decrypted, we cannot say 100% that the decrypted form won't hit another qemu tooling in some workflow later on, which could again trigger nasty things if the decrypted form is again some qcow2 stuff.
qcow in qcow should be safe but replacing the payload of a raw image with
Regarding the allowlist configuration and policy approaches: if the upload or usage of raw LUKS images was restricted to operator/admin users, we would render Cinder's upload-volume-to-image (i.e. "openstack image create --volume ...") unusable for regular users, because this exact format is currently produced by Cinder in such case and consumed if you create a volume based on such cinder-created image.
Not if we allow it for the service role, i.e. Cinder would be able to do it because it has the service role (note this is different from the service_user) and we can vet the code in Cinder to do something safe. It would require Cinder to use its own credentials to do the upload instead of the user token, by using the "admin" client, which just uses the Cinder user credentials from the config. This would allow Cinder to continue to support its current workflow if we limited the ability to upload LUKS images to the service/admin role by default. I'm not saying we should do this, but it's one valid approach to consider.
Participants (2): Markus Hentsch, Sean Mooney