On 30/10/2025 10:11, Markus Hentsch wrote:
Dear Sean, thanks for sharing your thoughts and solution ideas on this.
Sean Mooney wrote:
On 28/10/2025 14:10, Markus Hentsch wrote:
... - problem: "qemu-img convert" cannot stream to stdout (see https://blogs.igalia.com/berto/2025/07/15/converting-qemu-qcow2-images-direc... ); in the worst case we have to wait for the full 1TB to be decrypted, consuming almost twice the amount of disk space in the process, instead of being able to only read the payload header and abort
honestly i think inspecting the content of the image violates one of the primary use cases for this, which is confidential computing, where we do not trust the infra, hence why we are encrypting the image.
so to me the only thing that feels reasonable for the luks image type is to assert that the final image has a valid luks header, but not to look inside for the presence of a GPT partition table, for example, since that would require decryption.
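As a minimal sketch of what "assert a valid luks header without decrypting" could look like: the check below only reads the first 8 bytes of the image and compares them against the LUKS magic and version field. The function name `looks_like_luks` is hypothetical and this is not the Glance or oslo.utils implementation; a real check would validate more of the header.

```python
import struct

# LUKS1 and LUKS2 headers both start with this 6-byte magic,
# followed by a 2-byte big-endian version number.
LUKS_MAGIC = b"LUKS\xba\xbe"


def looks_like_luks(header: bytes) -> bool:
    """Return True if the bytes start with a LUKS magic and a known version.

    Hypothetical helper: it never touches the encrypted payload.
    """
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return False
    (version,) = struct.unpack(">H", header[6:8])
    return version in (1, 2)
```

Only the leading bytes of the upload are needed, so a check like this can run on the first chunk of a stream without waiting for the full image.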
I do agree and this was also one of my thoughts but I did not explicitly mention it due to the following reason:
You have to consider that due to the interaction between Glance and Barbican in terms of secret deletion on image deletion (like already implemented for Cinder-originated LUKS images) and secret consumer registration, Glance already has the Barbican secret ID (part of image metadata) and access to it (by inheriting the user's token RBAC during the requests). Hence, even if Glance decides not to decrypt the image at all, it still has the means to. In other words: compromising a Glance node still gives you all you need.
that is true, but just because it could use its privileged access to retrieve the secret does not mean it should. that would turn a possible breach of confidentiality into a realized one. the only way to really ensure that the infra can't decrypt your data is to do the encryption in the guest, not the host.
If we decide not to decrypt it in Glance for image verification, at least it will not be lying around in unencrypted form directly on disk or RAM at any given time but that's just a small comfort given the bigger context. You still have to trust both Glance nodes as well as Nova compute nodes, even if it only effectively gets decrypted on the latter.
it is effectively only decrypted by qemu today, never at rest, so even on the compute node it would still be encrypted. we do have to make a choice here on where we are drawing the line on security. nova can never trust glance to have fully vetted the image, even with the glance-as-a-defender feature, so nova has to do some level of protection on its end. but if we can force qemu to treat luks encrypted raw images as raw by hardcoding that in the xml, that mitigates the security impact. if we can't do that then this does not matter, as all the attacker needs to do is dd a qcow over the disk and reboot.
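To illustrate "hardcoding that in the xml", here is a rough sketch that builds a libvirt disk element with the driver type pinned to `raw` and a LUKS encryption element. The function name, file path, and secret UUID are made up for illustration, and this is not how Nova generates its domain XML; it only shows the shape of the idea.

```python
import xml.etree.ElementTree as ET


def build_luks_raw_disk_xml(path, secret_uuid, target_dev="vda"):
    """Build a libvirt <disk> element for a LUKS-encrypted raw file.

    Hypothetical helper: the driver type is always "raw", so the format
    is stated explicitly instead of being left to probing.
    """
    disk = ET.Element("disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    source = ET.SubElement(disk, "source", file=path)
    encryption = ET.SubElement(source, "encryption", format="luks")
    ET.SubElement(encryption, "secret", type="passphrase", uuid=secret_uuid)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")
```

Because the format is fixed in the definition, overwriting the file with a qcow2 image would not change how the disk is declared to qemu.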
for luks in qcow with an embedded luks partition then we should also assert the qcow headers and ensure that none of the problematic features like data file or backing file are present, i.e. that the image passes the safety check, but again we should not be decrypting the content of the image.
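A minimal sketch of such a header-only qcow2 check, assuming the standard qcow2 header layout (backing file offset at byte 8, crypt method at byte 32, incompatible feature bits at byte 72 for version 3). The function name and the problem strings are hypothetical, and this is far less thorough than the oslo.utils inspector; it only shows that these properties are readable without decrypting anything.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
INCOMPAT_EXTERNAL_DATA_FILE = 1 << 2  # incompatible feature bit 2
CRYPT_METHOD_LUKS = 2


def check_qcow2_luks_header(header: bytes) -> list:
    """Return a list of problems found in a qcow2 header (empty if none).

    Hypothetical helper: only fixed header fields are read.
    """
    if len(header) < 72 or header[:4] != QCOW2_MAGIC:
        return ["not a qcow2 header"]
    problems = []
    version, backing_offset, _backing_size = struct.unpack(">IQI", header[4:20])
    (crypt_method,) = struct.unpack(">I", header[32:36])
    if version not in (2, 3):
        problems.append("unsupported version")
    if backing_offset != 0:
        problems.append("backing file present")
    if version >= 3:
        if len(header) < 80:
            problems.append("truncated v3 header")
        else:
            (incompat,) = struct.unpack(">Q", header[72:80])
            if incompat & INCOMPAT_EXTERNAL_DATA_FILE:
                problems.append("external data file")
    if crypt_method != CRYPT_METHOD_LUKS:
        problems.append("not LUKS-encrypted")
    return problems
```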
for qcow in luks that should not be supported.
if it was supported, that would only be reasonable if the disk type was luks, but then qemu should really reject that because we should be explicitly telling qemu that the file is in raw format in this case, not qcow. i don't think we can really protect from this by default but i think we can mitigate this in 2 ways.
one, glance has a list of supported image types; an admin can remove luks from that list.
nova could/should have a similar list of image types each compute node will allow to be used. that can similarly allow us to reject luks images.
finally we could also have a policy rule for luks images in nova/glance that defaults to member but could be restricted by an operator.
that would allow them to restrict it to the service role so that luks images could only be created and uploaded by nova or cinder, not an end user.
this is my preferred way to lock down the ability to create a malicious image: allowing operators to restrict creating luks images to services or admin.
my less preferred option would be to default the new luks policy for nova/glance to require the manager or the admin roles.
that would require the admin to use custom policy or grant end users more permissions than they likely should have to use this feature. this to me feels like a feature that normal end users should be able to use out of the box, so the default should be `member` IMO.
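The two mitigations above (format allowlist plus a role-gated policy) can be sketched in plain Python. Everything here is hypothetical: the `luks` format name, the constants, and `may_upload_image` are illustrative and are not existing Glance or Nova options or oslo.policy rules. Role implication (e.g. admin implying member) is deliberately ignored to keep the sketch short.

```python
# Hypothetical defaults: luks is allowed, and "member" may upload it.
DEFAULT_ALLOWED_FORMATS = frozenset({"raw", "qcow2", "luks"})
DEFAULT_LUKS_ROLES = frozenset({"member"})


def may_upload_image(disk_format, user_roles,
                     allowed_formats=DEFAULT_ALLOWED_FORMATS,
                     luks_roles=DEFAULT_LUKS_ROLES):
    """Decide whether an upload is permitted.

    Step 1: the format must be in the operator-configured allowlist.
    Step 2: luks images additionally require one of the roles that the
    (hypothetical) luks policy rule names.
    """
    if disk_format not in allowed_formats:
        return False
    if disk_format == "luks":
        return bool(set(user_roles) & set(luks_roles))
    return True
```

With the defaults, `may_upload_image("luks", ["member"])` is permitted. An operator who restricts the rule with `luks_roles={"service", "admin"}` would block the same call for a plain member while still allowing a caller that holds the service role.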
So, if I understand you correctly, we can sufficiently check the qcow2+LUKS format by using the qemu tools to inspect its metadata but cannot do the same for the raw LUKS format (as it is currently produced by Cinder) because it could contain anything and LUKS isn't able to tell us any details?
no, we can use the oslo.utils image inspector to inspect it. it's unsafe to use qemu-img to inspect the metadata, as the act of doing qemu-img info or convert is enough to trigger the data exfiltration and embed the data in the image. the qemu community has decided that they will not guarantee that qemu-img is safe for untrusted images, and building a strong sandbox is not on their roadmap.
Another thought regarding qcow2+LUKS: just theoretically, would it be possible to craft something like placing another (inner) qcow2 as the LUKS payload of the qcow2+LUKS image? I think Dan had something in mind that once the image has been decrypted, we cannot say 100% that the decrypted form won't hit another qemu tooling in some workflow later on, which could again trigger nasty things if the decrypted form is again some qcow2 stuff.
qcow in qcow should be safe but replacing the payload of a raw image with
Regarding the allowlist configuration and policy approaches: if the upload or usage of raw LUKS images was restricted to operator/admin users, we would render Cinder's upload-volume-to-image (i.e. "openstack image create --volume ...") unusable for regular users, because this exact format is currently produced by Cinder in such case and consumed if you create a volume based on such cinder-created image.
not if we allow it for the service role, i.e. cinder would be able to do it because it has the service role (note this is different from the service_user) and we can vet the code in cinder to do something safe. it would require cinder to use its credentials to do the upload instead of the user token, by using the "admin" client which just uses the cinder user credentials from the config. this would allow cinder to continue to support its current workflow if we limited the ability to upload luks images to the service/admin role by default. i'm not saying we should do this but it's one valid approach to consider.
2) Nova integration: how to approach future interoperability with ephemeral storage encryption? 2a) can we handle this as separate future expansion to this feature, i.e., merging Glance+Cinder functionality short-term with only compatibility changes in Nova for now and addressing full Nova integration as a next step with dedicated patchsets based on the current work?
this work is currently paused, so while i don't think we should do something intentionally incompatible, i don't think you should have
to go out of your way to explicitly support it, provided you do not break the existing lvm backend support.
Understood.
Best regards,
Markus Hentsch