[Openstack-security] [Bug 1350766] Re: Race condition: compute intermittently corrupts base images on download from glance
Michael Steffens
michael_steffens at posteo.de
Tue Aug 12 07:46:27 UTC 2014
I'd be even more concerned about a vulnerability that is triggered by
normal user behavior (such as putting load on a system) and can cause
corruption across different users' instances than about one that needs
special actions. On the other hand, I agree that modulating the load so
as to produce a specific corruption (such as selectively dropping
chunks) would require very sophisticated actions.
Nevertheless, yesterday, after a regular Ubuntu nova-compute update
reverted my local fix and restored the defective behavior, I observed a
new variant of corruption: a new snapshot booted fine, but then exposed
filesystem errors. After reapplying the fsync patch and redoing the
whole exercise with the same image, everything was fine.
I wouldn't be surprised if such issues already surface in production
now and then (less frequently than in my environment, though), but are
then blamed on guest OS issues instead. Let me illustrate.
This is how it looks to the end user: take a snapshot, launch it, it
fails. Launch the same snapshot again, it fails the same way. Looks like
the snapshot itself is defective, doesn't it? The prime suspect: the
filesystem was in an inconsistent state when the snapshot was taken. So
let's take a new snapshot. And indeed that either works, or fails
consistently in a different way than the first.
Who wouldn't conclude that it's the guest OS or the way the snapshot is
done (nothing OpenStack could do anything about) that is at fault,
rather than the image being corrupted after download from glance, and
then cached?
Is there anything I can provide to get this ticket out of the incomplete
and unassigned state?
--
You received this bug notification because you are a member of OpenStack
Security Group, which is subscribed to OpenStack.
https://bugs.launchpad.net/bugs/1350766
Title:
Race condition: compute intermittently corrupts base images on
download from glance
Status in OpenStack Compute (Nova):
New
Status in OpenStack Security Advisories:
Incomplete
Bug description:
Under certain conditions, which I happen to meet often on my Icehouse
single node setup, uploaded images or snapshots fail to boot. See also
https://ask.openstack.org/en/question/42804/icehouse-how-to-boot-a-snapshot-from-a-running-instance/
Reason: When first instantiating a QCOW2 image, it's
(1) downloaded as QCOW2 to /var/lib/nova/instances/_base/IMAGEID.part
(2) converted to RAW format base /var/lib/nova/instances/_base/IMAGEID.converted using qemu-img
The step (1) is performed in nova/image/glance.py,
GlanceImageService.download using buffered IO, which does not
guarantee the resulting data to be written to disk on file close.
Consequently, the source image file may not be written completely when
the qemu-img sub-process starts reading in step (2). Whether the result
is good or bad depends on the speed of the download, the file size, and
how quickly qemu-img can digest its input.
Proposed fix: enforce fsync on output File object before returning
from download. Patch attached.
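For readers without the attachment, the idea can be sketched roughly as
follows. This is not the attached patch and not the actual code in
nova/image/glance.py; the helper name write_chunks_durably and the
demo path are made up for illustration. It only shows the general
pattern of flushing Python's buffer and calling fsync before the file
is handed to another process.

```python
import os
import tempfile


def write_chunks_durably(chunks, path):
    """Write an iterable of byte chunks to path, syncing before close.

    Hypothetical helper for illustration only; returns the file size.
    """
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
        # Push Python's userspace buffer down to the OS ...
        out.flush()
        # ... and ask the OS to commit the file data to storage.
        os.fsync(out.fileno())
    return os.path.getsize(path)


if __name__ == "__main__":
    # Demo with a throwaway directory; the file name only mimics step (1).
    demo_dir = tempfile.mkdtemp()
    part_file = os.path.join(demo_dir, "IMAGEID.part")
    size = write_chunks_durably([b"QFI\xfb", b"payload"], part_file)
    print("wrote %d bytes to %s" % (size, part_file))
```

Only after such a function returns would the converter in step (2) be
started on the file.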
Security considerations:
* Due to the race between resources shared between users and tenants
(compute node network and filesystem IO) a failure can be triggered
across tenants, implying the risk of DoS.
* To make things worse -- with the default setting of not cleaning
the image cache -- any corrupted image will remain in cache until
replaced with fresh upload using a new image ID. Affected snapshots
remain unusable forever, until ex- and re-imported manually under
better conditions.
* Base image corruptions here are not detected and cannot be caught.
Theoretically (a bit esoteric, quite unlikely, but not impossible), an
attacker might modulate resource usage to precisely create an
incompletely written image, that boots and runs, but has access
control information stripped.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1350766/+subscriptions