[openstack-dev] [Openstack-operators] [nova] Is verification of images in the image cache necessary?

Fichter, Dane G. Dane.Fichter at jhuapl.edu
Tue May 24 12:15:32 UTC 2016


Hi John and Matt,

I actually have a spec and patch up for review addressing some of what you’re referring to below.

https://review.openstack.org/#/c/314222/
https://review.openstack.org/#/c/312210/

I think you’re quite right that the existing ImageCacheManager code serves little purpose. What I propose here is a cryptographically stronger verification meant to protect against both deliberate modification by an adversary and accidental disk corruption. If you like, I can deprecate the checksum-based verification code in the image cache as part of this change. Feel free to email me back or ping me on IRC (dane-fichter) to discuss further.

Thanks,

Dane Fichter

From: Matthew Booth <mbooth at redhat.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Tuesday, May 24, 2016 at 6:49 AM
To: John Garbutt <john at johngarbutt.com>
Cc: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>, "openstack-operators at lists.openstack.org" <openstack-operators at lists.openstack.org>
Subject: Re: [openstack-dev] [Openstack-operators] [nova] Is verification of images in the image cache necessary?

On Tue, May 24, 2016 at 11:06 AM, John Garbutt <john at johngarbutt.com> wrote:
On 24 May 2016 at 10:16, Matthew Booth <mbooth at redhat.com> wrote:
> During its periodic task, ImageCacheManager does a checksum of every image
> in the cache. It verifies this checksum against a previously stored value,
> or creates that value if it doesn't already exist.[1] Based on this
> information it generates a log message if the image is corrupt, but
> otherwise takes no action. Going by git, this has been the case since 2012.
>
> The commit which added it was associated with 'blueprint
> nova-image-cache-management phase 1'. I can't find this blueprint, but I did
> find this page: https://wiki.openstack.org/wiki/Nova-image-cache-management
> . This talks about 'detecting images which are corrupt'. It doesn't explain
> why we would want to do that, though. It also doesn't seem to have been
> followed through in the last 4 years, suggesting that nobody's really that
> bothered.
>
> I understand that corruption of bits on disks is a thing, but it's a thing
> for more than just the image cache. I feel that this is a problem much
> better solved at other layers, prime candidates being the block and
> filesystem layers. There are existing robust solutions to bitrot at both of
> these layers which would cover all aspects of data corruption, not just this
> randomly selected slice.

+1

That might mean improved docs on the need to configure such a thing.

> As it stands, I think this code is regularly running a pretty expensive task
> looking for something which will very rarely happen, only to generate a log
> message which nobody is looking for. And it could be solved better in other
> ways. Would anybody be sad if I deleted it?

For completeness, we need to deprecate it using the usual cycles:
https://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html

I guess I'm arguing that it isn't a feature, and never has been: it really doesn't do anything at all except generate a log message. Are log messages part of the deprecation contract?

If operators are genuinely finding corrupt images to be a problem and find this log message useful, that would be extremely useful to know.
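
For readers who have not seen the check under discussion, the behaviour described in this thread (checksum each cached file, store the value the first time, log on a mismatch, take no other action) can be sketched roughly as below. This is not nova's code: the function names, the ".checksum" sidecar file and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import logging
import os

LOG = logging.getLogger(__name__)


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cached_image(path):
    """Hypothetical helper: compare a cached image to its stored checksum.

    Returns None if no checksum was stored yet (one is created),
    True if the file matches, False if it does not.
    """
    sidecar = path + ".checksum"  # hypothetical storage location
    current = file_checksum(path)
    if not os.path.exists(sidecar):
        with open(sidecar, "w") as f:
            f.write(current)
        return None
    with open(sidecar) as f:
        stored = f.read().strip()
    if stored != current:
        # The only consequence of a mismatch is a log message.
        LOG.error("image %s is corrupt: stored %s, current %s",
                  path, stored, current)
        return False
    return True
```

Note that every call reads the whole file, which is the cost being objected to when this runs periodically over every image in the cache.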


I like the idea of checking the md5 matches before each boot, as it
mirrors the check we do after downloading from glance. It's possible
that's very unlikely to spot anything that something else shouldn't
already be worrying about. It may just be my love of symmetry that
makes me like that idea?

It just feels arbitrary to me for a few reasons. Firstly, it's only relevant to storage schemes which use the file in the image cache as a backing file. In the libvirt driver, this is just the Qcow2 backend. While this is the default, most users are actually using ceph. Assuming it isn't cloning directly from ceph-backed glance, the Rbd backend imports from the image cache during spawn, and has nothing to do with it thereafter. So for Rbd we'd want to check during spawn. Same for the Flat, Lvm and Ploop backends.

Except that it's still arbitrary because we're not checking the Qcow overlay on each boot. Or ephemeral or swap disks. Or Lvm, Flat or Rbd disks at all. Or the operating system. And it's still expensive, and better done by the block or filesystem layer.

I'm not personally convinced there's all that much point checking during download either, but given that we're loading all the bits anyway that check is essentially free. However, even if we decided we needed to defend the system against bitrot above the block/filesystem layer (and I'm not at all convinced of that) we'd want a coordinated design for it. Without one, we risk implementing a bunch of disconnected/incomplete stuff that doesn't meet anybody's needs, but burns resources anyway.
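
To illustrate why the download-time check is essentially free: the digest can be updated on each chunk as it is written, so no second pass over the data is needed. A minimal sketch, assuming file-like source and destination objects; copy_with_checksum is a hypothetical name, not a nova or glance function, and SHA-256 is used here rather than the md5 mentioned above.

```python
import hashlib
import io


def copy_with_checksum(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, returning the hex digest of the bytes copied."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash the chunk we already hold in memory
        dst.write(chunk)
    return digest.hexdigest()


# Usage with in-memory streams standing in for a download and a cache file.
data = b"example image bytes" * 1000
source = io.BytesIO(data)
target = io.BytesIO()
checksum = copy_with_checksum(source, target)
```

Re-verifying later is different: the bits are no longer passing through memory, so each check costs a full read of the file.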

Matt
--
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
