[openstack-dev] [nova] Adding temporary code to nova to work around bugs in system utilities

Jay Pipes jaypipes at gmail.com
Mon Dec 8 20:19:40 UTC 2014


On 12/03/2014 04:00 AM, Tony Breeds wrote:
> Hi All,
>      I'd like to accomplish 2 things with this message:
> 1) Unblock (one way or another) https://review.openstack.org/#/c/123957
> 2) Create some form of consensus on when it's okay to add temporary code to
>     nova to work around bugs in external utilities.
>
> So some background on this specific issue.  The issue was first reported in
> July 2014 at [1] and then clarified at [2].  The synopsis of the bug is that
> calling qemu-img convert -O raw /may/ generate a corrupt output file if the
> source image isn't fully flushed to disk.  The coreutils folk discovered
> something similar in 2011 *sigh*
>
> The clear and correct solution is to ensure that qemu-img uses
> FIEMAP_FLAG_SYNC.  This in turn produces a measurable slowdown in that code
> path, so additionally it's best if qemu-img uses an alternate method to
> determine data status in a disk image.  This has been done and will be included
> in qemu 2.2.0 when it's released.  These fixes prompted a more substantial
> rework of that code in qemu.  Which is awesome but not *required* to fix the
> bug in qemu.
>
> While we wait for $distros to get the fixed qemu, nova is still vulnerable to
> the bug.  To that end I proposed a workaround in nova that forces images
> retrieved from glance to disk with an fsync() prior to calling qemu-img on
> them.  I admit that this is ugly and has a performance impact.
>
> In order to reduce the impact of the fsync() I considered:
> 1) Testing the qemu version and only fsync()ing on affected versions.
>     - Vendors will backport the fix to their version of qemu.  The fixed version
>       will still claim to be 2.1.0 (for example) and therefore trigger the
>       fsync() when not required.  Given how unreliable this will be I dismissed
>       it as an option
>
> 2) API Change
>     - In the case of this specific bug we only need to fsync() in certain
>       scenarios.  It would be easy to add a flag to IMAGE_API.download() to
>       determine if this fsync() is required.  This has the nice property of only
>       having a performance impact in the suspect case (personally I'll take
>       slow-and-correct over fast-and-buggy any day).  My hesitation is that
>       after we've modified the API it's very hard to remove that change when we
>       decide the workaround is redundant.
>
> 3) Config file option
>     - For many of the same reasons as the API change this seemed like a bad
>       idea.
>
> Does anyone have any other ideas?
>
> One thing that I haven't done is measure the impact of the fsync() on any
> reasonable workload.  This is mainly because I don't really know how.  Sure I
> could do some statistics in devstack but I don't really think they'd be
> meaningful.  Also the size of the image in glance is fairly important.  An
> fsync() of a 100GB image is many times more painful than one of a 1GB image.
>
> While in Paris I was asked to look at other code paths in nova where we use
> qemu-img convert.  I'm doing this analysis.  To date I have some suspicions
> that snapshot (and migration) are affected, but no data that confirms or
> refutes that.  I continue to look at the appropriate code in nova, libvirt and
> qemu.
>
> I understand that there is more work to be done in this area, and I'm happy to
> do it.  Having said that, from where I sit that work is not directly related to
> the bug that started this.
>
> As the idea is to remove this code as soon as all the distros we care about
> have a fixed qemu, I started a (brief) discussion here[3] on which distros
> are in that list.  Armed with that list I have opened (or am in the process of
> opening) bugs for each version of each distribution to make them aware of the
> issue and the fix.  I have a status page at [4].
>
> Okay, I think I'm done raving.
>
> So moving forward:
>
> 1) So what should I do with the open review?

I reviewed the patch. I don't mind the idea of a [workarounds] section 
of configuration options, but I had an issue with where that code was 
executed.

> 2) What can we learn from this in terms of how we work around key utilities
>     that are not in our direct power to change?
>     - Is taking ugly code for "some time" okay?  I understand that this is a
>       complex issue as we're relying on $developer to be around (or leave enough
>       information for those that follow) to determine when it's okay to remove
>       the ugliness.

I think it would be fine to have a [workarounds] config section for just 
this purpose.
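
To make that concrete, here is a rough sketch of how a [workarounds] flag
could gate the fsync() before qemu-img is called. It is only an illustration
under stated assumptions: the option name fsync_image_before_convert and the
helper names are hypothetical, and it uses Python's stdlib configparser
rather than nova's real configuration machinery.

```python
import configparser
import os

# Hypothetical option name; the thread does not define one.
SAMPLE_CONF = """
[workarounds]
fsync_image_before_convert = true
"""


def load_workaround_flag(conf_text, option="fsync_image_before_convert"):
    """Return the boolean value of a [workarounds] option (default: False)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("workarounds", option, fallback=False)


def flush_to_disk(path):
    """Force the file's contents to stable storage with fsync()."""
    with open(path, "rb+") as image:
        os.fsync(image.fileno())


def prepare_image(path, conf_text):
    """Flush the image only if the workaround is enabled.

    Returns True when an fsync() was issued, False otherwise. The caller
    would run qemu-img convert on the path afterwards.
    """
    if load_workaround_flag(conf_text):
        flush_to_disk(path)
        return True
    return False
```

With the flag off (or the section absent) the fsync() is skipped entirely, so
deployments that already have a fixed qemu pay no cost, and the option can be
deleted later without touching any API.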

Best,
-jay