[Openstack-operators] Base images removed in upgrade essex -> folsom and other stories
michael.still at canonical.com
Wed Nov 14 06:11:31 UTC 2012
On 11/14/2012 04:03 PM, Sam Morrison wrote:
> After the upgrade, which went relatively smoothly (a lot easier than
> diablo -> essex), almost all our base images were deleted by the image
> cache cleanup.
> I can't explain how this happened. We lost a total of about 70 images
> that affected ~200 running instances.
> We have since disabled this flag until we can find out what went wrong.
> I can see it in the logs, and if this flag were enabled it would still
> delete a lot of in-use base files.
> We have an NFS-mounted /var/lib/nova/instances directory where the _base
> dir is located, so I'm wondering if this had something to do with it?
> Is the image cache cleanup meant to work in a shared instance storage
This sounds familiar, so I went spelunking through bugs. Bug 1075018 is
relevant, but not actually what you're talking about. The closest bug I
can find is 1014227, which is about starting instances using shared storage.
So, I think this is a newly reported bug, but I have a pretty strong
idea where the problem is (the code doesn't handle shared storage for
base files at all). For now I'd recommend that all operators using
shared storage and libvirt disable image cache cleanup.
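
To illustrate one way a cleanup that ignores shared storage could go wrong,
here is a minimal sketch. It is not nova's actual code: the function name,
host names and image names are all hypothetical, and it assumes each compute
node decides what is "unused" from its own instances only while every node
sees the same _base directory.

```python
# Hypothetical model of a per-host image cache cleanup; not nova's real code.

def unused_base_files(base_files, backing_files_in_use):
    """Return the base files that none of the given backing files reference."""
    return set(base_files) - set(backing_files_in_use)


# One _base directory, visible to every compute node over shared storage.
shared_base = {"img_a", "img_b", "img_c"}

# Base files backing the running instances on each host (assumed data).
in_use = {
    "compute1": {"img_a"},
    "compute2": {"img_b", "img_c"},
}

# compute1 only looks at its own instances, so it would treat img_b and
# img_c as unused even though instances on compute2 still depend on them.
removed_by_compute1 = unused_base_files(shared_base, in_use["compute1"])

# A check that is aware of shared storage needs the union across all hosts.
all_in_use = set().union(*in_use.values())
safe_to_remove = unused_base_files(shared_base, all_in_use)

print(sorted(removed_by_compute1))
print(sorted(safe_to_remove))
```

In this toy case the per-host view would remove two base files that are
still in use, while the combined view removes none. As far as I recall, the
Folsom-era nova.conf option controlling this cleanup is
remove_unused_base_images; treat that name as an assumption and check it
against your release before relying on it.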
I've filed bug 1078594 for this issue, and I apologize for the pain caused.