[Openstack-operators] missing base images and spinning qemu-nbd processes?
jon at jonproulx.com
Fri Apr 11 19:57:43 UTC 2014
I'm running Ubuntu 12.04 + Havana (2013.2.2) from cloud archive
Over the past few days I've noticed a number of my nodes (5% ish) have
been spending a lot of cpu time in 'system' state. This seems to be
related to 'qemu-nbd -c' processes that are spinning madly, mostly on
disks from deleted instances. 'kill -9' seems the only way to stop them.
Today I caught one that was spinning but on a file & instance that
actually existed. Turns out the base image that the qcow2 file in the
qemu-nbd command line referenced was missing.
/var/lib/nova/instances is on local disk on the node (not on a shared
filesystem). Grepping the nova-compute logs I see recent references to
the base image being "active" and not eligible for removal; there are
many similar older references (it's a popular base). I can't find any
point where it was referenced in the logs but wasn't active.
The instance in question was launched last night and was functioning
*mostly* normally. It should have had a 16G root based on instance
type, though the root was much smaller (same size as if the instance
type had specified a 0 size root). Presumably this is because the nbd
mapping never happened properly to grow the rootfs? But if that's the
case, and the base was missing prior to launch, I don't see where the
running OS came from.
Any guesses what is going on or best recovery practices? For now I
manually copied the base image from the glance store to the local file
it was expected in (setting owner & perms to match others), which
seems to work.
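To find other instances in the same state before they start spinning, something like the following rough sketch might help. It walks the instances directory, asks 'qemu-img info' for each disk's backing file, and reports any backing file that no longer exists. The function names are made up for illustration, and it assumes each instance directory holds a qcow2 file named 'disk' whose backing file is recorded as an absolute path without spaces. Newer qemu versions may also need '-U' to inspect a disk that is in use.

```python
import os
import re
import subprocess

# Matches the "backing file: <path>" line printed by `qemu-img info`.
BACKING_RE = re.compile(r"^backing file:\s*(\S+)", re.MULTILINE)


def backing_file(info_text):
    """Return the backing file path from `qemu-img info` text, or None."""
    match = BACKING_RE.search(info_text)
    return match.group(1) if match else None


def missing_backing_files(instances_dir="/var/lib/nova/instances"):
    """Return (disk, base) pairs where the backing file does not exist.

    Assumes every instance directory contains a file named 'disk'
    (hypothetical layout; adjust for your deployment).
    """
    missing = []
    for root, _dirs, files in os.walk(instances_dir):
        if "disk" not in files:
            continue
        disk = os.path.join(root, "disk")
        info = subprocess.check_output(["qemu-img", "info", disk]).decode()
        base = backing_file(info)
        if base and not os.path.exists(base):
            missing.append((disk, base))
    return missing


if __name__ == "__main__":
    for disk, base in missing_backing_files():
        print("%s -> missing base %s" % (disk, base))
```

Run as root (or the nova user) on a compute node. Each line of output names a disk whose base would need to be restored, for example by copying it back from the glance store as described above.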
There haven't been any system or config level changes in the past
couple months, though I did recently refresh the base image in
question (and most of my other public base images).