Open Stack

Fri Apr 11 19:57:43 UTC 2014

Hi All,

I'm running Ubuntu 12.04 + Havana (2013.2.2) from the Ubuntu Cloud Archive.

Over the past few days I've noticed that a number of my nodes (roughly
5%) have been spending a lot of CPU time in the 'system' state. This
seems to be related to 'qemu-nbd -c' processes that are spinning
madly, mostly on disks from deleted instances. 'kill -9' seems to be
the only way to get them to stop.
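To see which 'qemu-nbd -c' processes still point at a disk that
exists, something like the following could list them from /proc. This
is a minimal Python sketch, not part of nova. It assumes the process
was started as `qemu-nbd -c DEVICE IMAGE` with the image as the last
argument (other invocation forms are not handled), and
`parse_nbd_cmdline` / `find_nbd_processes` are hypothetical names.

```python
import os

def parse_nbd_cmdline(raw):
    """Return the image path from a 'qemu-nbd -c DEVICE IMAGE' argv, or None."""
    # /proc/<pid>/cmdline holds argv as NUL-separated (and NUL-terminated) strings.
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    if not args or os.path.basename(args[0]) != "qemu-nbd" or "-c" not in args:
        return None
    # Assumption: the image is the final argument, as in "qemu-nbd -c /dev/nbd0 IMAGE".
    return args[-1]

def find_nbd_processes(proc_root="/proc"):
    """Yield (pid, image_path, image_exists) for each 'qemu-nbd -c' process."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # the process exited or its cmdline is unreadable
        image = parse_nbd_cmdline(raw)
        if image is not None:
            yield int(entry), image, os.path.exists(image)

if __name__ == "__main__" and os.path.isdir("/proc"):
    # Report only; nothing is killed.
    for pid, image, exists in find_nbd_processes():
        print(pid, image, "ok" if exists else "MISSING")
```

Processes reported as MISSING would be the ones attached to disks of
deleted instances; the sketch only reports them and leaves any
'kill -9' to the operator.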

Today I caught one that was spinning on a file and instance that
actually existed. It turns out that the base image referenced by the
qcow2 file on the qemu-nbd command line was missing.
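A qcow2 file records the name of its backing (base) file in its
header, so a missing base can be spotted without starting qemu-nbd;
`qemu-img info` on the disk shows the same "backing file" line. The
minimal Python sketch below reads the big-endian header fields (magic,
version, backing_file_offset, backing_file_size) directly. The helper
names are hypothetical, and it assumes a relative backing name resolves
against the image's own directory.

```python
import os
import struct

QCOW2_MAGIC = b"QFI\xfb"

def qcow2_backing_file(path):
    """Return the backing-file name stored in a qcow2 header, or None if there is none."""
    with open(path, "rb") as fh:
        header = fh.read(20)
        if len(header) < 20 or header[:4] != QCOW2_MAGIC:
            raise ValueError(f"{path} is not a qcow2 image")
        # magic(4) version(4) backing_file_offset(8) backing_file_size(4), big-endian
        _magic, _version, offset, size = struct.unpack(">4sIQI", header)
        if offset == 0:
            return None  # standalone image with no backing file
        fh.seek(offset)
        return fh.read(size).decode()

def check_backing_file(path):
    """Return (backing_path, exists); (None, True) when the image has no backing file."""
    backing = qcow2_backing_file(path)
    if backing is None:
        return None, True
    # os.path.join keeps an absolute backing path unchanged.
    full = os.path.join(os.path.dirname(os.path.abspath(path)), backing)
    return full, os.path.exists(full)
```

Running `check_backing_file` over each instance's disk file on a node
would flag any other instance whose base has gone missing in the same
way.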

/var/lib/nova/instances is on local disk on the node (not on a shared
filesystem). Grepping the nova-compute logs, I see recent references to
the base image being "active" and not eligible for removal, and there
are many similar older references (it's a popular base). I can't find
any point in the logs where it was referenced but wasn't active.

The instance in question was launched last night and was functioning
*mostly* normally. It should have had a 16G root based on the instance
type, but the root was much smaller (the same size as if the instance
type had specified a 0-size root). Presumably this is because the nbd
mapping never happened properly, so the rootfs was never grown? But if
that's the case, and the base was missing before launch, I don't see
where the running OS came from.

Any guesses as to what is going on, or best recovery practices? For
now I manually copied the base image from the glance store to the
local file it was expected in (setting owner & perms to match the
others), which seems to work.
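The manual recovery can be scripted roughly as follows. This is a
minimal sketch assuming you already know the glance store file, the
expected base path, and a healthy sibling base image whose owner and
mode should be copied; all three paths and the `restore_base_image`
name are hypothetical placeholders, and the chown step needs root
whenever the owner actually changes.

```python
import os
import shutil

def restore_base_image(src, dest, reference):
    """Copy src to dest, giving dest the owner, group and mode of reference."""
    ref = os.stat(reference)
    # Copy under a temporary name so a half-written file never appears at dest.
    tmp = dest + ".tmp"
    shutil.copyfile(src, tmp)
    os.chown(tmp, ref.st_uid, ref.st_gid)  # needs root unless the owner is unchanged
    os.chmod(tmp, ref.st_mode & 0o7777)    # permission bits only
    os.replace(tmp, dest)                  # atomic rename on the same filesystem
    return dest
```

Writing to a temporary name and renaming means anything looking for the
base image either sees nothing or the complete file, never a partial
copy.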

There haven't been any system or config level changes in the past
couple of months, though I did recently refresh the base image in
question (along with most of my other public base images).

-Jon

[Openstack-operators] missing base images and spinning qemu-nbd processes?