[Openstack] How to recover nova-compute from a failed "nova boot"?

Philipp Wollermann wollermann_philipp at cyberagent.co.jp
Tue Mar 27 06:11:22 UTC 2012


Hi,

Today I set up OpenStack on my development machine inside VirtualBox, following Martin's tutorial at hastexo.com. I forgot to change libvirt_type to qemu, and since KVM isn't available inside VirtualBox, nova-compute understandably failed to boot the VM I created.
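
One way to catch this up front is to look for the hardware virtualization CPU flags and the /dev/kvm device before choosing a libvirt_type. The following is only a sketch under stated assumptions: pick_libvirt_type is a made-up helper name, not something from nova or the tutorial, and it only inspects /proc/cpuinfo and /dev/kvm on a Linux host.

```python
import os


def pick_libvirt_type(cpuinfo_text, kvm_device_present):
    """Return "kvm" if hardware virtualization looks usable, else "qemu".

    Hypothetical helper for illustration only; not part of nova.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        # /proc/cpuinfo lists CPU features on lines that start with "flags"
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    has_hw_virt = "vmx" in flags or "svm" in flags  # Intel VT-x or AMD-V
    return "kvm" if (has_hw_virt and kvm_device_present) else "qemu"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except IOError:  # probably not a Linux host
        cpuinfo = ""
    print(pick_libvirt_type(cpuinfo, os.path.exists("/dev/kvm")))
```

Inside a VirtualBox guest without nested virtualization, neither flag is normally exposed, so a check like this would be expected to suggest qemu.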

I changed the value in nova.conf (and in nova-compute.conf as well) and restarted the nova services, expecting everything to boot correctly now, but nova-compute didn't recover at all. Instead, I got this exception in the logs:

2012-03-27 14:59:01 CRITICAL nova [-] Instance instance-00000001 could not be found.
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE:   File "/usr/bin/nova-compute", line 49, in <module>
(nova): TRACE:     service.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 413, in wait
(nova): TRACE:     _launcher.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 131, in wait
(nova): TRACE:     service.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
(nova): TRACE:     return self._exit_event.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
(nova): TRACE:     return hubs.get_hub().switch()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
(nova): TRACE:     return self.greenlet.switch()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
(nova): TRACE:     result = function(*args, **kwargs)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 101, in run_server
(nova): TRACE:     server.start()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 162, in start
(nova): TRACE:     self.manager.init_host()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 247, in init_host
(nova): TRACE:     self.reboot_instance(context, instance['uuid'])
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova): TRACE:     return f(*args, **kw)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 153, in decorated_function
(nova): TRACE:     function(self, context, instance_uuid, *args, **kwargs)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
(nova): TRACE:     return function(self, context, instance_uuid, *args, **kwargs)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 898, in reboot_instance
(nova): TRACE:     reboot_type)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova): TRACE:     return f(*args, **kw)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 753, in reboot
(nova): TRACE:     if self._soft_reboot(instance):
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 773, in _soft_reboot
(nova): TRACE:     dom = self._lookup_by_name(instance.name)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 1567, in _lookup_by_name
(nova): TRACE:     raise exception.InstanceNotFound(instance_id=instance_name)
(nova): TRACE: InstanceNotFound: Instance instance-00000001 could not be found.
(nova): TRACE: 

I can only guess that nova-compute failed at a point where failure isn't expected and left the data for this VM in an undefined state.
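
To make that guess concrete, here is a toy model of what the traceback suggests: during startup, init_host tries to reboot an instance it finds in the database, and the libvirt lookup raises InstanceNotFound because no domain with that name exists. This is not nova's code. FakeHypervisor, init_host_strict and init_host_tolerant are names invented for the sketch, and setting vm_state to "error" is just one assumed recovery policy.

```python
class InstanceNotFound(Exception):
    """Stand-in for the exception seen in the traceback."""


class FakeHypervisor(object):
    """Toy replacement for the libvirt connection; knows a set of domain names."""

    def __init__(self, domains):
        self.domains = set(domains)

    def reboot(self, name):
        if name not in self.domains:
            raise InstanceNotFound("Instance %s could not be found." % name)
        return "rebooted"


def init_host_strict(db_instances, hypervisor):
    """Mirrors the observed failure: the first missing domain aborts startup."""
    for inst in db_instances:
        hypervisor.reboot(inst["name"])


def init_host_tolerant(db_instances, hypervisor):
    """Keeps going and flags instances the hypervisor does not know about."""
    for inst in db_instances:
        try:
            hypervisor.reboot(inst["name"])
        except InstanceNotFound:
            inst["vm_state"] = "error"  # assumed policy, for illustration only
    return db_instances


if __name__ == "__main__":
    instances = [{"name": "instance-00000001", "vm_state": "active"}]
    init_host_tolerant(instances, FakeHypervisor([]))
    print(instances[0]["vm_state"])  # prints "error"
```

With the tolerant variant the service would come up and the broken instance could still be deleted afterwards, which is the kind of graceful handling I would have expected.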

I found no way to recover from this failure. I tried to simply "nova delete" the machine, checked again with "nova show" a few minutes later, and saw:

OS-DCF:diskConfig                   | MANUAL
OS-EXT-SRV-ATTR:host                | vagrant-precise64
OS-EXT-SRV-ATTR:hypervisor_hostname | None
OS-EXT-SRV-ATTR:instance_name       | instance-00000001
OS-EXT-STS:power_state              | 8
OS-EXT-STS:task_state               | deleting
OS-EXT-STS:vm_state                 | active
...

That looks right, but the deletion never finishes, and nothing at all shows up in the logs.
In "nova list", the instance is still shown with status ACTIVE.

I tried stopping nova, deleting the instance directory in /var/lib/nova/instances and restarting nova, but that didn't help either (same exception).
I then stopped nova again, deleted the VM's rows from the instances, security_group_instance_association and instance_info_caches tables in nova's MySQL DB, and restarted nova, but only got a different exception in the logs:

2012-03-27 14:55:11 ERROR nova.rpc.amqp [req-26b4686a-85f4-4566-bb0f-d87e8456b1f2 6b177562cbc1434fade182a45427134d 3a21af5fa5fc470ebe2f2471ff5b49d3] Exception during message handling
(nova.rpc.amqp): TRACE: Traceback (most recent call last):
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
(nova.rpc.amqp): TRACE:     rval = node_func(context=ctxt, **node_args)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova.rpc.amqp): TRACE:     return f(*args, **kw)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 142, in decorated_function
(nova.rpc.amqp): TRACE:     locked = self.get_lock(context, instance_uuid)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova.rpc.amqp): TRACE:     return f(*args, **kw)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
(nova.rpc.amqp): TRACE:     return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1597, in get_lock
(nova.rpc.amqp): TRACE:     instance_ref = self.db.instance_get_by_uuid(context, instance_uuid)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 549, in instance_get_by_uuid
(nova.rpc.amqp): TRACE:     return IMPL.instance_get_by_uuid(context, uuid)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper
(nova.rpc.amqp): TRACE:     return f(*args, **kwargs)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1345, in instance_get_by_uuid
(nova.rpc.amqp): TRACE:     raise exception.InstanceNotFound(instance_id=uuid)
(nova.rpc.amqp): TRACE: InstanceNotFound: Instance 73e90a02-7cef-4d64-a369-fbbc668ea91c could not be found.
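
The manual cleanup and the resulting lookup failure can be reproduced in miniature. The sketch below uses an in-memory SQLite database instead of MySQL, and the schema is a simplified assumption (only an id, a uuid and an instance_id column); nova's real tables have many more columns and may key the related rows differently. purge_instance and this instance_get_by_uuid are hypothetical helpers, not nova functions.

```python
import sqlite3

UUID = "73e90a02-7cef-4d64-a369-fbbc668ea91c"  # taken from the traceback

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema: only the columns this sketch needs.
    CREATE TABLE instances (id INTEGER PRIMARY KEY, uuid TEXT);
    CREATE TABLE security_group_instance_association (instance_id INTEGER);
    CREATE TABLE instance_info_caches (instance_id INTEGER);
    INSERT INTO instances VALUES (1, '73e90a02-7cef-4d64-a369-fbbc668ea91c');
    INSERT INTO security_group_instance_association VALUES (1);
    INSERT INTO instance_info_caches VALUES (1);
""")


def purge_instance(conn, uuid):
    """Remove the instance row and the rows that reference it, atomically."""
    row = conn.execute(
        "SELECT id FROM instances WHERE uuid = ?", (uuid,)).fetchone()
    if row is None:
        return False
    with conn:  # commits on success, rolls back on error
        for table in ("security_group_instance_association",
                      "instance_info_caches"):
            conn.execute(
                "DELETE FROM %s WHERE instance_id = ?" % table, (row[0],))
        conn.execute("DELETE FROM instances WHERE id = ?", (row[0],))
    return True


def instance_get_by_uuid(conn, uuid):
    """Toy lookup; returns None where nova raises InstanceNotFound."""
    return conn.execute(
        "SELECT id FROM instances WHERE uuid = ?", (uuid,)).fetchone()
```

After purge_instance(conn, UUID) the lookup returns None. Presumably the same thing happened here: a request that still named this UUID reached nova-compute after the row was gone, and get_lock could no longer find it.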

Of course, I can just reset the whole DB and try again, as this is a development machine … but shouldn't nova-compute handle this (or any) kind of failure more gracefully?
Is there a way to cleanly recover from this situation?

Best regards,
Philipp




