[nova] Clean up "building" instances
Hi,

we had a network issue two weeks ago in a HA Victoria cloud, resulting in a couple of stale resources (in pending state). Most of them I could clean up relatively easily, but two instances are left in "building" state and are not yet in the nova database, so I can't just remove them via 'openstack server delete <UUID>'. I've been looking through the various nova databases where traces have been left to get an impression of where I could intervene (although I don't like manipulating the database). The VMs are two amphora instances:

control01:~ # openstack server list --project service | grep -v ACTIVE
+--------------------------------------+----------------------------------------------+--------+----------+---------------------------+---------+
| ID                                   | Name                                         | Status | Networks | Image                     | Flavor  |
+--------------------------------------+----------------------------------------------+--------+----------+---------------------------+---------+
| 0453a7e5-e4f9-419b-ad71-d837a20ef6bb | amphora-0ee32901-0c59-4752-8253-35b66da176ea | BUILD  |          | amphora-x64-haproxy_1.0.0 | amphora |
| dc8cdc3a-f6b2-469b-af6f-ba2aa130ea9b | amphora-4990a47b-fe8a-431a-90ec-5ac2368a5251 | BUILD  |          | amphora-x64-haproxy_1.0.0 | amphora |
+--------------------------------------+----------------------------------------------+--------+----------+---------------------------+---------+

The database tables referring to the UUID 0453a7e5-e4f9-419b-ad71-d837a20ef6bb are these:

nova_cell0/instance_id_mappings.ibd
nova_cell0/instance_info_caches.ibd
nova_cell0/instance_extra.ibd
nova_cell0/instances.ibd
nova_cell0/instance_system_metadata.ibd
octavia/amphora.ibd
nova_api/instance_mappings.ibd
nova_api/request_specs.ibd

My first approach would be to update the nova_cell0.instances table and edit the fields 'vm_state' and 'task_state', or even remove the entire row. But I don't know what implications this would have for the other tables, so I'd like to know how you would recommend dealing with these orphans. Any comment is appreciated!

Thanks,
Eugen
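(For reference, a read-only way to see what cell0 has recorded for one of these stuck instances, assuming direct MariaDB access on the controller, is a query along these lines; the selected columns are just an illustrative subset:

MariaDB [nova_cell0]> select uuid, vm_state, task_state, host from instances where uuid='0453a7e5-e4f9-419b-ad71-d837a20ef6bb';

This shows the vm_state/task_state fields mentioned above without modifying anything.)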
On Mon, Feb 20, 2023 at 11:33, Eugen Block <eblock@nde.ag> wrote:
[...]
Just a simple thing: reset their states.
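(What is usually meant by a state reset is the admin-level reset of the instance's state via the API, e.g. something along these lines, assuming admin credentials; whether it works here depends on the API actually knowing about the instance:

control01:~ # openstack server set --state error 0453a7e5-e4f9-419b-ad71-d837a20ef6bb
control01:~ # openstack server delete 0453a7e5-e4f9-419b-ad71-d837a20ef6bb

The legacy equivalent would be 'nova reset-state [--active] <UUID>'.)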
Thanks, I forgot to mention that nova doesn't seem to know about them (yet?):

control01:~ # nova show 0453a7e5-e4f9-419b-ad71-d837a20ef6bb
ERROR (CommandError): No server with a name or ID of '0453a7e5-e4f9-419b-ad71-d837a20ef6bb' exists.

That's why I was already thinking about modifying the db.

Quoting Sylvain Bauza <sylvain.bauza@gmail.com>:
[...]
This is probably because whatever network issues you were having resulted in not writing instance map records in the api database. Thus Nova can list the instances, but not look them up individually by UUID. I think you should be able to use 'nova-manage cell_v2 map_instances' (only needed on cell0) to rebuild those mappings, and then you can likely delete them (and/or reset their state first).

--Dan
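(A quick, read-only way to check whether a given UUID has an instance mapping at all, before or after running map_instances, is nova-manage's verify_instance command; the cell UUIDs themselves can be listed with list_cells. For example:

control01:~ # nova-manage cell_v2 verify_instance --uuid 0453a7e5-e4f9-419b-ad71-d837a20ef6bb
control01:~ # nova-manage cell_v2 list_cells

Neither command modifies the database.)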
Thank you. I tried this (it didn't produce any output):

control01:~ # nova-manage cell_v2 map_instances --cell_uuid 00000000-0000-0000-0000-000000000000

But the instance still can't be removed or reset:

control01:~ # openstack server delete amphora-0ee32901-0c59-4752-8253-35b66da176ea
No server with a name or ID of 'amphora-0ee32901-0c59-4752-8253-35b66da176ea' exists.

Quoting Dan Smith <dms@danplanet.com>:
[...]
Ack, and I see you said that the instance was mentioned in instance_mappings anyway in your original post, my mistake. So perhaps it does already have a mapping. Without debug logs and database dumps in a bug it'll be hard to really pursue that further.

--Dan
I created https://bugs.launchpad.net/nova/+bug/2007922

Quoting Dan Smith <dms@danplanet.com>:
[...]
Just to close this thread: with the help of Sylvain I updated the nova_api database and added the missing cell_id to the pending instances:

MariaDB [nova_api]> update instance_mappings set cell_id='3' where instance_uuid='0453a7e5-e4f9-419b-ad71-d837a20ef6bb';

The cell_id can be found in the nova_api.cell_mappings table (unfortunately not in the output of 'nova-manage cell_v2 list_cells'); see the example query below. After that I could successfully delete the pending instances. Thank you for your quick help (especially Sylvain), I appreciate it!

Eugen

Quoting Dan Smith <dms@danplanet.com>:
[...]
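(For anyone hitting the same issue: the numeric cell_id used in the update above comes from the nova_api.cell_mappings table. A read-only query along these lines, again assuming direct MariaDB access, shows which id belongs to which cell:

MariaDB [nova_api]> select id, uuid, name from cell_mappings;

instance_mappings.cell_id references exactly that id column.)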
participants (3)
- Dan Smith
- Eugen Block
- Sylvain Bauza