Hi,
There is an OpenStack client extension called osc-placement that you can install to help. We also have a heal allocations command in nova-manage that may help, but the next step would be to validate whether the old RPs are still in use or not. From there you can then work to align nova's and placement's view with the real topology.
I read about that in the docs, but there's no RPM for our distro (openSUSE), so I guess we'll have to build it from source.
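osc-placement is a pure-Python plugin published on PyPI, so building an RPM may not be necessary; installing it with pip into the same environment as python-openstackclient should be enough. A minimal sketch, assuming pip is available there:

---snip---
# install the placement plugin for the openstack CLI
pip install osc-placement

# verify the plugin is picked up
openstack resource provider list
---snip---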
What you probably need to do in this case is check if the RPs still have allocations and, if so, verify whether those allocations are owned by VMs that no longer exist.
Is this the right place to look at?

MariaDB [nova]> select count(*) from nova_api.allocations;
+----------+
| count(*) |
+----------+
|      263 |
+----------+

MariaDB [nova]> select resource_provider_id,consumer_id from nova_api.allocations limit 10;
+----------------------+--------------------------------------+
| resource_provider_id | consumer_id                          |
+----------------------+--------------------------------------+
|                    3 | fce8f56e-e50b-47ef-bbf5-87b91336b2d4 |
|                    3 | fce8f56e-e50b-47ef-bbf5-87b91336b2d4 |
|                    3 | fce8f56e-e50b-47ef-bbf5-87b91336b2d4 |
|                    3 | 67d95ce0-7902-40db-8ad7-ef0ce350bcb4 |
|                    3 | 67d95ce0-7902-40db-8ad7-ef0ce350bcb4 |
|                    3 | 67d95ce0-7902-40db-8ad7-ef0ce350bcb4 |
|                    1 | 0caaebae-56a6-45d8-a486-f3294ab321e8 |
|                    1 | 0caaebae-56a6-45d8-a486-f3294ab321e8 |
|                    1 | 0caaebae-56a6-45d8-a486-f3294ab321e8 |
|                    1 | 339d0585-b671-4afa-918b-a772bfc36da8 |
+----------------------+--------------------------------------+

MariaDB [nova]> select name,id from nova_api.resource_providers;
+--------------------------+----+
| name                     | id |
+--------------------------+----+
| compute1.fqdn            |  3 |
| compute2.fqdn            |  1 |
| compute3.fqdn            |  2 |
| compute4.fqdn            |  4 |
+--------------------------+----+

I only checked four of those consumer_id entries and all of them are existing VMs; I'll need to check all of them tomorrow. So I guess we should try to get the osc-placement tool running for us. Thanks, that already helped a lot!

Eugen
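Rather than checking every consumer_id by hand, it may be possible to find the orphaned ones in one query by joining the allocations against the instances table. A rough sketch, with the caveats that consumer_id can also be a migration UUID rather than an instance UUID, and that archived instances move to the shadow tables, so a non-match is a candidate for cleanup rather than proof:

---snip---
-- allocations whose consumer matches no instance record at all
SELECT a.resource_provider_id, a.consumer_id
FROM nova_api.allocations a
LEFT JOIN nova.instances i ON i.uuid = a.consumer_id
WHERE i.uuid IS NULL;
---snip---

Zitat von Sean Mooney <smooney@redhat.com>: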
On Mon, 2021-03-08 at 14:18 +0000, Eugen Block wrote:
Thank you, Sean.
So you need to do openstack compute service list to get the compute service IDs, then do openstack compute service delete <id-1> <id-2> ...
You need to make sure that you only remove the unused old services, but I think that would fix your issue.
That's the thing, they don't show up in the compute service list. But I also found them in the resource_providers table; only the old compute nodes appear here:
MariaDB [nova]> select name from nova_api.resource_providers;
+--------------------------+
| name                     |
+--------------------------+
| compute1.fqdn            |
| compute2.fqdn            |
| compute3.fqdn            |
| compute4.fqdn            |
+--------------------------+

Ah, in that case the compute service delete is meant to remove the RPs too, but if the RP had stale allocations at the time of the delete, the RP delete will fail.
What you probably need to do in this case is check if the RPs still have allocations and, if so, verify whether those allocations are owned by VMs that no longer exist. If that is the case, you should be able to delete the allocations and then the RP. If the allocations are related to active VMs that are now on the rebuilt nodes, then you will have to try to heal the allocations.
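A sketch of that cleanup using the osc-placement plugin; the UUIDs are placeholders, and the --allocations option may require a reasonably recent osc-placement release:

---snip---
# list the resource providers and their UUIDs
openstack resource provider list

# show what is still allocated against a suspect provider
openstack resource provider show <rp-uuid> --allocations

# for a consumer that maps to no existing VM, drop its allocations
openstack resource provider allocation delete <consumer-uuid>

# once a provider has no allocations left, it can be deleted
openstack resource provider delete <rp-uuid>
---snip---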
There is an OpenStack client extension called osc-placement that you can install to help. We also have a heal allocations command in nova-manage that may help, but the next step would be to validate whether the old RPs are still in use or not. From there you can then work to align nova's and placement's view with the real topology.
That could involve removing the old compute nodes from the compute_nodes table or marking them as deleted, but both the nova DB and placement need to be kept in sync to correct your current issue.
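The heal command referred to above is nova-manage placement heal_allocations. A minimal sketch; note that --dry-run only exists on newer releases, so check nova-manage placement --help on your version first:

---snip---
# report what would be healed without writing anything
nova-manage placement heal_allocations --verbose --dry-run

# then run it for real
nova-manage placement heal_allocations --verbose
---snip---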
Zitat von Sean Mooney <smooney@redhat.com>:
On Mon, 2021-03-08 at 13:18 +0000, Eugen Block wrote:
Hi *,
I have a quick question. Last year we migrated our OpenStack to a highly available environment through a reinstall of all nodes. The migration went quite well and we're working happily in the new cloud, but the databases still contain deprecated data. For example, the nova-scheduler logs lines like these on a regular basis:
/var/log/nova/nova-scheduler.log:2021-02-19 12:02:46.439 23540 WARNING nova.scheduler.host_manager [...] No compute service record found for host compute1
This is one of the old compute nodes that has been reinstalled and is now compute01. I tried to find the right spot to delete some lines in the DB, but there are a couple of places, so I wanted to check and ask you for some insights.
The scheduler messages seem to originate in
/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py
---snip---
        for cell_uuid, computes in compute_nodes.items():
            for compute in computes:
                service = services.get(compute.host)

                if not service:
                    LOG.warning(
                        "No compute service record found for host %(host)s",
                        {'host': compute.host})
                    continue
---snip---
So I figured it could be this table in the nova DB:
---snip---
MariaDB [nova]> select host,deleted from compute_nodes;
+-----------+---------+
| host      | deleted |
+-----------+---------+
| compute01 |       0 |
| compute02 |       0 |
| compute03 |       0 |
| compute04 |       0 |
| compute05 |       0 |
| compute1  |       0 |
| compute2  |       0 |
| compute3  |       0 |
| compute4  |       0 |
+-----------+---------+
---snip---
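One hedged way to see which of those rows trigger the warning is to mirror the scheduler's lookup in SQL, i.e. list compute_nodes entries that have no live nova-compute service record; a sketch, assuming the services table in the same cell database:

---snip---
-- compute_nodes rows with no matching nova-compute service
SELECT cn.host
FROM nova.compute_nodes cn
LEFT JOIN nova.services s
  ON s.host = cn.host AND s.`binary` = 'nova-compute' AND s.deleted = 0
WHERE cn.deleted = 0 AND s.id IS NULL;
---snip---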
What would be the best approach here to clean up a little? I believe it would be safe to simply purge those lines containing the old compute nodes, but there might be a smoother way. Or maybe there are more places to purge old data from?

So the step you probably missed was deleting the old compute service records.
So you need to do openstack compute service list to get the compute service IDs, then do openstack compute service delete <id-1> <id-2> ...
You need to make sure that you only remove the unused old services, but I think that would fix your issue.
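A minimal sketch of that sequence; the IDs are placeholders taken from whatever the list command prints for the old hosts:

---snip---
# find the service IDs of the old, reinstalled hosts
openstack compute service list --service nova-compute

# delete only the stale entries
openstack compute service delete <id-1> <id-2>
---snip---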
I'd appreciate any ideas.
Regards, Eugen