Hi,
There is an OpenStack client extension called osc-placement that you can install to help. We also have a heal allocations command in nova-manage that may help, but the next step would be to validate whether the old RPs are still in use or not. From there you can then work to align nova's and placement's view with the real topology.
I read about that in the docs, but there's no RPM for our distro (openSUSE), so I guess we'll have to build it from source.
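osc-placement is a pure-Python plugin published on PyPI, so building an RPM may not be necessary; installing it with pip into the same environment as python-openstackclient should be enough. A minimal sketch, assuming pip is available there:

---snip---
# install the placement plugin for the openstack CLI
pip install osc-placement

# verify the plugin is picked up
openstack resource provider list
---snip---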
What you probably need to do in this case is check if the RPs still have allocations and, if so, verify whether those allocations are owned by VMs that no longer exist.
Is this the right place to look at?

MariaDB [nova]> select count(*) from nova_api.allocations;
+----------+
| count(*) |
+----------+
|      263 |
+----------+

MariaDB [nova]> select resource_provider_id,consumer_id from nova_api.allocations limit 10;
+----------------------+--------------------------------------+
| resource_provider_id | consumer_id                          |
+----------------------+--------------------------------------+
|                    3 | fce8f56e-e50b-47ef-bbf5-87b91336b2d4 |
|                    3 | fce8f56e-e50b-47ef-bbf5-87b91336b2d4 |
|                    3 | fce8f56e-e50b-47ef-bbf5-87b91336b2d4 |
|                    3 | 67d95ce0-7902-40db-8ad7-ef0ce350bcb4 |
|                    3 | 67d95ce0-7902-40db-8ad7-ef0ce350bcb4 |
|                    3 | 67d95ce0-7902-40db-8ad7-ef0ce350bcb4 |
|                    1 | 0caaebae-56a6-45d8-a486-f3294ab321e8 |
|                    1 | 0caaebae-56a6-45d8-a486-f3294ab321e8 |
|                    1 | 0caaebae-56a6-45d8-a486-f3294ab321e8 |
|                    1 | 339d0585-b671-4afa-918b-a772bfc36da8 |
+----------------------+--------------------------------------+

MariaDB [nova]> select name,id from nova_api.resource_providers;
+--------------------------+----+
| name                     | id |
+--------------------------+----+
| compute1.fqdn            |  3 |
| compute2.fqdn            |  1 |
| compute3.fqdn            |  2 |
| compute4.fqdn            |  4 |
+--------------------------+----+

I only checked four of those consumer_id entries and all of them are existing VMs; I'll need to check all of them tomorrow. So I guess we should try to get the osc-placement tool running for us. Thanks, that already helped a lot!

Eugen
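Rather than checking every consumer_id by hand, it may be possible to find the orphaned ones in one query by joining the allocations against the instances table. A rough sketch, with the caveats that consumer_id can also be a migration UUID rather than an instance UUID, and that archived instances move to the shadow tables, so a non-match is a candidate for cleanup rather than proof:

---snip---
-- allocations whose consumer matches no instance record at all
SELECT a.resource_provider_id, a.consumer_id
FROM nova_api.allocations a
LEFT JOIN nova.instances i ON i.uuid = a.consumer_id
WHERE i.uuid IS NULL;
---snip---

Zitat von Sean Mooney <smooney@redhat.com>: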
On Mon, 2021-03-08 at 14:18 +0000, Eugen Block wrote:
Thank you, Sean.
So you need to do openstack compute service list to get the compute service IDs, then do openstack compute service delete <id-1> <id-2> ...
You need to make sure that you only remove the unused old services, but I think that would fix your issue.
That's the thing, they don't show up in the compute service list. But I also found them in the resource_providers table; only the old compute nodes appear here:
MariaDB [nova]> select name from nova_api.resource_providers;
+--------------------------+
| name                     |
+--------------------------+
| compute1.fqdn            |
| compute2.fqdn            |
| compute3.fqdn            |
| compute4.fqdn            |
+--------------------------+

Ah, in that case the compute service delete is meant to remove the RPs too, but if the RP had stale allocations at the time of the delete, the RP delete will fail.
What you probably need to do in this case is check if the RPs still have allocations and, if so, verify whether those allocations are owned by VMs that no longer exist. If that is the case, you should be able to delete the allocations and then the RP. If the allocations are related to active VMs that are now on the rebuilt nodes, then you will have to try to heal the allocations.
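A sketch of that cleanup using the osc-placement plugin; the UUIDs are placeholders, and the --allocations option may require a reasonably recent osc-placement release:

---snip---
# list the resource providers and their UUIDs
openstack resource provider list

# show what is still allocated against a suspect provider
openstack resource provider show <rp-uuid> --allocations

# for a consumer that maps to no existing VM, drop its allocations
openstack resource provider allocation delete <consumer-uuid>

# once a provider has no allocations left, it can be deleted
openstack resource provider delete <rp-uuid>
---snip---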
There is an OpenStack client extension called osc-placement that you can install to help. We also have a heal allocations command in nova-manage that may help, but the next step would be to validate whether the old RPs are still in use or not. From there you can then work to align nova's and placement's view with the real topology.
That could involve removing the old compute nodes from the compute_nodes table or marking them as deleted, but both the nova DB and placement need to be kept in sync to correct your current issue.
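The heal command referred to above is nova-manage placement heal_allocations. A minimal sketch; note that --dry-run only exists on newer releases, so check nova-manage placement --help on your version first:

---snip---
# report what would be healed without writing anything
nova-manage placement heal_allocations --verbose --dry-run

# then run it for real
nova-manage placement heal_allocations --verbose
---snip---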
Zitat von Sean Mooney <smooney@redhat.com>:
On Mon, 2021-03-08 at 13:18 +0000, Eugen Block wrote:
Hi *,
I have a quick question. Last year we migrated our OpenStack to a highly available environment through a reinstall of all nodes. The migration went quite well and we're working happily in the new cloud, but the databases still contain deprecated data. For example, the nova-scheduler logs lines like these on a regular basis:
/var/log/nova/nova-scheduler.log:2021-02-19 12:02:46.439 23540 WARNING nova.scheduler.host_manager [...] No compute service record found for host compute1
This is one of the old compute nodes that has been reinstalled and is now compute01. I tried to find the right spot to delete some lines in the DB, but there are a couple of places, so I wanted to check and ask you for some insights.
The scheduler messages seem to originate in
/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py
---snip---
        for cell_uuid, computes in compute_nodes.items():
            for compute in computes:
                service = services.get(compute.host)

                if not service:
                    LOG.warning(
                        "No compute service record found for host %(host)s",
                        {'host': compute.host})
                    continue
---snip---
So I figured it could be this table in the nova DB:
---snip---
MariaDB [nova]> select host,deleted from compute_nodes;
+-----------+---------+
| host      | deleted |
+-----------+---------+
| compute01 |       0 |
| compute02 |       0 |
| compute03 |       0 |
| compute04 |       0 |
| compute05 |       0 |
| compute1  |       0 |
| compute2  |       0 |
| compute3  |       0 |
| compute4  |       0 |
+-----------+---------+
---snip---
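One hedged way to see which of those rows trigger the warning is to mirror the scheduler's lookup in SQL, i.e. list compute_nodes entries that have no live nova-compute service record; a sketch, assuming the services table in the same cell database:

---snip---
-- compute_nodes rows with no matching nova-compute service
SELECT cn.host
FROM nova.compute_nodes cn
LEFT JOIN nova.services s
  ON s.host = cn.host AND s.`binary` = 'nova-compute' AND s.deleted = 0
WHERE cn.deleted = 0 AND s.id IS NULL;
---snip---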
What would be the best approach here to clean up a little? I believe it would be safe to simply purge those lines containing the old compute nodes, but there might be a smoother way. Or maybe there are more places to purge old data from?

So the step you probably missed was deleting the old compute service records.
So you need to do openstack compute service list to get the compute service IDs, then do openstack compute service delete <id-1> <id-2> ...
You need to make sure that you only remove the unused old services, but I think that would fix your issue.
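A minimal sketch of that sequence; the IDs are placeholders taken from whatever the list command prints for the old hosts:

---snip---
# find the service IDs of the old, reinstalled hosts
openstack compute service list --service nova-compute

# delete only the stale entries
openstack compute service delete <id-1> <id-2>
---snip---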
I'd appreciate any ideas.
Regards, Eugen