Re: [placement] Train upgrade warning
Hello,

As another person has mentioned (Chanu Romain), I am attempting an upgrade from Queens to Train. I have successfully upgraded Keystone and Glance, but while trying to separate Placement's databases out from Nova, I run into the same issue with placement-status upgrade check:

# placement-status upgrade check
+-----------------------------------------------------------------+
| Upgrade Check Results                                           |
+-----------------------------------------------------------------+
| Check: Missing Root Provider IDs                                |
| Result: Success                                                 |
| Details: None                                                   |
+-----------------------------------------------------------------+
| Check: Incomplete Consumers                                     |
| Result: Warning                                                 |
| Details: There are -3511 incomplete consumers table records for |
|          existing allocations. Run the "placement-manage db     |
|          online_data_migrations" command.                       |
+-----------------------------------------------------------------+

Likewise, running placement-manage db online_data_migrations does nothing.

I am running this placement migration on a dev cluster which has had the databases from a Prod cluster restored to it. This has worked well for me previously as a way to run through upgrades without touching the live system.

Thanks very much,
Seth
On 10/23/20 02:43, Seth Tunstall wrote:
Hello,
As another person has mentioned (Chanu Romain), I am attempting an upgrade from Queens to Train. I have successfully upgraded Keystone and Glance, but while trying to separate Placement's databases out from Nova, I run into the same issue with placement-status upgrade check:
# placement-status upgrade check
+-----------------------------------------------------------------+
| Upgrade Check Results                                           |
+-----------------------------------------------------------------+
| Check: Missing Root Provider IDs                                |
| Result: Success                                                 |
| Details: None                                                   |
+-----------------------------------------------------------------+
| Check: Incomplete Consumers                                     |
| Result: Warning                                                 |
| Details: There are -3511 incomplete consumers table records for |
|          existing allocations. Run the "placement-manage db     |
|          online_data_migrations" command.                       |
+-----------------------------------------------------------------+
Likewise, running placement-manage db online_data_migrations does nothing.
Hm, in both cases I see that the status check is reporting a _negative_ number of consumers, which is unexpected. Looking at the code, this value is calculated by subtracting the number of 'consumers' [1] table records from the number of 'allocations' table consumers. The fact that this is negative means there are more 'consumers' table records with unique project/user pairs than there are 'allocations' table records with unique project/user pairs.

This implies to me that there are orphaned 'consumers' table records in your databases. We had a bug around this in the past [2] where 'consumers' table records were not being deleted when they no longer had any 'allocations' table records associated with them. That bug was fixed in the Rocky release, but the fix does not clean up 'consumers' records that were orphaned before the fix landed. Is your original deployment from before Rocky, and has it been upgraded since?

If so, and you have orphaned 'consumers' records, you will need to do a one-time manual database cleanup as described in this comment [3]. You can check whether you have any orphaned records with a SQL query like this:

select * from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);

If you do, then after you clean them up manually you should stop seeing the placement-status upgrade check warning.

Note that it is not so harmful to have orphaned 'consumers' in the database, in that it won't hurt the operation of nova other than taking up database space and possibly impacting database and nova performance as a result. But it's a good idea to clean them up. And >= Rocky, 'consumers' records are properly deleted by placement, so orphans cannot happen from then on.

Hope this helps,
-melanie

[1] 'consumers' are project/user combinations of instance owners
[2] https://bugs.launchpad.net/nova/+bug/1780799
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1726256#c24
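For what it's worth, the arithmetic behind that warning can be approximated with a single query against whichever database placement-status is reading. This is only a sketch of the check described above (the real check runs in code, not SQL) and it assumes the pre-split layout where the placement tables still live in nova_api:

select (select count(distinct consumer_id) from nova_api.allocations)
     - (select count(*) from nova_api.consumers) as incomplete_consumers;

A negative result here corresponds to the negative number in the warning: more 'consumers' rows than consumers referenced by 'allocations'.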
Hello,

I tried your procedure; it didn't remove the warning from "placement-status upgrade check".

As you said, I did:

delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);

It removed only 2458 rows, far short of my -15753 consumers.

I restored my backup and tried what you suggested here: https://bugzilla.redhat.com/show_bug.cgi?id=1726256#c24

update instance_id_mappings set deleted = id where uuid in (select uuid from instances where deleted != 0);
nova-manage db archive_deleted_rows --until-complete
delete from instance_id_mappings where uuid not in (select uuid from instances); (removed 70 rows)

Then I tried again:

delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings); (deleted around 15900 rows)

Sadly, placement-status upgrade check still returns "There are -15753 incomplete consumers..."

I restored my backup.

It does not fix the bug, but should I still delete the rows like I did? Do you have any other lead to fix it? Did I miss something?

Best regards,
Romain

________________________________________
From: melanie witt <melwittt@gmail.com>
Sent: Friday, October 23, 2020 5:17 PM
To: Seth Tunstall; openstack-discuss@lists.openstack.org
Cc: CHANU ROMAIN
Subject: Re: [placement] Train upgrade warning

On 10/23/20 02:43, Seth Tunstall wrote:
Hello,
As another person has mentioned (Chanu Romain), I am attempting an upgrade from Queens to Train. I have successfully upgraded Keystone and Glance, but while trying to separate Placement's databases out from Nova, I run into the same issue with placement-status upgrade check:
# placement-status upgrade check
+-----------------------------------------------------------------+
| Upgrade Check Results                                           |
+-----------------------------------------------------------------+
| Check: Missing Root Provider IDs                                |
| Result: Success                                                 |
| Details: None                                                   |
+-----------------------------------------------------------------+
| Check: Incomplete Consumers                                     |
| Result: Warning                                                 |
| Details: There are -3511 incomplete consumers table records for |
|          existing allocations. Run the "placement-manage db     |
|          online_data_migrations" command.                       |
+-----------------------------------------------------------------+
Likewise, running placement-manage db online_data_migrations does nothing.
Hm, in both cases I see that the status check is reporting a _negative_ number of consumers, which is unexpected. Looking at the code, this value is calculated by subtracting the number of 'consumers' [1] table records from the number of 'allocations' table consumers. The fact that this is negative means there are more 'consumers' table records with unique project/user pairs than there are 'allocations' table records with unique project/user pairs.

This implies to me that there are orphaned 'consumers' table records in your databases. We had a bug around this in the past [2] where 'consumers' table records were not being deleted when they no longer had any 'allocations' table records associated with them. That bug was fixed in the Rocky release, but the fix does not clean up 'consumers' records that were orphaned before the fix landed. Is your original deployment from before Rocky, and has it been upgraded since?

If so, and you have orphaned 'consumers' records, you will need to do a one-time manual database cleanup as described in this comment [3]. You can check whether you have any orphaned records with a SQL query like this:

select * from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);

If you do, then after you clean them up manually, you should stop seeing the placement-status upgrade check warning.

Note that it is not so harmful to have orphaned 'consumers' in the database, in that it won't hurt the operation of nova other than taking up database space and possibly impacting database and nova performance as a result. But it's a good idea to clean them up. And >= Rocky, 'consumers' records are properly deleted by placement, so orphans cannot happen from then on.

Hope this helps,
-melanie

[1] 'consumers' are project/user combinations of instance owners
[2] https://bugs.launchpad.net/nova/+bug/1780799
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1726256#c24
On 10/25/20 03:10, CHANU ROMAIN wrote:
Hello,
I tried your procedure; it didn't remove the warning from "placement-status upgrade check".
As you said I did: delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);
It removed only 2458 rows, far short of my -15753 consumers.
I restored my backup and tried what you said here https://bugzilla.redhat.com/show_bug.cgi?id=1726256#c24
update instance_id_mappings set deleted = id where uuid in (select uuid from instances where deleted != 0);
nova-manage db archive_deleted_rows --until-complete
delete from instance_id_mappings where uuid not in (select uuid from instances); (removed 70 rows)
then I tried again: delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings); (deleted around 15900 rows)
Sadly, placement-status upgrade check still returns (There are -15753 incomplete consumers...)
I restored my backup.
It does not fix the bug, but should I still delete the rows like I did? Do you have any other lead to fix it? Did I miss something?
OK, if you did all of that and the value "-15753 incomplete consumers" has not changed at all, this sounds like your 'placement-status upgrade check' command is running against a different database than 'nova_api'. Did you already split out the placement data into a separate database?

If so, then you would have to do the cleanups with, for example:

delete from placement.consumers where placement.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);

if you have split the placement data into a new database called 'placement'. The queries written in that bug were done as though nova and placement are still sharing the 'nova_api' database.

The 'consumers' table is owned by placement. The rest of the mentioned tables belong to nova.

The main idea of the row deletions is to delete "orphan" records, which are records tied to an instance's lifecycle when that instance no longer exists. Going forward, nova will delete these records itself at instance deletion time, but it did not in the past because of bugs, and any records generated before a bug was fixed will become orphaned once the associated instance is deleted.

-melanie
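If it helps, the same predicate can be run as a SELECT first to see how many rows the cleanup would touch. This is just a sketch, assuming the new schema is named 'placement' and that nova_api is reachable from the same MySQL server:

select count(*)
  from placement.consumers
 where placement.consumers.uuid not in
       (select nova_api.instance_mappings.instance_uuid
          from nova_api.instance_mappings);

If that count is nonzero, those are the rows the delete above would remove.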
________________________________________
From: melanie witt <melwittt@gmail.com>
Sent: Friday, October 23, 2020 5:17 PM
To: Seth Tunstall; openstack-discuss@lists.openstack.org
Cc: CHANU ROMAIN
Subject: Re: [placement] Train upgrade warning
On 10/23/20 02:43, Seth Tunstall wrote:
Hello,
As another person has mentioned (Chanu Romain), I am attempting an upgrade from Queens to Train. I have successfully upgraded Keystone and Glance, but while trying to separate Placement's databases out from Nova, I run into the same issue with placement-status upgrade check:
# placement-status upgrade check
+-----------------------------------------------------------------+
| Upgrade Check Results                                           |
+-----------------------------------------------------------------+
| Check: Missing Root Provider IDs                                |
| Result: Success                                                 |
| Details: None                                                   |
+-----------------------------------------------------------------+
| Check: Incomplete Consumers                                     |
| Result: Warning                                                 |
| Details: There are -3511 incomplete consumers table records for |
|          existing allocations. Run the "placement-manage db     |
|          online_data_migrations" command.                       |
+-----------------------------------------------------------------+
Likewise, running placement-manage db online_data_migrations does nothing.
Hm, in both cases I see that the status check is reporting a _negative_ number of consumers, which is unexpected. Looking at the code, this value is calculated by subtracting the number of 'consumers' [1] table records from the number of 'allocations' table consumers. The fact that this is negative means there are more 'consumers' table records with unique project/user pairs than there are 'allocations' table records with unique project/user pairs.
This implies to me that there are orphaned 'consumers' table records in your databases. We had a bug around this in the past [2] where 'consumers' table records were not being deleted when they no longer had any 'allocations' table records associated with them. That bug was fixed in the Rocky release but the fix does not clean up orphaned 'consumers' records that were orphaned before the fix landed. Is your original deployment from before Rocky and has been upgraded?
If so, then if you have orphaned 'consumers' records, you will need to do a one-time manual database clean up as described in this comment [3]. You can check whether you have any orphaned records by a SQL query like this:
select * from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);
If you do, after you clean them up manually, you should stop seeing the placement-status upgrade check warning.
Note that it is not so harmful to have orphaned 'consumers' in the database, in that it won't hurt the operation of nova other than taking up database space and possibly impacting database and nova performance as a result. But it's a good idea to clean them up. And >= Rocky, 'consumers' records are properly deleted by placement, so orphans cannot happen from then on.
Hope this helps, -melanie
[1] 'consumers' are project/user combinations of instance owners [2] https://bugs.launchpad.net/nova/+bug/1780799 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1726256#c24
On 10/28/20 12:01, melanie witt wrote:
The main idea of the row deletions is to delete "orphan" records which are records tied to an instance's lifecycle when that instance no longer exists. Going forward, nova will delete these records itself at instance deletion time but did not in the past because of bugs, and any records generated before a bug was fixed will become orphaned once the associated instance is deleted.
Argh, I must correct the way I tried to say that: any records generated before a particular bug was fixed became orphaned if the instance was deleted before the bug fix was installed. -melanie
Hello,

On 10/28/20 12:01, melanie witt wrote:
The main idea of the row deletions is to delete "orphan" records which are records tied to an instance's lifecycle when that instance no longer exists. Going forward, nova will delete these records itself at instance deletion time but did not in the past because of bugs, and any records generated before a bug was fixed will become orphaned once the associated instance is deleted.
I've done the following in this order:

nova-manage api_db sync
nova-manage db sync
(to bring the DBs up to the version I'm upgrading to, Train)

nova-manage db archive_deleted_rows --verbose --until-complete
Archiving..............................................................complete
+--------------------------+-------------------------+
| Table                    | Number of Rows Archived |
+--------------------------+-------------------------+
| block_device_mapping     | 4805                    |
| instance_actions         | 7813                    |
| instance_actions_events  | 8556                    |
| instance_extra           | 2658                    |
| instance_faults          | 1325                    |
| instance_group_member    | 0                       |
| instance_id_mappings     | 118                     |
| instance_info_caches     | 2656                    |
| instance_mappings        | 2656                    |
| instance_metadata        | 845                     |
| instance_system_metadata | 24437                   |
| instances                | 2656                    |
| migrations               | 121                     |
| request_specs            | 2656                    |
| virtual_interfaces       | 2767                    |
+--------------------------+-------------------------+

nova-manage db purge --all --verbose
DB: Deleted 4805 rows from shadow_block_device_mapping based on timestamp column (n/a)
DB: Deleted 7813 rows from shadow_instance_actions based on timestamp column (n/a)
DB: Deleted 8556 rows from shadow_instance_actions_events based on timestamp column (n/a)
DB: Deleted 2658 rows from shadow_instance_extra based on timestamp column (n/a)
DB: Deleted 1325 rows from shadow_instance_faults based on timestamp column (n/a)
DB: Deleted 118 rows from shadow_instance_id_mappings based on timestamp column (n/a)
DB: Deleted 2656 rows from shadow_instance_info_caches based on timestamp column (n/a)
DB: Deleted 845 rows from shadow_instance_metadata based on timestamp column (n/a)
DB: Deleted 24437 rows from shadow_instance_system_metadata based on timestamp column (n/a)
DB: Deleted 2656 rows from shadow_instances based on timestamp column (n/a)
DB: Deleted 121 rows from shadow_migrations based on timestamp column (n/a)
DB: Deleted 2767 rows from shadow_virtual_interfaces based on timestamp column (n/a)

[root@server01 placement]# nova-manage db null_instance_uuid_scan
There were no records found where instance_uuid was NULL.
[root@server01 placement]# nova-manage db online_data_migrations
Running batches of 50 until complete
4 rows matched query migrate_empty_ratio, 4 migrated
46 rows matched query fill_virtual_interface_list, 0 migrated
46 rows matched query populate_user_id, 44 migrated
49 rows matched query fill_virtual_interface_list, 0 migrated
50 rows matched query populate_user_id, 50 migrated
48 rows matched query fill_virtual_interface_list, 0 migrated
50 rows matched query populate_user_id, 47 migrated
50 rows matched query fill_virtual_interface_list, 0 migrated
50 rows matched query populate_user_id, 45 migrated
50 rows matched query fill_virtual_interface_list, 0 migrated
50 rows matched query populate_user_id, 44 migrated
50 rows matched query fill_virtual_interface_list, 0 migrated
50 rows matched query populate_user_id, 44 migrated
19 rows matched query fill_virtual_interface_list, 0 migrated
50 rows matched query populate_user_id, 44 migrated
49 rows matched query populate_user_id, 42 migrated
7 rows matched query populate_user_id, 0 migrated
+---------------------------------------------+--------------+-----------+
| Migration                                   | Total Needed | Completed |
+---------------------------------------------+--------------+-----------+
| create_incomplete_consumers                 | 0            | 0         |
| delete_build_requests_with_no_instance_uuid | 0            | 0         |
| fill_virtual_interface_list                 | 312          | 0         |
| migrate_empty_ratio                         | 4            | 4         |
| migrate_keypairs_to_api_db                  | 0            | 0         |
| migrate_quota_classes_to_api_db             | 0            | 0         |
| migrate_quota_limits_to_api_db              | 0            | 0         |
| migration_migrate_to_uuid                   | 0            | 0         |
| populate_missing_availability_zones         | 0            | 0         |
| populate_queued_for_delete                  | 0            | 0         |
| populate_user_id                            | 402          | 360       |
| populate_uuids                              | 0            | 0         |
| service_uuids_online_data_migration         | 0            | 0         |
+---------------------------------------------+--------------+-----------+

just to clean up the nova and nova_api databases before running your suggested SQL query:

MariaDB [(none)]> delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);
Query OK, 2656 rows affected (0.080 sec)

Excellent! 2656 is quite a few.

Then I ran mysql-migrate-db.sh:

# /usr/share/placement/mysql-migrate-db.sh --migrate
Nova database contains data, placement database does not. Okay to proceed with migration
Dumping from NOVA_API to migrate-db.wh1tDGAz/from-nova.sql
Loading to PLACEMENT from migrate-db.wh1tDGAz/from-nova.sql
/usr/lib/python2.7/site-packages/pymysql/cursors.py:170: Warning: (1280, u"Name 'alembic_version_pkc' ignored for PRIMARY key.")
  result = self._query(query)

Hmm, a little worrying, but let's continue:

# placement-status upgrade check
+----------------------------------------------------------------------+
| Upgrade Check Results                                                |
+----------------------------------------------------------------------+
| Check: Missing Root Provider IDs                                     |
| Result: Success                                                      |
| Details: None                                                        |
+----------------------------------------------------------------------+
| Check: Incomplete Consumers                                          |
| Result: Warning                                                      |
| Details: There are 6 incomplete consumers table records for existing |
|          allocations. Run the "placement-manage db                   |
|          online_data_migrations" command.                            |
+----------------------------------------------------------------------+

Only 6, and it's now a positive number! Much progress.
# placement-manage db online_data_migrations
Running batches of 50 until complete
8 rows matched query create_incomplete_consumers, 8 migrated
+-----------------------------+-------------+-----------+
| Migration                   | Total Found | Completed |
+-----------------------------+-------------+-----------+
| set_root_provider_ids       | 0           | 0         |
| create_incomplete_consumers | 8           | 8         |
+-----------------------------+-------------+-----------+

Hmm, 8 incomplete consumers created...

# placement-status upgrade check
+-----------------------------------------------------------------------+
| Upgrade Check Results                                                 |
+-----------------------------------------------------------------------+
| Check: Missing Root Provider IDs                                      |
| Result: Success                                                       |
| Details: None                                                         |
+-----------------------------------------------------------------------+
| Check: Incomplete Consumers                                           |
| Result: Warning                                                       |
| Details: There are -2 incomplete consumers table records for existing |
|          allocations. Run the "placement-manage db                    |
|          online_data_migrations" command.                             |
+-----------------------------------------------------------------------+

argh! again a negative number! But at least it's only 2, which is well within the realm of manual fixes.

Thank you very much, Melanie, for pointing me at what has been a very frustrating bug.

Seth
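For the record, the last couple of stragglers can be listed directly with the same consumers-vs-allocations comparison used elsewhere in this thread. This is a sketch and assumes the post-migration tables live in the 'placement' schema:

select uuid
  from placement.consumers
 where uuid not in (select consumer_id from placement.allocations);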
On 10/30/20 01:37, Seth Tunstall wrote:
Hello,
On 10/28/20 12:01, melanie witt wrote:
The main idea of the row deletions is to delete "orphan" records which are records tied to an instance's lifecycle when that instance no longer exists. Going forward, nova will delete these records itself at instance deletion time but did not in the past because of bugs, and any records generated before a bug was fixed will become orphaned once the associated instance is deleted.
I've done the following in this order:
nova-manage api_db sync
nova-manage db sync
(to bring the DBs up to the version I'm upgrading to, Train)
nova-manage db archive_deleted_rows --verbose --until-complete
The thing I notice here ^ is that you didn't (but should) use --all-cells to also clean up based on the nova_cell0 database (where instances that failed scheduling go). If you've ever had an instance go into ERROR state for failing the scheduling step and you deleted it, its nova_api.instance_mappings record would be a candidate for being archived (removed). <snip>
# placement-status upgrade check
+-----------------------------------------------------------------------+
| Upgrade Check Results                                                 |
+-----------------------------------------------------------------------+
| Check: Missing Root Provider IDs                                      |
| Result: Success                                                       |
| Details: None                                                         |
+-----------------------------------------------------------------------+
| Check: Incomplete Consumers                                           |
| Result: Warning                                                       |
| Details: There are -2 incomplete consumers table records for existing |
|          allocations. Run the "placement-manage db                    |
|          online_data_migrations" command.                             |
+-----------------------------------------------------------------------+
argh! again a negative number! But at least it's only 2, which is well within the realm of manual fixes.
The only theory I have for how this occurred is that you have 2 consumers that are orphaned due to missing nova_cell0 during database archiving ... i.e. if you have a couple of deleted instances in nova_cell0, you still have their nova_api.instance_mappings, and without --all-cells those instance_mappings didn't get removed, which affected the manual cleanup query you ran (the presence of the instance_mappings prevented deletion of the 2 orphaned consumers).

If that's not it, then I'm afraid I don't have any other ideas at the moment.

-melanie
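A quick way to test that theory, sketched here with the same 'deleted != 0' convention used earlier in the thread, is to look for soft-deleted instances still sitting in nova_cell0:

select count(*) from nova_cell0.instances where deleted != 0;

If that returns anything, archiving against nova_cell0 (with --all-cells, or by pointing nova-manage at the cell0 database) should remove those rows and their instance_mappings, after which the consumer cleanup query should find the remaining orphans.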
Hello,

In case it helps anyone else searching for this in future: Melanie's suggestion to clean out the orphaned consumers worked perfectly in my situation.

The last two I had were apparently left over from the original build of this environment. I brute-force cleaned them out of the DB manually:

DELETE FROM nova_cell0.block_device_mapping WHERE nova_cell0.block_device_mapping.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_faults WHERE nova_cell0.instance_faults.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_extra WHERE nova_cell0.instance_extra.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_info_caches WHERE nova_cell0.instance_info_caches.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_system_metadata WHERE nova_cell0.instance_system_metadata.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instances WHERE nova_cell0.instances.uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

Caveat: I am not intimately familiar with how the ORM handles these DB tables; I may have done something stupid here.

I tried to run:

nova-manage db archive_deleted_rows --verbose --until-complete --all-cells

but nova-db-manage complained that it didn't recognise --no-cells.

Thanks very much for your help, Melanie

Seth

On 30/10/2020 16:50, melanie witt wrote:
On 10/30/20 01:37, Seth Tunstall wrote:
Hello,
On 10/28/20 12:01, melanie witt wrote:
The main idea of the row deletions is to delete "orphan" records which are records tied to an instance's lifecycle when that instance no longer exists. Going forward, nova will delete these records itself at instance deletion time but did not in the past because of bugs, and any records generated before a bug was fixed will become orphaned once the associated instance is deleted.
I've done the following in this order:
nova-manage api_db sync
nova-manage db sync
(to bring the DBs up to the version I'm upgrading to, Train)
nova-manage db archive_deleted_rows --verbose --until-complete
The thing I notice here ^ is that you didn't (but should) use --all-cells to also clean up based on the nova_cell0 database (where instances that failed scheduling go). If you've ever had an instance go into ERROR state for failing the scheduling step and you deleted it, its nova_api.instance_mappings record would be a candidate for being archived (removed).
<snip>
# placement-status upgrade check
+-----------------------------------------------------------------------+
| Upgrade Check Results                                                 |
+-----------------------------------------------------------------------+
| Check: Missing Root Provider IDs                                      |
| Result: Success                                                       |
| Details: None                                                         |
+-----------------------------------------------------------------------+
| Check: Incomplete Consumers                                           |
| Result: Warning                                                       |
| Details: There are -2 incomplete consumers table records for existing |
|          allocations. Run the "placement-manage db                    |
|          online_data_migrations" command.                             |
+-----------------------------------------------------------------------+
argh! again a negative number! But at least it's only 2, which is well within the realm of manual fixes.
The only theory I have for how this occurred is you have 2 consumers that are orphaned due to missing the nova_cell0 during database archiving ... Like if you have a couple of deleted instances in nova_cell0 and thus still have nova_api.instance_mappings and without --all-cells those instance_mappings didn't get removed and so affected the manual cleanup query you ran (presence of instance_mappings prevented deletion of 2 orphaned consumers).
If that's not it, then I'm afraid I don't have any other ideas at the moment.
-melanie
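For anyone repeating the brute-force cleanup above, the shared subquery can also be run on its own as a dry run. This sketch simply reuses the consumers-vs-allocations comparison from those DELETE statements to list the uuids that would be affected:

select uuid
  from nova_api.consumers
 where nova_api.consumers.uuid not in
       (select nova_api.allocations.consumer_id from nova_api.allocations);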
On 11/4/20 08:54, Seth Tunstall wrote:
Hello,
In case it helps anyone else searching for this in future: Melanie's suggestion to clean out the orphaned consumers worked perfectly in my situation.
The last two I had were apparently left over from the original build of this environment. I brute-force cleaned them out of the DB manually:
DELETE FROM nova_cell0.block_device_mapping WHERE nova_cell0.block_device_mapping.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));
<snip>
Caveat: I am not intimately familiar with how the ORM handles these DB tables, I may have done something stupid here.
Hm, sorry, this isn't what I was suggesting you do ... I was making a guess that you might have instances with 'deleted' != 0 in your nova_cell0 database and that, if so, they needed to be archived using 'nova-manage db archive_deleted_rows'. That might then have taken care of removing their corresponding nova_api.instance_mappings, which would make the manual cleanup find more rows (the rows that were being complained about).

What you did is "OK" (not harmful) if the nova_cell0.instances records associated with those rows had 'deleted' != 0. But there are likely more cruft rows left behind that will never be removed. nova-manage db archive_deleted_rows should be used whenever possible because it knows how to remove all the things.
I tried to run:
nova-manage db archive_deleted_rows --verbose --until-complete --all-cells
but nova-db-manage complained that it didn't recognise --no-cells
This is with the Train code? --all-cells was added in Train [1]. If you are running code prior to Train, you have to pass the nova-manage command a nova config file that has its [api_database]connection set to the nova_api database connection URL and its [database]connection set to the nova_cell0 database. Example:

nova-manage --config-file <nova.conf pointing at nova_cell0> db archive_deleted_rows ...

Cheers,
-melanie

[1] https://docs.openstack.org/nova/train/cli/nova-manage.html#nova-database
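For reference, a minimal sketch of what such a config file might look like; the connection URLs are placeholders and need to be adjusted for your own hosts, credentials and database names:

[api_database]
connection = mysql+pymysql://nova:NOVA_PASS@controller/nova_api

[database]
connection = mysql+pymysql://nova:NOVA_PASS@controller/nova_cell0

With a file like that, 'nova-manage --config-file <that file> db archive_deleted_rows --until-complete' archives based on nova_cell0 instead of the main cell database.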
Hello,

Sorry I could not work on this for a while. To fix this issue I just added one request to my previous message. I will write down my entire procedure:

use nova;
update instance_id_mappings set deleted = id where uuid in (select uuid from instances where deleted != 0);
exit

nova-manage db archive_deleted_rows --all-cells --until-complete

delete from nova.instance_id_mappings where uuid not in (select uuid from nova.instances);
delete from nova_api.consumers where nova_api.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);

Then the new request:

delete from placement.consumers where placement.consumers.uuid not in (select nova_api.instance_mappings.instance_uuid from nova_api.instance_mappings);

I had already executed the db migrate script, so I have to clear the placement tables as well.

If you still have negative values, I think there are several possible cases. I faced these:

- All instances in a deleted project were still present in the placement/nova_api consumers. I removed them from nova_api.instance_mappings before nova-manage db archive_deleted_rows.
- The last one is weird: a very old shelved instance which appeared after running placement-manage db online_data_migrations.

Best regards,
Romain

On Wed, 2020-11-04 at 09:08 -0800, melanie witt wrote:
On 11/4/20 08:54, Seth Tunstall wrote:
Hello,
In case it helps anyone else searching for this in future: Melanie's suggestion to clean out the orphaned consumers worked perfectly in my situation.
The last two I had were apparently left over from the original build of this environment. I brute-force cleaned them out of the DB manually:
DELETE FROM nova_cell0.block_device_mapping WHERE nova_cell0.block_device_mapping.instance_uuid IN (SELECT uuid FROM nova_api.consumers WHERE nova_api.consumers.uuid NOT IN (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));
<snip>
Caveat: I am not intimately familiar with how the ORM handles these DB tables, I may have done something stupid here.
Hm, sorry, this isn't what I was suggesting you do ... I was making a guess that you might have instances with 'deleted' != 0 in your nova_cell0 database and that, if so, they needed to be archived using 'nova-manage db archive_deleted_rows'. That might then have taken care of removing their corresponding nova_api.instance_mappings, which would make the manual cleanup find more rows (the rows that were being complained about).
What you did is "OK" (not harmful) if the nova_cell0.instances records associated with those rows had 'deleted' != 0. But there are likely more cruft rows left behind that will never be removed. nova-manage db archive_deleted_rows should be used whenever possible because it knows how to remove all the things.
I tried to run:
nova-manage db archive_deleted_rows --verbose --until-complete --all-cells
but nova-db-manage complained that it didn't recognise --no-cells
This is with the train code? --all-cells was added in train [1]. If you are running with code prior to train, you have to pass a nova config file to the nova-manage command that has its [api_database]connection set to the nova_api database connection url and the [database]connection set to the nova_cell0 database. Example:
nova-manage --config-file <nova.conf pointing at nova_cell0> db archive_deleted_rows ...
Cheers, -melanie
[1] https://docs.openstack.org/nova/train/cli/nova-manage.html#nova-database
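One last sanity check that may help future readers: after the placement-side delete, the consumers-vs-allocations comparison should come back empty and the upgrade check warning should clear. This is a sketch assuming the post-migration 'placement' schema:

select count(*)
  from placement.consumers
 where uuid not in (select consumer_id from placement.allocations);

A zero here, followed by a clean "placement-status upgrade check", means the Incomplete Consumers warning is gone.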