Hello,

In case it helps anyone else searching for this in future: Melanie's suggestion to clean out the orphaned consumers worked perfectly in my situation.

The last two I had were apparently left over from the original build of this environment. I brute-force cleaned them out of the DB manually:

DELETE FROM nova_cell0.block_device_mapping
  WHERE nova_cell0.block_device_mapping.instance_uuid IN
    (SELECT uuid FROM nova_api.consumers
      WHERE nova_api.consumers.uuid NOT IN
        (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_faults
  WHERE nova_cell0.instance_faults.instance_uuid IN
    (SELECT uuid FROM nova_api.consumers
      WHERE nova_api.consumers.uuid NOT IN
        (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_extra
  WHERE nova_cell0.instance_extra.instance_uuid IN
    (SELECT uuid FROM nova_api.consumers
      WHERE nova_api.consumers.uuid NOT IN
        (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_info_caches
  WHERE nova_cell0.instance_info_caches.instance_uuid IN
    (SELECT uuid FROM nova_api.consumers
      WHERE nova_api.consumers.uuid NOT IN
        (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instance_system_metadata
  WHERE nova_cell0.instance_system_metadata.instance_uuid IN
    (SELECT uuid FROM nova_api.consumers
      WHERE nova_api.consumers.uuid NOT IN
        (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

DELETE FROM nova_cell0.instances
  WHERE nova_cell0.instances.uuid IN
    (SELECT uuid FROM nova_api.consumers
      WHERE nova_api.consumers.uuid NOT IN
        (SELECT nova_api.allocations.consumer_id FROM nova_api.allocations));

Caveat: I am not intimately familiar with how the ORM handles these DB tables, I may have done something stupid here.
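Before running deletes like the ones above, it may help to list which consumers the subquery would actually match. The following is only a minimal sketch of that dry run, written in Python against an in-memory SQLite database. The two toy tables stand in for nova_api.consumers and nova_api.allocations, and the helper names and sample UUIDs are made up for illustration; nothing here touches a real nova or placement database. It uses NOT EXISTS rather than NOT IN, because NOT IN matches no rows at all if the subquery ever returns a NULL.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory stand-in for the consumers/allocations tables.

    Hypothetical toy schema: only the columns the orphan query needs.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE consumers (uuid TEXT PRIMARY KEY);
        CREATE TABLE allocations (consumer_id TEXT NOT NULL, used INTEGER);
        INSERT INTO consumers VALUES ('live-1'), ('orphan-1'), ('orphan-2');
        INSERT INTO allocations VALUES ('live-1', 1), ('live-1', 512);
    """)
    return conn


def find_orphaned_consumers(conn):
    """Return consumer UUIDs that no allocation row refers to (read-only)."""
    rows = conn.execute("""
        SELECT c.uuid
        FROM consumers AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM allocations AS a WHERE a.consumer_id = c.uuid
        )
        ORDER BY c.uuid
    """).fetchall()
    return [uuid for (uuid,) in rows]


if __name__ == "__main__":
    conn = make_toy_db()
    # Inspect the candidates first; delete only once the list looks right.
    print(find_orphaned_consumers(conn))
```

On a real deployment the same SELECT could be run by hand in the database client, with a backup taken first, to confirm that only the expected UUIDs come back.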
I tried to run:

nova-manage db archive_deleted_rows --verbose --until-complete --all-cells

but nova-manage complained that it didn't recognise --all-cells.

Thanks very much for your help, Melanie.

Seth

On 30/10/2020 16:50, melanie witt wrote:
On 10/30/20 01:37, Seth Tunstall wrote:
Hello,
On 10/28/20 12:01, melanie witt wrote:
The main idea of the row deletions is to delete "orphan" records, which are records tied to an instance's lifecycle when that instance no longer exists. Going forward, nova will delete these records itself at instance deletion time, but it did not in the past because of bugs, and any records generated before a bug was fixed will become orphaned once the associated instance is deleted.
I've done the following in this order:
nova-manage api_db sync
nova-manage db sync
(to bring the DBs up to the version I'm upgrading to, Train)
nova-manage db archive_deleted_rows --verbose --until-complete
The thing I notice here ^ is that you didn't (but should) use --all-cells to also clean up based on the nova_cell0 database (where instances that failed scheduling go). If you've ever had an instance go into ERROR state for failing the scheduling step and you deleted it, its nova_api.instance_mappings record would be a candidate for being archived (removed).
<snip>
# placement-status upgrade check
+-----------------------------------------------------------------------+
| Upgrade Check Results                                                 |
+-----------------------------------------------------------------------+
| Check: Missing Root Provider IDs                                      |
| Result: Success                                                       |
| Details: None                                                         |
+-----------------------------------------------------------------------+
| Check: Incomplete Consumers                                           |
| Result: Warning                                                       |
| Details: There are -2 incomplete consumers table records for existing |
|   allocations. Run the "placement-manage db                           |
|   online_data_migrations" command.                                    |
+-----------------------------------------------------------------------+
Argh! Again a negative number! But at least it's only 2, which is well within the realm of manual fixes.
The only theory I have for how this occurred is that you have 2 consumers that are orphaned because nova_cell0 was missed during database archiving. For example, if you have a couple of deleted instances in nova_cell0, their nova_api.instance_mappings records still exist. Without --all-cells those instance_mappings didn't get removed, and so they affected the manual cleanup query you ran (the presence of instance_mappings prevented deletion of 2 orphaned consumers).
If that's not it, then I'm afraid I don't have any other ideas at the moment.
-melanie
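One way to see how the "Incomplete Consumers" check could report -2 is sketched below. It ASSUMES the count is computed as the number of distinct consumer IDs in the allocations table minus the number of rows in the consumers table. That formula is a guess that happens to fit the output in this thread; it is not taken from the thread or verified against the placement source. The toy SQLite tables, helper names, and sample UUIDs are likewise hypothetical.

```python
import sqlite3


def build_toy_placement_db():
    """In-memory stand-in: 3 consumers with allocations, 2 without."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE consumers (uuid TEXT PRIMARY KEY);
        CREATE TABLE allocations (consumer_id TEXT NOT NULL, used INTEGER);
        INSERT INTO consumers VALUES
            ('live-1'), ('live-2'), ('live-3'), ('orphan-1'), ('orphan-2');
        INSERT INTO allocations VALUES
            ('live-1', 1), ('live-1', 512), ('live-2', 2), ('live-3', 4);
    """)
    return conn


def incomplete_consumer_count(conn):
    """ASSUMED formula: distinct allocation consumers minus consumer rows."""
    (alloc_consumers,) = conn.execute(
        "SELECT COUNT(DISTINCT consumer_id) FROM allocations").fetchone()
    (consumer_rows,) = conn.execute(
        "SELECT COUNT(*) FROM consumers").fetchone()
    return alloc_consumers - consumer_rows


if __name__ == "__main__":
    conn = build_toy_placement_db()
    # Two consumers have no allocations, so the difference is 3 - 5.
    print(incomplete_consumer_count(conn))
    # Removing the orphaned consumers brings the difference back to 3 - 3.
    conn.execute(
        "DELETE FROM consumers "
        "WHERE uuid NOT IN (SELECT consumer_id FROM allocations)")
    print(incomplete_consumer_count(conn))
```

Under that assumption, each consumer left without any allocation pushes the reported number one further below zero, which would match two leftover orphans producing -2.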