[openstack-dev] [nova] nova-manage db archive_deleted_rows broken
mriedem at linux.vnet.ibm.com
Wed Oct 7 15:04:04 UTC 2015
On 12/12/2014 7:54 PM, melanie witt wrote:
> Hi everybody,
> At some point, our db archiving functionality got broken because there was a change to stop ever deleting instance system metadata. For those unfamiliar, the 'nova-manage db archive_deleted_rows' is the thing that moves all soft-deleted (deleted=nonzero) rows to the shadow tables. This is a periodic cleaning that operators can do to maintain performance (as things can get sluggish when deleted=nonzero rows accumulate).
> The change was made because instance_type data still needed to be read even after instances had been deleted, because we allow admins to view deleted instances. I saw a bug and two patches which aimed to fix this by changing back to soft-deleting instance sysmeta when instances are deleted, and instead allowing read_deleted="yes" for the things that need to read instance_type for deleted instances present in the db.
> My question is, is this approach okay? If so, I'd like to see these patches revived so we can have our db archiving working again. :) I think there's likely something I'm missing about the approach, so I'm hoping people who know more about instance sysmeta than I do can chime in on how/if we can fix this for db archiving. Thanks.
>  https://bugs.launchpad.net/nova/+bug/1185190
>  https://bugs.launchpad.net/nova/+bug/1226049
>  https://review.openstack.org/#/c/110875/
>  https://review.openstack.org/#/c/109201/
> melanie (melwitt)
I'd like to bring this back up since even though  and  are merged,
nova-manage db archive_deleted_rows still fails to delete rows from some
tables because of foreign key constraint issues, detailed here:
I'm wondering why we don't reverse sort the tables using the sqlalchemy
metadata object before processing the tables for delete? That's the
same thing I did in the 267 migration since we needed to process the
tree starting with the leaves and then eventually get back to the
instances table (since most roads lead to the instances table).
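
To make the ordering idea concrete, here is a minimal sketch using only
the Python standard library. The foreign key edges below are assumed for
illustration and are not read from the real nova models. With SQLAlchemy,
MetaData.sorted_tables gives the parents-first order, so reversing that
list would play the same role as delete_order here.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign key map: child table -> tables it references.
# These edges are assumptions for illustration, not the nova schema.
foreign_keys = {
    "instances": set(),
    "block_device_mapping": {"instances"},
    "instance_system_metadata": {"instances"},
    "pci_devices": {"instances"},
}

# static_order() yields referenced (parent) tables before the tables
# that point at them, i.e. the order you would use to create tables.
parents_first = list(TopologicalSorter(foreign_keys).static_order())

# Reversing gives children (leaves) first, which is the order in which
# rows can be removed without tripping foreign key constraints.
delete_order = list(reversed(parents_first))

for table in delete_order:
    print(table)
```

In this sketch the instances table always comes out last, since every
other table points at it.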
Another thing that's really weird is how max_rows is used in this code.
There is cumulative tracking of the max_rows value so if the value you
pass in is too small, you might not actually be removing anything.
I figured max_rows meant up to max_rows from each table, not max_rows
*total* across all tables. By my count, there are 52 tables in the nova
db model. The way I read the code, if I pass in max_rows=10 and say it
processes table A and archives 7 rows, then when it processes table B it
will pass max_rows=(max_rows - rows_archived), which would be 3 for
table B. If we archive 3 rows from table B, rows_archived >= max_rows
and we quit. So to really make this work, you have to pass in something
big for max_rows, like 1000, which seems completely random.
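
Here is a small simulation of the cumulative behavior as I read it. This
is a sketch of the logic described above, not the actual nova code; the
function name, table names and row counts are made up.

```python
def archive_cumulative(deleted_rows_by_table, max_rows):
    """Simulate one shared max_rows budget across all tables.

    deleted_rows_by_table maps table name -> number of soft-deleted rows,
    in processing order. Returns table name -> rows archived.
    """
    archived = {table: 0 for table in deleted_rows_by_table}
    rows_archived = 0
    for table, deleted in deleted_rows_by_table.items():
        if rows_archived >= max_rows:
            break  # budget used up, remaining tables are never touched
        remaining = max_rows - rows_archived
        moved = min(deleted, remaining)
        archived[table] = moved
        rows_archived += moved
    return archived


# Hypothetical counts: table A has 7 deleted rows, B has 5, C has 4.
result = archive_cumulative({"A": 7, "B": 5, "C": 4}, max_rows=10)
print(result)
```

With max_rows=10, table A gets 7 rows archived, table B gets only 3, and
table C is skipped entirely.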
Does this seem odd to anyone else? Given the relationships between
tables, I'd think you'd want to try and delete max_rows for all tables,
so archive 10 instances, 10 block_device_mapping, 10 pci_devices, etc.
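
For comparison, a sketch of the per-table interpretation, where max_rows
is applied to each table independently. Again the function name and the
row counts are hypothetical, and this is not what the command does today.

```python
def archive_per_table(deleted_rows_by_table, max_rows):
    """Simulate applying max_rows to each table on its own.

    Returns table name -> rows archived, capped at max_rows per table.
    """
    return {
        table: min(deleted, max_rows)
        for table, deleted in deleted_rows_by_table.items()
    }


# Hypothetical soft-deleted row counts per table.
counts = {"instances": 25, "block_device_mapping": 40, "pci_devices": 3}
per_table = archive_per_table(counts, max_rows=10)
print(per_table)
```

Each table gets up to 10 rows archived per run, so no table is starved
just because an earlier table used up the budget.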
I'm also bringing this up now because there is a thread in the operators
list which pointed me to a set of scripts that operators at GoDaddy are
using for archiving deleted rows:
Presumably because the command in nova doesn't work. We should either
make this thing work or just punt and delete it because no one cares.