[openstack-dev] [nova] nova-manage db archive_deleted_rows broken

Matt Riedemann mriedem at linux.vnet.ibm.com
Wed Oct 7 15:04:04 UTC 2015



On 12/12/2014 7:54 PM, melanie witt wrote:
> Hi everybody,
>
> At some point, our db archiving functionality got broken because there was a change to stop ever deleting instance system metadata [1]. For those unfamiliar, the 'nova-manage db archive_deleted_rows' is the thing that moves all soft-deleted (deleted=nonzero) rows to the shadow tables. This is a periodic cleaning that operators can do to maintain performance (as things can get sluggish when deleted=nonzero rows accumulate).
>
> The change was made because instance_type data still needed to be read even after instances had been deleted, because we allow admin to view deleted instances. I saw a bug [2] and two patches [3][4] which aimed to fix this by changing back to soft-deleting instance sysmeta when instances are deleted, and instead allow read_deleted="yes" for the things that need to read instance_type for deleted instances present in the db.
>
> My question is, is this approach okay? If so, I'd like to see these patches revived so we can have our db archiving working again. :) I think there's likely something I'm missing about the approach, so I'm hoping people who know more about instance sysmeta than I do can chime in on how/if we can fix this for db archiving. Thanks.
>
> [1] https://bugs.launchpad.net/nova/+bug/1185190
> [2] https://bugs.launchpad.net/nova/+bug/1226049
> [3] https://review.openstack.org/#/c/110875/
> [4] https://review.openstack.org/#/c/109201/
>
> melanie (melwitt)

I'd like to bring this back up: even though [3] and [4] are merged, 
nova-manage db archive_deleted_rows still fails to delete rows from some 
tables because of foreign key constraint issues, detailed here:

https://bugs.launchpad.net/nova/+bug/1183523/comments/12

I'm wondering why we don't reverse-sort the tables using the sqlalchemy 
metadata object before processing them for delete. That's the same thing 
I did in the 267 migration, since we needed to process the tree starting 
with the leaves and then eventually work back to the instances table 
(since most roads lead to the instances table).
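To make the idea concrete, here's a minimal sketch of what I mean (not 
the actual nova-manage code, and the connection URL is just a 
placeholder): SQLAlchemy's MetaData.sorted_tables is ordered parents-first 
by foreign key dependency, so walking it in reverse visits the leaf tables 
before the instances table ever comes up.

    from sqlalchemy import MetaData, create_engine

    # Placeholder connection URL; point it at the real nova database.
    engine = create_engine("mysql+pymysql://nova:secret@localhost/nova")
    meta = MetaData()
    meta.reflect(bind=engine)

    # sorted_tables is parents-first based on FK dependencies, so reversing
    # it gives leaves-first, the order we want when archiving/deleting.
    for table in reversed(meta.sorted_tables):
        if table.name.startswith("shadow_"):
            continue
        print("would archive deleted rows from %s" % table.name)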

Another thing that's really weird is how max_rows is used in this code. 
The value is tracked cumulatively across tables, so if the value you pass 
in is too small, you might not actually be removing anything from most of 
the tables.

I figured max_rows meant up to max_rows from each table, not max_rows 
*total* across all tables. By my count, there are 52 tables in the nova 
db model. The way I read the code, if I pass in max_rows=10 and it 
processes table A and archives 7 rows, then when it processes table B it 
will pass max_rows=(max_rows - rows_archived), which would be 3 for 
table B. If we archive 3 rows from table B, rows_archived >= max_rows 
and we quit. So to really make this work, you have to pass in something 
big for max_rows, like 1000, which seems completely arbitrary.
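Here's a toy illustration of that accounting (not the real code; the 
helper and the table counts are made up for the example):

    # pending is a list of (table name, soft-deleted row count) pairs.
    def archive_all(pending, max_rows):
        rows_archived = 0
        for table, count in pending:
            limit = max_rows - rows_archived   # remaining *total* budget
            if limit <= 0:
                break
            archived = min(count, limit)       # archive up to what's left
            print("archived %d rows from %s" % (archived, table))
            rows_archived += archived
        return rows_archived

    # With max_rows=10, table A uses 7 of the budget, table B gets the
    # remaining 3, and every table after that gets nothing.
    archive_all([("A", 7), ("B", 5), ("C", 4)], max_rows=10)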

Does this seem odd to anyone else? Given the relationships between 
tables, I'd think you'd want to try to archive up to max_rows from each 
table, so archive 10 instances, 10 block_device_mapping, 10 pci_devices, 
etc.
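For contrast, here's what I'd expect the semantics to be, using the same 
toy setup as above (again, not the real code):

    # Per-table semantics: the budget resets for each table.
    def archive_all_per_table(pending, max_rows):
        total = 0
        for table, count in pending:
            archived = min(count, max_rows)    # up to max_rows per table
            print("archived %d rows from %s" % (archived, table))
            total += archived
        return total

    archive_all_per_table([("instances", 12), ("block_device_mapping", 12),
                           ("pci_devices", 12)], max_rows=10)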

I'm also bringing this up now because there is a thread in the operators 
list which pointed me to a set of scripts that operators at GoDaddy are 
using for archiving deleted rows:

http://lists.openstack.org/pipermail/openstack-operators/2015-October/008392.html

Presumably those exist because the command in nova doesn't work. We 
should either make this thing work or just punt and delete it, because 
no one cares.

-- 

Thanks,

Matt Riedemann



