[openstack-dev] [nova] Question about fixing missing soft deleted rows

Clint Byrum clint at fewbar.com
Thu Sep 15 07:43:11 UTC 2016


Excerpts from Matt Riedemann's message of 2016-09-14 20:21:09 -0500:
> I'm looking for other input on a question I have in this change:
> 
> https://review.openstack.org/#/c/345191/4/nova/db/sqlalchemy/api.py
> 
> We've had a few patches like this where we don't (soft) delete entries 
> related to an instance when that instance record is (soft) deleted. 
> These then cause the archive command to fail because of the referential 
> constraint.
> 
> Then we go in and add a new entry in the instance_destroy method so we 
> start (soft) deleting *new* things, but we don't clean up anything old.
> 
> In the change above, this is working around the fact that we might have 
> lingering consoles entries for an instance that's being archived.
> 
> One suggestion I made was adding a database migration that soft deletes 
> any console entries where the related instance is deleted (deleted != 0). 
> Is that a bad idea? It's not a schema migration, it's a data cleanup so 
> that archive works. We could do the same thing with a nova-manage 
> command, but unlike the DB migrations, we wouldn't know whether someone 
> has actually run it.
> 
> Another idea is doing it in the nova-manage db online_data_migrations 
> command, which should be run on upgrade. If we landed something like that 
> in, say, Ocata, then we could remove the TODO in the archive code in Pike.
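
For concreteness, the one-shot cleanup you describe might look roughly
like the sketch below. The consoles.instance_uuid and instances.uuid
columns, the MySQL-style %s placeholders, and the deleted = id
soft-delete convention are assumptions on my part, not a reviewed
migration.

import datetime

def soft_delete_orphaned_consoles(conn):
    # Soft delete console rows whose instance is already soft deleted,
    # so archiving no longer trips the referential constraint.
    # conn is any DB-API connection to the nova database.
    cur = conn.cursor()
    cur.execute(
        "UPDATE consoles c "
        "JOIN instances i ON i.uuid = c.instance_uuid "
        "SET c.deleted = c.id, c.deleted_at = %s "
        "WHERE c.deleted = 0 AND i.deleted != 0",
        (datetime.datetime.utcnow(),))
    conn.commit()
    return cur.rowcount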

In a former life doing highly scalable MySQL, we ditched all of the FK
checks because they were just extra work on write. Instead we introduced
workers that would walk tables and apply rules. They'd do something like
this:

SELECT * FROM book WHERE id > ? ORDER BY id LIMIT 10

They'd then check those 10 records for any referential integrity issues.
If they found a true orphan like the ones you describe, they'd archive it
and move on. These workers would also sleep a bit between queries (usually
about half as long as the last query took) so they were never a constant
drain on the database. After sleeping, the worker would take the last id
from the batch and pass it back in as the new lower bound, so it basically
walks the table by id. If it ever gets fewer than 10 records, it goes back
to the minimum id and starts over.
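
A rough sketch of that kind of worker, adapted to the consoles case from
the review above. The table and column names, the MySQL-style %s
placeholders, and the soft-delete fix are assumptions on my part, not
nova code.

import time

def is_orphan(conn, instance_uuid):
    # Referential check: the console row is an orphan if its instance
    # is gone or already soft deleted in the instances table.
    cur = conn.cursor()
    cur.execute(
        "SELECT 1 FROM instances WHERE uuid = %s AND deleted = 0",
        (instance_uuid,))
    return cur.fetchone() is None

def soft_delete_console(conn, row_id):
    # The "fix" here is to soft delete the orphan (deleted = id, per
    # nova's convention) so the regular archive command can pick it up.
    cur = conn.cursor()
    cur.execute(
        "UPDATE consoles SET deleted = id, deleted_at = NOW() "
        "WHERE id = %s", (row_id,))
    conn.commit()

def walk_consoles(conn, batch=10):
    # Walk the consoles table by id in small batches, fixing orphans.
    cur = conn.cursor()
    last_id = 0
    while True:
        start = time.time()
        cur.execute(
            "SELECT id, instance_uuid FROM consoles "
            "WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, batch))
        rows = cur.fetchall()
        elapsed = time.time() - start

        for row_id, instance_uuid in rows:
            if is_orphan(conn, instance_uuid):
                soft_delete_console(conn, row_id)

        # Sleep about half as long as the last query took so the worker
        # is never a constant drain on the database.
        time.sleep(elapsed / 2.0)

        # Walk by id; a short batch means the end of the table, so wrap
        # back around to the minimum id.
        last_id = rows[-1][0] if len(rows) == batch else 0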

Doing this with tables of many millions of records was quite effective;
most rows archived this way had been created by manual manipulation of the
database or by legitimate bugs in the software.

The benefit of doing this is that you get to choose how much write and
read capacity you want to commit to consistency.

So, one thought: rather than one-shot DB migration scripts, a worker
could be written that just crawls the various project databases and
reports on, or fixes, known issues.
> 
> Other thoughts?
> 


