[openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

Mike Bayer mbayer at redhat.com
Wed Nov 26 02:34:24 UTC 2014


> On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL <arahal at iweb.com> wrote:
> 
> Hi,
> 
> Le 2014-11-24 17:20, Michael Still a écrit :
>> Heya,
>> 
>> This is a new database, so its our big chance to get this right. So,
>> ideas welcome...
>> 
>> Some initial proposals:
>> 
>>  - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>> 
>>  - we have shadow tables and we move delete rows to a shadow table.
>> 
>>  - something else super clever I haven't thought of.
> 
> Some random thoughts that came to mind ...
> 
> 1/ as far as I remember, you rarely want to delete a row
> - it's usually a heavy DB operation (well, was back then)
> - it's destructive (but we may want that)
> - it creates fragmentation (less of a problem depending on db engine)
> - it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity.  Those items are all related and being able to delete everything related is a well-solved problem, both via ON DELETE cascades as well as standard ORM features.


> 
> 2/ updating a row to 'deleted=1'
> - gives an opportunity to set a useful deletion time-stamp
> I would even argue that setting the deleted_at field would suffice to declare a row 'deleted' (as in 'not NULL'). I know, "explicit is better than implicit" …

the logic that’s used is that “deleted” is set to the primary key of the record, this is to allow UNIQUE constraints to be set up that serve on the non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there are multiple “deleted” rows with “x”).

> - the update operation is not destructive
> - an admin/DBA can decide when and how to purge/archive rows
> 
> 3/ moving the row at deletion
> - you want to avoid additional steps to complete an operation, thus avoid creating a new record while deleting one
> - even if you wrap things into a transaction, not being able to create a row somewhere can make your delete transaction fail
> - if I were to archive all deleted rows, at scale I'd probably move them to another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a textual log file, then you archive the files.  There’s no need for a pure “audit trail” to even be in the relational DB.


> Now, I for one would keep the current mark-as-deleted model.
> 
> I however perfectly get the problem of massive churn with instance creation/deletion.

is there?   inserting and updating rows is a normal thing in relational DBs.


> So, let's be crazy, why not have a config option 'on_delete=mark_delete', 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that feasible ?)

I’m -1 on that.  The need for records to be soft-deleted or not, and if those soft-deletes need to be accessible in the application, should be decided up front.  Adding a multiplicity of options just makes the code that much more complicated and fragments its behaviors and test coverage.   The suggestion basically tries to avoid making a decision and I think more thought should be put into what is truly needed.


> This would especially come handy if the admin decides the global cells database may not need to keep track of deleted instances, the cell-local nova database being the absolute reference for that.

why would an admin decide that this is, or is not, needed?   if the deleted data isn’t needed by the live app, it should just be dumped to an archive.  admins can set how often that archive should be purged, but IMHO the “pipeline” of these records should be straight; there shouldn’t be junctions and switches that cause there to be multiple data paths.   It leads to too much complexity.




More information about the OpenStack-dev mailing list