[openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

Michael Still mikal at stillhq.com
Tue Nov 25 00:32:40 UTC 2014

On Tue, Nov 25, 2014 at 11:14 AM, Mike Bayer <mbayer at redhat.com> wrote:
>> On Nov 24, 2014, at 5:20 PM, Michael Still <mikal at stillhq.com> wrote:
>> Heya,
>> Review https://review.openstack.org/#/c/135644/4 proposes the addition
>> of a new database for our improved implementation of cells in Nova.
>> However, there's an outstanding question about how to handle soft
>> delete of rows -- we believe that we need to soft delete for forensic
>> purposes.
> Everytime I talk to people about the soft delete thing, I hear the usual refrain “we thought we needed it, but we didn’t and now it’s just overbuilt cruft we want to get rid of”.
> Not saying you don’t have a need here but you definitely have this need, not just following the herd right?   Soft delete makes things a lot less convenient.

I can't comment on other projects, but Nova definitely needs the soft
delete in the main nova database. Perhaps not for every table, but
there is definitely code in the code base which uses it right now.
Search for read_deleted=True if you're curious.

For this case in particular, the concern is that operators might need
to find where an instance was running once it is deleted to be able to
diagnose issues reported by users. I think that's a valid use case of
this particular data.

>> This is a new database, so its our big chance to get this right. So,
>> ideas welcome...
>> Some initial proposals:
>> - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>> - we have shadow tables and we move delete rows to a shadow table.
> Both approaches are viable, but as the soft-delete column is widespread, it would be thorny for this new app to use some totally different scheme, unless the notion is that all schemes should move to the audit table approach (which I wouldn’t mind, but it would be a big job).    FTR, the audit table approach is usually what I prefer for greenfield development, if all that’s needed is forensic capabilities at the database inspection level, and not as much active GUI-based “deleted” flags.   That is, if you really don’t need to query the history tables very often except when debugging an issue offline.  The reason its preferable is because those rows are still “deleted” from your main table, and they don’t get in the way of querying.   But if you need to refer to these history rows in context of the application, that means you need to get them mapped in such a way that they behave like the primary rows, which overall is a more difficult approach than just using the soft delete column.
> That said, I have a lot of plans to send improvements down the way of the existing approach of “soft delete column” into projects, from the querying POV, so that criteria to filter out soft delete can be done in a much more robust fashion (see https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).   But this is still more complex and less performant than if the rows are just gone totally, off in a history table somewhere (again, provided you really don’t need to look at those history rows in an application context, otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as
trumping doing this less horribly, but it sounds like its more of a
thing that I thought.


Rackspace Australia

More information about the OpenStack-dev mailing list