[openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

Joe Gordon joe.gordon0 at gmail.com
Tue Mar 11 18:38:00 UTC 2014


On Tue, Mar 11, 2014 at 10:24 AM, Mike Wilson <geekinutah at gmail.com> wrote:

> Undeleting things is an important use case in my opinion. We do this in
> our environment on a regular basis. In that light I'm not sure that it
> would be appropriate just to log the deletion and git rid of the row. I
> would like to see it go to an archival table where it is easily restored.
>
>
Although we want to *support* hard deletion, we still want to support the
current behavior as well (Soft deletion, where the operator, can prune
deleted rows periodically).


> -Mike
>
>
> On Mon, Mar 10, 2014 at 3:44 PM, Joshua Harlow <harlowja at yahoo-inc.com>wrote:
>
>>  Sounds like a good idea to me.
>>
>>  I've never understood why we treat the DB as a LOG (keeping deleted ==
>> 0 records around) when we should just use a LOG (or similar system) to
>> begin with instead.
>>
>>  Does anyone use the feature of switching deleted == 1 back to deleted =
>> 0? Has this worked out for u?
>>
>>  Seems like some of the feedback on
>> https://etherpad.openstack.org/p/operators-feedback-mar14 also suggests
>> that this has been a operational pain-point for folks (Tool to delete
>> things properly suggestions and such...).
>>
>>   From: Boris Pavlovic <bpavlovic at mirantis.com>
>> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>> Date: Monday, March 10, 2014 at 1:29 PM
>> To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>,
>> Victor Sergeyev <vsergeyev at mirantis.com>
>> Subject: [openstack-dev] [all][db][performance] Proposal: Get rid of
>> soft deletion (step by step)
>>
>>   Hi stackers,
>>
>>  (It's proposal for Juno.)
>>
>>  Intro:
>>
>>  Soft deletion means that records from DB are not actually deleted, they
>> are just marked as a "deleted". To mark record as a "deleted" we put in
>> special table's column "deleted" record's ID value.
>>
>>  Issue 1: Indexes & Queries
>> We have to add in every query "AND deleted == 0" to get non-deleted
>> records.
>> It produce performance issue, cause we should add it in any index one
>> "extra" column.
>> As well it produce extra complexity in db migrations and building
>> queries.
>>
>>  Issue 2: Unique constraints
>> Why we store ID in deleted and not True/False?
>>  The reason is that we would like to be able to create real DB unique
>> constraints and avoid race conditions on "insert" operation.
>>
>>  Sample: we Have table (id, name, password, deleted) we would like to
>> put in column "name" only unique value.
>>
>>  Approach without UC: if count(`select .... where name = name`) == 0:
>> insert(...)
>> (race cause we are able to add new record between )
>>
>>  Approach with UC: try: insert(...) except Duplicate: ...
>>
>>  So to add UC we have to add them on (name, deleted). (to be able to
>> make insert/delete/insert with same name)
>>
>>  As well it produce performance issues, because we have to use Complex
>> unique constraints on 2  or more columns. + extra code & complexity in db
>> migrations.
>>
>>  Issue 3: Garbage collector
>>
>>  It is really hard to make garbage collector that will have good
>> performance and be enough common to work in any case for any project.
>> Without garbage collector DevOps have to cleanup records by hand, (risk
>> to break something). If they don't cleanup DB they will get very soon
>> performance issue.
>>
>>  To put in a nutshell most important issues:
>> 1) Extra complexity to each select query & extra column in each index
>> 2) Extra column in each Unique Constraint (worse performance)
>> 3) 2 Extra column in each table: (deleted, deleted_at)
>> 4) Common garbage collector is required
>>
>>
>>  To resolve all these issues we should just remove soft deletion.
>>
>>  One of approaches that I see is in step by step removing "deleted"
>> column from every table with probably code refactoring.  Actually we have 3
>> different cases:
>>
>>  1) We don't use soft deleted records:
>> 1.1) Do .delete() instead of .soft_delete()
>> 1.2) Change query to avoid adding extra "deleted == 0" to each query
>> 1.3) Drop "deleted" and "deleted_at" columns
>>
>>  2) We use soft deleted records for internal stuff "e.g. periodic tasks"
>> 2.1) Refactor code somehow: E.g. store all required data by periodic task
>> in some special table that has: (id, type, json_data) columns
>> 2.2) On delete add record to this table
>> 2.3-5) similar to 1.1, 1.2, 13
>>
>>  3) We use soft deleted records in API
>> 3.1) Deprecated API call if it is possible
>> 3.2) Make proxy call to ceilometer from API
>> 3.3) On .delete() store info about records in (ceilometer, or somewhere
>> else)
>> 3.4-6) similar to 1.1, 1.2, 1.3
>>
>> This is not ready RoadMap, just base thoughts to start the constructive
>> discussion in the mailing list, so %stacker% your opinion is very
>> important!
>>
>>
>>  Best regards,
>> Boris Pavlovic
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140311/41367387/attachment.html>


More information about the OpenStack-dev mailing list