[openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

Boris Pavlovic bpavlovic at mirantis.com
Mon Mar 10 21:29:59 UTC 2014


Hi stackers,

(It's proposal for Juno.)

Intro:

Soft deletion means that records from DB are not actually deleted, they are
just marked as a "deleted". To mark record as a "deleted" we put in special
table's column "deleted" record's ID value.

Issue 1: Indexes & Queries
We have to add in every query "AND deleted == 0" to get non-deleted records.
It produce performance issue, cause we should add it in any index one
"extra" column.
As well it produce extra complexity in db migrations and building queries.

Issue 2: Unique constraints
Why we store ID in deleted and not True/False?
The reason is that we would like to be able to create real DB unique
constraints and avoid race conditions on "insert" operation.

Sample: we Have table (id, name, password, deleted) we would like to put in
column "name" only unique value.

Approach without UC: if count(`select .... where name = name`) == 0:
insert(...)
(race cause we are able to add new record between )

Approach with UC: try: insert(...) except Duplicate: ...

So to add UC we have to add them on (name, deleted). (to be able to make
insert/delete/insert with same name)

As well it produce performance issues, because we have to use Complex
unique constraints on 2  or more columns. + extra code & complexity in db
migrations.

Issue 3: Garbage collector

It is really hard to make garbage collector that will have good performance
and be enough common to work in any case for any project.
Without garbage collector DevOps have to cleanup records by hand, (risk to
break something). If they don't cleanup DB they will get very soon
performance issue.

To put in a nutshell most important issues:
1) Extra complexity to each select query & extra column in each index
2) Extra column in each Unique Constraint (worse performance)
3) 2 Extra column in each table: (deleted, deleted_at)
4) Common garbage collector is required


To resolve all these issues we should just remove soft deletion.

One of approaches that I see is in step by step removing "deleted" column
from every table with probably code refactoring.  Actually we have 3
different cases:

1) We don't use soft deleted records:
1.1) Do .delete() instead of .soft_delete()
1.2) Change query to avoid adding extra "deleted == 0" to each query
1.3) Drop "deleted" and "deleted_at" columns

2) We use soft deleted records for internal stuff "e.g. periodic tasks"
2.1) Refactor code somehow: E.g. store all required data by periodic task
in some special table that has: (id, type, json_data) columns
2.2) On delete add record to this table
2.3-5) similar to 1.1, 1.2, 13

3) We use soft deleted records in API
3.1) Deprecated API call if it is possible
3.2) Make proxy call to ceilometer from API
3.3) On .delete() store info about records in (ceilometer, or somewhere
else)
3.4-6) similar to 1.1, 1.2, 1.3

This is not ready RoadMap, just base thoughts to start the constructive
discussion in the mailing list, so %stacker% your opinion is very
important!


Best regards,
Boris Pavlovic
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140311/5a47b896/attachment.html>


More information about the OpenStack-dev mailing list