[openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

Joshua Harlow harlowja at yahoo-inc.com
Mon Mar 10 21:44:34 UTC 2014


Sounds like a good idea to me.

I've never understood why we treat the DB as a LOG (keeping deleted == 0 records around) when we should just use a LOG (or similar system) to begin with instead.

Does anyone use the feature of switching deleted == 1 back to deleted = 0? Has this worked out for u?

Seems like some of the feedback on https://etherpad.openstack.org/p/operators-feedback-mar14 also suggests that this has been a operational pain-point for folks (Tool to delete things properly suggestions and such…).

From: Boris Pavlovic <bpavlovic at mirantis.com<mailto:bpavlovic at mirantis.com>>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org<mailto:openstack-dev at lists.openstack.org>>
Date: Monday, March 10, 2014 at 1:29 PM
To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org<mailto:openstack-dev at lists.openstack.org>>, Victor Sergeyev <vsergeyev at mirantis.com<mailto:vsergeyev at mirantis.com>>
Subject: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

Hi stackers,

(It's proposal for Juno.)

Intro:

Soft deletion means that records from DB are not actually deleted, they are just marked as a "deleted". To mark record as a "deleted" we put in special table's column "deleted" record's ID value.

Issue 1: Indexes & Queries
We have to add in every query "AND deleted == 0" to get non-deleted records.
It produce performance issue, cause we should add it in any index one "extra" column.
As well it produce extra complexity in db migrations and building queries.

Issue 2: Unique constraints
Why we store ID in deleted and not True/False?
The reason is that we would like to be able to create real DB unique constraints and avoid race conditions on "insert" operation.

Sample: we Have table (id, name, password, deleted) we would like to put in column "name" only unique value.

Approach without UC: if count(`select .... where name = name`) == 0: insert(...)
(race cause we are able to add new record between )

Approach with UC: try: insert(...) except Duplicate: ...

So to add UC we have to add them on (name, deleted). (to be able to make insert/delete/insert with same name)

As well it produce performance issues, because we have to use Complex unique constraints on 2  or more columns. + extra code & complexity in db migrations.

Issue 3: Garbage collector

It is really hard to make garbage collector that will have good performance and be enough common to work in any case for any project.
Without garbage collector DevOps have to cleanup records by hand, (risk to break something). If they don't cleanup DB they will get very soon performance issue.

To put in a nutshell most important issues:
1) Extra complexity to each select query & extra column in each index
2) Extra column in each Unique Constraint (worse performance)
3) 2 Extra column in each table: (deleted, deleted_at)
4) Common garbage collector is required


To resolve all these issues we should just remove soft deletion.

One of approaches that I see is in step by step removing "deleted" column from every table with probably code refactoring.  Actually we have 3 different cases:

1) We don't use soft deleted records:
1.1) Do .delete() instead of .soft_delete()
1.2) Change query to avoid adding extra "deleted == 0" to each query
1.3) Drop "deleted" and "deleted_at" columns

2) We use soft deleted records for internal stuff "e.g. periodic tasks"
2.1) Refactor code somehow: E.g. store all required data by periodic task in some special table that has: (id, type, json_data) columns
2.2) On delete add record to this table
2.3-5) similar to 1.1, 1.2, 13

3) We use soft deleted records in API
3.1) Deprecated API call if it is possible
3.2) Make proxy call to ceilometer from API
3.3) On .delete() store info about records in (ceilometer, or somewhere else)
3.4-6) similar to 1.1, 1.2, 1.3

This is not ready RoadMap, just base thoughts to start the constructive discussion in the mailing list, so %stacker% your opinion is very important!


Best regards,
Boris Pavlovic
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140310/e33535e9/attachment.html>


More information about the OpenStack-dev mailing list