[openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

Roman Podoliaka rpodolyaka at mirantis.com
Mon Mar 10 22:04:06 UTC 2014


Hi all,

>>> I've never understood why we treat the DB as a LOG (keeping deleted == 0 records around) when we should just use a LOG (or similar system) to begin with instead.

I can't agree more with you! Storing deleted records in tables is
hardly usable, bad for performance (as it makes tables and indexes
larger) and it probably covers a very limited set of use cases (if
any) of OpenStack users.

>>> One of approaches that I see is in step by step removing "deleted" column from every table with probably code refactoring.

So we have a homework to do: find out what for projects use
soft-deletes. I assume that soft-deletes are only used internally and
aren't exposed to API users, but let's check that. At the same time
all new projects should avoid using of soft-deletes from the start.

Thanks,
Roman

On Mon, Mar 10, 2014 at 2:44 PM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> Sounds like a good idea to me.
>
> I've never understood why we treat the DB as a LOG (keeping deleted == 0
> records around) when we should just use a LOG (or similar system) to begin
> with instead.
>
> Does anyone use the feature of switching deleted == 1 back to deleted = 0?
> Has this worked out for u?
>
> Seems like some of the feedback on
> https://etherpad.openstack.org/p/operators-feedback-mar14 also suggests that
> this has been a operational pain-point for folks (Tool to delete things
> properly suggestions and such…).
>
> From: Boris Pavlovic <bpavlovic at mirantis.com>
> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
> Date: Monday, March 10, 2014 at 1:29 PM
> To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>,
> Victor Sergeyev <vsergeyev at mirantis.com>
> Subject: [openstack-dev] [all][db][performance] Proposal: Get rid of soft
> deletion (step by step)
>
> Hi stackers,
>
> (It's proposal for Juno.)
>
> Intro:
>
> Soft deletion means that records from DB are not actually deleted, they are
> just marked as a "deleted". To mark record as a "deleted" we put in special
> table's column "deleted" record's ID value.
>
> Issue 1: Indexes & Queries
> We have to add in every query "AND deleted == 0" to get non-deleted records.
> It produce performance issue, cause we should add it in any index one
> "extra" column.
> As well it produce extra complexity in db migrations and building queries.
>
> Issue 2: Unique constraints
> Why we store ID in deleted and not True/False?
> The reason is that we would like to be able to create real DB unique
> constraints and avoid race conditions on "insert" operation.
>
> Sample: we Have table (id, name, password, deleted) we would like to put in
> column "name" only unique value.
>
> Approach without UC: if count(`select .... where name = name`) == 0:
> insert(...)
> (race cause we are able to add new record between )
>
> Approach with UC: try: insert(...) except Duplicate: ...
>
> So to add UC we have to add them on (name, deleted). (to be able to make
> insert/delete/insert with same name)
>
> As well it produce performance issues, because we have to use Complex unique
> constraints on 2  or more columns. + extra code & complexity in db
> migrations.
>
> Issue 3: Garbage collector
>
> It is really hard to make garbage collector that will have good performance
> and be enough common to work in any case for any project.
> Without garbage collector DevOps have to cleanup records by hand, (risk to
> break something). If they don't cleanup DB they will get very soon
> performance issue.
>
> To put in a nutshell most important issues:
> 1) Extra complexity to each select query & extra column in each index
> 2) Extra column in each Unique Constraint (worse performance)
> 3) 2 Extra column in each table: (deleted, deleted_at)
> 4) Common garbage collector is required
>
>
> To resolve all these issues we should just remove soft deletion.
>
> One of approaches that I see is in step by step removing "deleted" column
> from every table with probably code refactoring.  Actually we have 3
> different cases:
>
> 1) We don't use soft deleted records:
> 1.1) Do .delete() instead of .soft_delete()
> 1.2) Change query to avoid adding extra "deleted == 0" to each query
> 1.3) Drop "deleted" and "deleted_at" columns
>
> 2) We use soft deleted records for internal stuff "e.g. periodic tasks"
> 2.1) Refactor code somehow: E.g. store all required data by periodic task in
> some special table that has: (id, type, json_data) columns
> 2.2) On delete add record to this table
> 2.3-5) similar to 1.1, 1.2, 13
>
> 3) We use soft deleted records in API
> 3.1) Deprecated API call if it is possible
> 3.2) Make proxy call to ceilometer from API
> 3.3) On .delete() store info about records in (ceilometer, or somewhere
> else)
> 3.4-6) similar to 1.1, 1.2, 1.3
>
> This is not ready RoadMap, just base thoughts to start the constructive
> discussion in the mailing list, so %stacker% your opinion is very important!
>
>
> Best regards,
> Boris Pavlovic
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



More information about the OpenStack-dev mailing list