<div dir="ltr">Hi stackers, <div><br></div><div>(It's proposal for Juno.)</div><div><br></div><div>Intro:</div><div><br></div><div>Soft deletion means that records from DB are not actually deleted, they are just marked as a "deleted". To mark record as a "deleted" we put in special table's column "deleted" record's ID value. </div>
<div><br></div><div>Issue 1: Indexes & Queries</div><div>We have to add in every query "AND deleted == 0" to get non-deleted records.</div><div>It produce performance issue, cause we should add it in any index one "extra" column. </div>
<div>As well it produce extra complexity in db migrations and building queries. </div><div><br></div><div>Issue 2: Unique constraints </div><div>Why we store ID in deleted and not True/False? <br></div><div>The reason is that we would like to be able to create real DB unique constraints and avoid race conditions on "insert" operation. </div>
<div><br></div><div>Sample: we Have table (id, name, password, deleted) we would like to put in column "name" only unique value. </div><div><br></div><div>Approach without UC: if count(`select .... where name = name`) == 0: insert(...)</div>
<div>(race cause we are able to add new record between )</div><div><br></div><div>Approach with UC: try: insert(...) except Duplicate: ...</div><div><br></div><div>So to add UC we have to add them on (name, deleted). (to be able to make insert/delete/insert with same name)</div>
<div><br></div><div>As well it produce performance issues, because we have to use Complex unique constraints on 2 or more columns. + extra code & complexity in db migrations. </div><div><br></div><div>Issue 3: Garbage collector</div>
<div><br></div><div>It is really hard to make garbage collector that will have good performance and be enough common to work in any case for any project. </div><div>Without garbage collector DevOps have to cleanup records by hand, (risk to break something). If they don't cleanup DB they will get very soon performance issue. </div>
<div><br></div><div>To put in a nutshell most important issues:</div><div>1) Extra complexity to each select query & extra column in each index</div><div>2) Extra column in each Unique Constraint (worse performance)</div>
<div>3) 2 Extra column in each table: (deleted, deleted_at)</div><div>4) Common garbage collector is required</div><div><br></div><div><br></div><div>To resolve all these issues we should just remove soft deletion.</div><div>
<br></div><div>One of approaches that I see is in step by step removing "deleted" column from every table with probably code refactoring. Actually we have 3 different cases:</div><div><br></div><div>1) We don't use soft deleted records: </div>
<div>1.1) Do .delete() instead of .soft_delete()</div><div>1.2) Change query to avoid adding extra "deleted == 0" to each query</div><div>1.3) Drop "deleted" and "deleted_at" columns </div><div>
<br></div><div>2) We use soft deleted records for internal stuff "e.g. periodic tasks"</div><div>2.1) Refactor code somehow: E.g. store all required data by periodic task in some special table that has: (id, type, json_data) columns</div>
<div>2.2) On delete add record to this table </div><div>2.3-5) similar to 1.1, 1.2, 13</div><div><br></div><div>3) We use soft deleted records in API </div><div>3.1) Deprecated API call if it is possible </div><div>3.2) Make proxy call to ceilometer from API </div>
<div>3.3) On .delete() store info about records in (ceilometer, or somewhere else) </div><div>3.4-6) similar to 1.1, 1.2, 1.3</div><div> </div><div>This is not ready RoadMap, just base thoughts to start the constructive discussion in the mailing list, so %stacker% your opinion is very important! </div>
<div><br></div><div><br></div><div>Best regards,</div>
<div>Boris Pavlovic </div></div>