On Thu, Apr 11, 2019 at 9:27 AM Jay Pipes <jaypipes@gmail.com> wrote:
On 04/08/2019 10:51 PM, Mohammed Naser wrote:
Hi everyone.
In continuing with the trend of interesting database issues, I've found the following interesting thing. It looks like we don't have indices for all of our tables that can be large on the deleted field. I think it would be beneficial *especially* for cases when the archiving code runs.
The big ones that are huge users in terms of rows without index are `instance_system_metadata` and `instance_extra`. `instance_action_events` and `instance_actions` don't have any either.
I believe without this, it results in doing a full table scan during the clean up which can become pretty resource intensive.
All of the queries that the archive tool performs look at the deleted column in various tables, yes. I suppose adding an index on the deleted column for the largest tables you listed above would provide some relief given the design of the existing archive tool.
Once the archive operation is performed, however, all rows in the table will have a deleted column value of 0. This generally makes the deleted column a poor choice for placing an index, since the cardinality of the column will be very low. [0] However, in this case, the index would build up over time (between archive tool runs) and be useful during archiving, and less useful immediately after the archive tool runs.
In short, feel free to create indexes on the deleted column for those tables and let us know whether they make a substantial difference in the performance of the archival tool. :)
Hi Jay, Thanks for all the feedback, I appreciate your insight into SQL that's way beyond my "it's slow, index is the fix?!" scope :) I'll try to see if I can get some before/after numbers. Thanks, Mohammed
Best, -jay
[0] https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
-- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser@vexxhost.com W. http://vexxhost.com