[nova] indices on shadow-able fields
Hi everyone. In continuing with the trend of interesting database issues, I've found the following interesting thing. It looks like we don't have indices for all of our tables that can be large on the deleted field. I think it would be beneficial *especially* for cases when the archiving code runs. The big ones that are huge users in terms of rows without index are `instance_system_metadata` and `instance_extra`. `instance_action_events` and `instance_actions` don't have any either. I believe without this, it results in doing a full table scan during the clean up which can become pretty resource intensive. I'll defer to the Nova team on the best practice about this (or if this is even a bug), but I just wanted to bring the attention to this as I've had mild success trying to tame a large database and leveraging the archive DB tooling. Thanks, Mohammed -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser@vexxhost.com W. http://vexxhost.com
On Mon, Apr 08, 2019 at 10:51:47PM -0400, Mohammed Naser wrote:
Hi everyone.
In continuing with the trend of interesting database issues, I've found the following interesting thing. It looks like we don't have indices for all of our tables that can be large on the deleted field. I think it would be beneficial *especially* for cases when the archiving code runs.
The big ones that are huge users in terms of rows without index are `instance_system_metadata` and `instance_extra`. `instance_action_events` and `instance_actions` don't have any either.
I believe without this, it results in doing a full table scan during the clean up which can become pretty resource intensive.
I'll defer to the Nova team on the best practice about this (or if this is even a bug), but I just wanted to bring the attention to this as I've had mild success trying to tame a large database and leveraging the archive DB tooling.
I don't think we query shadow tables or it's when we want to purge them, so basically that is going to be based on datetime. I may be wrong but adding an index for a datetime field may not be so interesting since the index is going to be quite large because of the hours, minutes, seconds. That needs to be verified the database may provides feature to index datetime using date only. If so I guess you raised a good point, if not, indexing such fields is not going to help and probably going to consum more resources. s.
Thanks, Mohammed
-- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser@vexxhost.com W. http://vexxhost.com
On 4/9/2019 10:34 AM, Sahid Orentino Ferdjaoui wrote:
I don't think we query shadow tables or it's when we want to purge them, so basically that is going to be based on datetime. I may be wrong but adding an index for a datetime field may not be so interesting since the index is going to be quite large because of the hours, minutes, seconds. That needs to be verified the database may provides feature to index datetime using date only. If so I guess you raised a good point, if not, indexing such fields is not going to help and probably going to consum more resources.
I don't think he's talking about querying the shadow tables but the queries made on (soft) deleted records in the normal tables to figure out what to populate / move into the shadow tables, e.g.: https://github.com/openstack/nova/blob/d42a007425d9adb691134137e1e0b7dda356d... Which says: - for all soft deleted instances - find me all of their soft deleted actions - and for those actions, soft delete all of the events for each action There was a question in IRC yesterday if the instance_action_events table had an index on action_id because of this query: https://github.com/openstack/nova/blob/d42a007425d9adb691134137e1e0b7dda356d... And Jay said there was because of the ForeignKeyConstraint on that column. There isn't an index on action_id/deleted though, but I'm not seeing where that is being used here. -- Thanks, Matt
On 04/08/2019 10:51 PM, Mohammed Naser wrote:
Hi everyone.
In continuing with the trend of interesting database issues, I've found the following interesting thing. It looks like we don't have indices for all of our tables that can be large on the deleted field. I think it would be beneficial *especially* for cases when the archiving code runs.
The big ones that are huge users in terms of rows without index are `instance_system_metadata` and `instance_extra`. `instance_action_events` and `instance_actions` don't have any either.
I believe without this, it results in doing a full table scan during the clean up which can become pretty resource intensive.
All of the queries that the archive tool performs look at the deleted column in various tables, yes. I suppose adding an index on the deleted column for the largest tables you listed above would provide some relief given the design of the existing archive tool. Once the archive operation is performed, however, all rows in the table will have a deleted column value of 0. This generally makes the deleted column a poor choice for placing an index, since the cardinality of the column will be very low. [0] However, in this case, the index would build up over time (between archive tool runs) and be useful during archiving, and less useful immediately after the archive tool runs. In short, feel free to create indexes on the deleted column for those tables and let us know whether they make a substantial difference in the performance of the archival tool. :) Best, -jay [0] https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
On Thu, Apr 11, 2019 at 9:27 AM Jay Pipes <jaypipes@gmail.com> wrote:
On 04/08/2019 10:51 PM, Mohammed Naser wrote:
Hi everyone.
In continuing with the trend of interesting database issues, I've found the following interesting thing. It looks like we don't have indices for all of our tables that can be large on the deleted field. I think it would be beneficial *especially* for cases when the archiving code runs.
The big ones that are huge users in terms of rows without index are `instance_system_metadata` and `instance_extra`. `instance_action_events` and `instance_actions` don't have any either.
I believe without this, it results in doing a full table scan during the clean up which can become pretty resource intensive.
All of the queries that the archive tool performs look at the deleted column in various tables, yes. I suppose adding an index on the deleted column for the largest tables you listed above would provide some relief given the design of the existing archive tool.
Once the archive operation is performed, however, all rows in the table will have a deleted column value of 0. This generally makes the deleted column a poor choice for placing an index, since the cardinality of the column will be very low. [0] However, in this case, the index would build up over time (between archive tool runs) and be useful during archiving, and less useful immediately after the archive tool runs.
In short, feel free to create indexes on the deleted column for those tables and let us know whether they make a substantial difference in the performance of the archival tool. :)
Hi Jay, Thanks for all the feedback, I appreciate your insight into SQL that's way beyond my "it's slow, index is the fix?!" scope :) I'll try to see if I can get some before/after numbers. Thanks, Mohammed
Best, -jay
[0] https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
-- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser@vexxhost.com W. http://vexxhost.com
participants (4)
-
Jay Pipes
-
Matt Riedemann
-
Mohammed Naser
-
Sahid Orentino Ferdjaoui