Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"
On 10/22/2018 11:25 AM, Sergio A. de Carvalho Jr. wrote:
> While troubleshooting a production issue we identified that the Nova metadata API is fetching a lot more raw data from the database than seems necessary. The problem appears to be caused by the SQL query used to fetch instance data that joins the "instance" table with, among others, two metadata tables: "instance_metadata" and "instance_system_metadata". Below is a simplified version of this query (I've added the full query at the end of this message for reference):
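The query itself is not reproduced in this part of the thread. As an illustration only (this is not the original query, and the toy schema below is an assumption that borrows just the table names), here is a minimal sketch of why joining one instance row against two independent one-to-many tables multiplies the rows returned. The function name and column names are hypothetical.

```python
import sqlite3


def joined_row_count(n_metadata: int, n_system_metadata: int) -> int:
    """Count rows returned when one instance is joined to both metadata tables.

    Toy schema for illustration; it is not Nova's real schema or query.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE instances (uuid TEXT PRIMARY KEY);
        CREATE TABLE instance_metadata (
            instance_uuid TEXT, meta_key TEXT, meta_value TEXT);
        CREATE TABLE instance_system_metadata (
            instance_uuid TEXT, meta_key TEXT, meta_value TEXT);
        INSERT INTO instances VALUES ('inst-1');
        """
    )
    conn.executemany(
        "INSERT INTO instance_metadata VALUES ('inst-1', ?, 'x')",
        [(f"meta{i}",) for i in range(n_metadata)],
    )
    conn.executemany(
        "INSERT INTO instance_system_metadata VALUES ('inst-1', ?, 'x')",
        [(f"sys{i}",) for i in range(n_system_metadata)],
    )
    # Each metadata row is paired with every system metadata row of the
    # same instance, so the result grows multiplicatively.
    rows = conn.execute(
        """
        SELECT i.uuid, m.meta_key, s.meta_key
        FROM instances AS i
        LEFT OUTER JOIN instance_metadata AS m
            ON m.instance_uuid = i.uuid
        LEFT OUTER JOIN instance_system_metadata AS s
            ON s.instance_uuid = i.uuid
        """
    ).fetchall()
    conn.close()
    return len(rows)
```

With 3 metadata rows and 4 system metadata rows, `joined_row_count(3, 4)` returns 12 rows for a single instance, although only 7 distinct key/value pairs exist.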
Coming back on this thread [1], I've got a partial fix up which I'm hoping will help:
https://review.openstack.org/#/c/624778/
That will avoid joining on some other tables depending on your configuration. It would be great if you could see if that helps resolve your issue. I think you just reverted https://review.openstack.org/#/c/276861/ as a workaround but it would be good to know if a more permanent fix (mine) gets you similar, or at least satisfactory, results.
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-October/thread.html#...
--
Thanks,
Matt
On 12/12/2018 1:18 PM, Matt Riedemann wrote:
> Coming back on this thread [1], I've got a partial fix up which I'm hoping will help:
> https://review.openstack.org/#/c/624778/
> That will avoid joining on some other tables depending on your configuration. It would be great if you could see if that helps resolve your issue. I think you just reverted https://review.openstack.org/#/c/276861/ as a workaround but it would be good to know if a more permanent fix (mine) gets you similar, or at least satisfactory, results.
> [1] http://lists.openstack.org/pipermail/openstack-dev/2018-October/thread.html#...
I have abandoned that change since it turns out that we need to join on the instance_system_metadata table to get the instance password which is retrieved from a base metadata request. Otherwise you can see the failures here [1].
So either we need to:
* Optimize the instance get DB query and joins we do. Dan was looking at this but it was non-trivial.
* Reconsider how we store the instance password so it's not in the instance_system_metadata table.
Or deployments can aggressively cache the metadata API responses (or scale out the number of metadata API workers) to try and deal with load.
[1] http://logs.openstack.org/78/624778/1/check/tempest-full/8d3c124/controller/...
--
Thanks,
Matt
On 12/17/2018 06:15 PM, Matt Riedemann wrote:
> On 12/12/2018 1:18 PM, Matt Riedemann wrote:
>> Coming back on this thread [1], I've got a partial fix up which I'm hoping will help:
>> https://review.openstack.org/#/c/624778/
>> That will avoid joining on some other tables depending on your configuration. It would be great if you could see if that helps resolve your issue. I think you just reverted https://review.openstack.org/#/c/276861/ as a workaround but it would be good to know if a more permanent fix (mine) gets you similar, or at least satisfactory, results.
>> [1] http://lists.openstack.org/pipermail/openstack-dev/2018-October/thread.html#...
> I have abandoned that change since it turns out that we need to join on the instance_system_metadata table to get the instance password which is retrieved from a base metadata request. Otherwise you can see the failures here [1].
> So either we need to:
> * Optimize the instance get DB query and joins we do. Dan was looking at this but it was non-trivial.
> * Reconsider how we store the instance password so it's not in the instance_system_metadata table.
> Or deployments can aggressively cache the metadata API responses (or scale out the number of metadata API workers) to try and deal with load.
> [1] http://logs.openstack.org/78/624778/1/check/tempest-full/8d3c124/controller/...
Well, technically, we *could* do all three of the above, right? :) It's not an either/or situation, AFAIU.
From looking at your patches, I think the long-term solution to this would be to stop storing the instance password in the instance_system_metadata table, but from looking into those code paths (eww...) I realize that would be a huge chunk of tech debt refactoring. Maybe something to line up for early Train?
Best,
-jay
On 12/18/2018 6:48 AM, Jay Pipes wrote:
> Well, technically, we *could* do all three of the above, right? :) It's not an either/or situation, AFAIU.
True.
> From looking at your patches, I think the long-term solution to this would be to stop storing the instance password in the instance_system_metadata table, but from looking into those code paths (eww...) I realize that would be a huge chunk of tech debt refactoring. Maybe something to line up for early Train?
I would think we'd do an online data migration on access of a new Instance.password field. When building new instances or setting the instance password via the API, we'd set that field rather than in instance.system_metadata. When accessing the Instance.password field, if it's not set, we'd lazy-load and pop it from system_metadata, set it on the Instance.password field and save() that change. That's the model we have used for migrating things like instance.flavor and instance.keypairs.
We'd still be stuck with my patch to the metadata API that conditionally joins system_metadata if vendordata_providers are configured, so anyone using those probably doesn't get the performance benefit anyway.
It's hard to assess the benefit of prioritizing work on any of this without more operators coming forward and saying, "yes this is definitely a specific pain for us running the metadata-api", especially since the pre-loading of metadata/system_metadata in the API happened back in Mitaka.
--
Thanks,
Matt
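For what it's worth, the migrate-on-access idea described above could look roughly like the sketch below. This is a toy model only: the `Instance` class is a stand-in and not Nova's actual object, the `"password"` key in `system_metadata` is a hypothetical placeholder for however the password is really stored, and `save()` just counts calls instead of writing to a database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical key; the real layout in system_metadata may differ.
LEGACY_KEY = "password"


@dataclass
class Instance:
    """Toy stand-in for an instance object, for illustration only."""

    system_metadata: Dict[str, str] = field(default_factory=dict)
    _password: Optional[str] = None
    save_count: int = 0  # counts save() calls in place of real DB writes

    def save(self) -> None:
        # A real object would persist its changed fields here.
        self.save_count += 1

    @property
    def password(self) -> Optional[str]:
        # Migrate on first access: move the legacy value into the
        # dedicated field and persist that change once.
        if self._password is None and LEGACY_KEY in self.system_metadata:
            self._password = self.system_metadata.pop(LEGACY_KEY)
            self.save()
        return self._password

    @password.setter
    def password(self, value: str) -> None:
        # New writes go straight to the dedicated field.
        self._password = value
```

Reading `password` on an old-style instance moves the value out of `system_metadata` and saves once; later reads, and instances built with the new field, never touch `system_metadata` for the password.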