[nova][telemetry] does Telemetry still use the Nova server usage audit log API?
Howdy all,
TL;DR: I have a question, does the Telemetry service (or any other service) still make use of the server usage audit log API in Nova [1]?
Recently I was investigating customer issues where the nova.task_log database table grows infinitely and is never cleaned up [2]. I asked about it today in #openstack-nova [3] and Matt Riedemann explained that the API is toggled via config option [4] and that the Telemetry service is/was the consumer of the API.
I found through code inspection that there are no methods for deleting nova.task_log records and am trying to determine what is the best way forward for handling cleanup. Matt mentioned the possibility of deprecating the server usage audit log API altogether, which we might be able to do if no one is using it anymore.
So, I was thinking:
* If Telemetry is no longer using the server usage audit log API, we deprecate it in Nova and notify deployment tools to stop setting [DEFAULT]/instance_usage_audit = true to prevent further creation of nova.task_log records and recommend manual cleanup by users
or
* If Telemetry is still using the server usage audit log API, we create a new 'nova-manage db purge_task_log --before <date>' (or similar) command that will hard delete nova.task_log records before a specified date or all if --before is not specified
Can anyone shed any light on whether Telemetry, or any other service, still uses the server usage audit log API in Nova? Would we be able to deprecate it? If we can't, what do you think of the nova-manage command idea?
I would appreciate hearing your thoughts about it.
Cheers,
-melanie
[1] https://docs.openstack.org/api-ref/compute/#server-usage-audit-log-os-instan...
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1726256
[3] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
[4] https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.ins...
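For anyone wanting to check whether a given deployment has the periodic task switched on, here is a minimal sketch that reads [DEFAULT]/instance_usage_audit from the text of a nova.conf. It is only an approximation: it uses Python's standard configparser rather than oslo.config, and the helper name instance_usage_audit_enabled is hypothetical, not something Nova provides.

```python
import configparser


def instance_usage_audit_enabled(conf_text: str) -> bool:
    """Return True if [DEFAULT]/instance_usage_audit is set to a true value.

    Assumption: conf_text is INI-style text like nova.conf. This uses the
    standard library parser, not oslo.config, so it is an approximation.
    """
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate repeated options, which can occur in nova.conf.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Fall back to False when the option is absent, matching the thread's
    # statement that the option is disabled by default.
    return parser.getboolean("DEFAULT", "instance_usage_audit", fallback=False)


if __name__ == "__main__":
    sample = "[DEFAULT]\ninstance_usage_audit = true\n"
    print(instance_usage_audit_enabled(sample))
```

A deployment that reports True here is one where nova.task_log records would keep being created.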
On 9/6/2019 6:59 PM, melanie witt wrote:
* If Telemetry is no longer using the server usage audit log API, we deprecate it in Nova and notify deployment tools to stop setting [DEFAULT]/instance_usage_audit = true to prevent further creation of nova.task_log records and recommend manual cleanup by users
Deprecating the API would just be a signal to not develop new tools based on it since it's effectively unmaintained but that doesn't mean we can remove it since there could be non-Telemetry tools in the wild using it that we'd never hear about. You might not be suggesting an eventual path to removal of the API, I'm just bringing that part up since I'm sure people are thinking it.
I'm also assuming that API isn't multi-cell aware, meaning it won't traverse cells pulling records like listing servers or migration resources.
As for the config option to run the periodic task that creates these records, that's disabled by default so deployment tools shouldn't be enabling it by default - but maybe some do if they are configured to deploy ceilometer.
or
* If Telemetry is still using the server usage audit log API, we create a new 'nova-manage db purge_task_log --before <date>' (or similar) command that will hard delete nova.task_log records before a specified date or all if --before is not specified
If you can't remove the API then this is probably something that needs to happen regardless, though we likely won't know if anyone uses it. I'd consider it pretty low priority given how extremely latent this is and would expect anyone that's been running with this enabled in production has developed DB purge scripts for this table long ago.
--
Thanks,
Matt
On 9/7/19 3:09 PM, Matt Riedemann wrote:
On 9/6/2019 6:59 PM, melanie witt wrote:
* If Telemetry is no longer using the server usage audit log API, we deprecate it in Nova and notify deployment tools to stop setting [DEFAULT]/instance_usage_audit = true to prevent further creation of nova.task_log records and recommend manual cleanup by users
Deprecating the API would just be a signal to not develop new tools based on it since it's effectively unmaintained but that doesn't mean we can remove it since there could be non-Telemetry tools in the wild using it that we'd never hear about. You might not be suggesting an eventual path to removal of the API, I'm just bringing that part up since I'm sure people are thinking it.
Tools like cASO (https://github.com/IFCA/caso) use this API. This is used by many of the EGI Federated Cloud sites to do accounting per VM (https://egi-federated-cloud-integration.readthedocs.io/en/latest/openstack.h...)
I'm also assuming that API isn't multi-cell aware, meaning it won't traverse cells pulling records like listing servers or migration resources.
Given scaling issues with the current Telemetry implementation, I suspect alternative approaches have had to be developed in any case. CERN uses libvirt data extraction.
As for the config option to run the periodic task that creates these records, that's disabled by default so deployment tools shouldn't be enabling it by default - but maybe some do if they are configured to deploy ceilometer.
or
* If Telemetry is still using the server usage audit log API, we create a new 'nova-manage db purge_task_log --before <date>' (or similar) command that will hard delete nova.task_log records before a specified date or all if --before is not specified
If you can't remove the API then this is probably something that needs to happen regardless, though we likely won't know if anyone uses it. I'd consider it pretty low priority given how extremely latent this is and would expect anyone that's been running with this enabled in production has developed DB purge scripts for this table long ago.
I don't think ceilometer uses the compute.instance.exists event by default anywhere, or at least I cannot find a reference to it.
What I do know, however, is that we have a billing system that polls the os-simple-tenant-usage API, so if that is unaffected by the possible deprecation of instance_usage_audit then I don't think we use it.
Best regards
Tobias
On 9/7/19 5:20 PM, Tim Bell wrote:
On 9/7/19 3:09 PM, Matt Riedemann wrote:
On 9/6/2019 6:59 PM, melanie witt wrote:
* If Telemetry is no longer using the server usage audit log API, we deprecate it in Nova and notify deployment tools to stop setting [DEFAULT]/instance_usage_audit = true to prevent further creation of nova.task_log records and recommend manual cleanup by users
Deprecating the API would just be a signal to not develop new tools based on it since it's effectively unmaintained but that doesn't mean we can remove it since there could be non-Telemetry tools in the wild using it that we'd never hear about. You might not be suggesting an eventual path to removal of the API, I'm just bringing that part up since I'm sure people are thinking it.
Tools like cASO (https://github.com/IFCA/caso) use this API. This is used by many of the EGI Federated Cloud sites to do accounting per VM (https://egi-federated-cloud-integration.readthedocs.io/en/latest/openstack.h...)
I'm also assuming that API isn't multi-cell aware, meaning it won't traverse cells pulling records like listing servers or migration resources.
Given scaling issues with the current Telemetry implementation, I suspect alternative approaches have had to be developed in any case. CERN uses libvirt data extraction.
As for the config option to run the periodic task that creates these records, that's disabled by default so deployment tools shouldn't be enabling it by default - but maybe some do if they are configured to deploy ceilometer.
or
* If Telemetry is still using the server usage audit log API, we create a new 'nova-manage db purge_task_log --before <date>' (or similar) command that will hard delete nova.task_log records before a specified date or all if --before is not specified
If you can't remove the API then this is probably something that needs to happen regardless, though we likely won't know if anyone uses it. I'd consider it pretty low priority given how extremely latent this is and would expect anyone that's been running with this enabled in production has developed DB purge scripts for this table long ago.
On 9/9/2019 5:06 AM, Tobias Urdin wrote:
What I do know however is that we have a billing system that polls the os-simple-tenant-usage API so if that is unaffected by the possible deprecation of instance_usage_audit then I don't think we use it.
Different APIs [1][2] so it's not a problem.
[1] https://docs.openstack.org/api-ref/compute/#usage-reports-os-simple-tenant-u...
[2] https://docs.openstack.org/api-ref/compute/#server-usage-audit-log-os-instan...
--
Thanks,
Matt
On 9/7/19 6:09 AM, Matt Riedemann wrote:
On 9/6/2019 6:59 PM, melanie witt wrote:
* If Telemetry is no longer using the server usage audit log API, we deprecate it in Nova and notify deployment tools to stop setting [DEFAULT]/instance_usage_audit = true to prevent further creation of nova.task_log records and recommend manual cleanup by users
Deprecating the API would just be a signal to not develop new tools based on it since it's effectively unmaintained but that doesn't mean we can remove it since there could be non-Telemetry tools in the wild using it that we'd never hear about. You might not be suggesting an eventual path to removal of the API, I'm just bringing that part up since I'm sure people are thinking it.
I'm also assuming that API isn't multi-cell aware, meaning it won't traverse cells pulling records like listing servers or migration resources.
As for the config option to run the periodic task that creates these records, that's disabled by default so deployment tools shouldn't be enabling it by default - but maybe some do if they are configured to deploy ceilometer.
Indeed, tripleo enables the periodic task when deploying Telemetry, which is how we have customers hitting the unbounded nova.task_log table growth problem.
or
* If Telemetry is still using the server usage audit log API, we create a new 'nova-manage db purge_task_log --before <date>' (or similar) command that will hard delete nova.task_log records before a specified date or all if --before is not specified
If you can't remove the API then this is probably something that needs to happen regardless, though we likely won't know if anyone uses it. I'd consider it pretty low priority given how extremely latent this is and would expect anyone that's been running with this enabled in production has developed DB purge scripts for this table long ago.
Yeah, based on Tim Bell's reply later in this thread, we can't remove the API (tools in the wild are using it). So, I'll propose a new nova-manage command, because we don't yet appear to have a standard way of cleaning up nova.task_log records for customers either.
-melanie
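To make the proposed semantics concrete (hard delete everything before a given date, or everything if no date is given), here is a rough sketch against a stand-in SQLite table. This is not Nova's code: the function name purge_task_log, the simplified schema, and the choice of period_ending as the comparison column are all assumptions for illustration, and the real command, if it lands, may well differ.

```python
import sqlite3
from typing import Optional


def purge_task_log(conn: sqlite3.Connection, before: Optional[str] = None) -> int:
    """Hard delete task_log rows and return how many were removed.

    Assumptions (hypothetical, for illustration only):
    - the table is named task_log and has a period_ending text column
      holding 'YYYY-MM-DD HH:MM:SS' timestamps, which sort as strings;
    - 'before' is compared against period_ending.
    """
    if before is None:
        # No cutoff given: remove every record.
        cur = conn.execute("DELETE FROM task_log")
    else:
        # Cutoff given: remove only records that ended before it.
        cur = conn.execute(
            "DELETE FROM task_log WHERE period_ending < ?", (before,)
        )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_log ("
        "id INTEGER PRIMARY KEY, task_name TEXT, host TEXT, period_ending TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_log (task_name, host, period_ending) VALUES (?, ?, ?)",
        [
            ("instance_usage_audit", "compute1", "2019-07-01 00:00:00"),
            ("instance_usage_audit", "compute1", "2019-08-01 00:00:00"),
            ("instance_usage_audit", "compute1", "2019-09-01 00:00:00"),
        ],
    )
    print(purge_task_log(conn, before="2019-09-01 00:00:00"))
```

Returning the number of deleted rows gives an operator something to log or sanity-check before running the purge without a cutoff.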
participants (4)
- Matt Riedemann
- melanie witt
- Tim Bell
- Tobias Urdin