[Openstack] get_diagnostics runs on shutdown instances, and raises exception.
Jay Pipes
jaypipes at gmail.com
Fri Jul 7 17:58:14 UTC 2017
On 07/07/2017 01:37 PM, Peter Doherty wrote:
> Thanks. I wrongly assumed it was being run automatically, so with
> that out of my mind, it didn't take too long to figure out what was
> triggering it: I'm running the Datadog agent, which is the source.
> It generated enough noise that within a week I ended up with a million
> rows in the nova.instance_faults table, and the memory footprint of
> nova-api got very large, all of which resulted in multi-minute
> responses to instance list queries.
Heh, yes, that performance issue when listing instances with a
large instance_faults table has come up before. We fixed that in Ocata,
though:
https://bugs.launchpad.net/nova/+bug/1632247
> I can open a bug report about the log messages. I think it may be in
> the nova/compute/manager.py code, which doesn't seem to know how to
> gracefully handle get_diagnostics being called on an instance that isn't
> running, and results in a lot of useless rows in the instance_faults table.
>
>     @wrap_instance_fault
>     def get_diagnostics(self, context, instance):
>         """Retrieve diagnostics for an instance on this host."""
>         current_power_state = self._get_power_state(context, instance)
>         if current_power_state == power_state.RUNNING:
>             LOG.info(_LI("Retrieving diagnostics"), context=context,
>                      instance=instance)
>             return self.driver.get_diagnostics(instance)
>         else:
>             raise exception.InstanceInvalidState(
>                 attr='power_state',
>                 instance_uuid=instance.uuid,
>                 state=instance.power_state,
>                 method='get_diagnostics')
Yep, that's the code that emits the exception. We should just be
returning an error to the user instead of raising an exception, and we
should not be adding a record to the instance_faults table (which is
what that @wrap_instance_fault decorator does when it sees an exception
raised like that).
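
To make the behavior concrete, here is a minimal, self-contained sketch
of the idea. It is not nova's actual code: the in-memory FAULTS list
stands in for the instance_faults table, instances are plain dicts, and
wrap_instance_fault_skipping, NoisyManager and QuietManager are
hypothetical names invented for illustration. It only shows how a
fault-recording decorator could let an expected "wrong state" error
reach the caller without writing a fault row.

```python
import functools

RUNNING = 1   # power state values as seen in the thread ("power_state 4")
SHUTDOWN = 4


class InstanceInvalidState(Exception):
    """Stand-in for nova.exception.InstanceInvalidState."""


# Hypothetical in-memory stand-in for the instance_faults table.
FAULTS = []


def wrap_instance_fault(function):
    """Simplified sketch: record a fault for ANY exception, then re-raise."""
    @functools.wraps(function)
    def decorated_function(self, context, *args, **kwargs):
        try:
            return function(self, context, *args, **kwargs)
        except Exception as exc:
            FAULTS.append((kwargs['instance']['uuid'], type(exc).__name__))
            raise
    return decorated_function


def wrap_instance_fault_skipping(*expected):
    """Hypothetical variant: exceptions listed in `expected` are re-raised
    without recording a fault; anything else is recorded as before."""
    def decorator(function):
        @functools.wraps(function)
        def decorated_function(self, context, *args, **kwargs):
            try:
                return function(self, context, *args, **kwargs)
            except expected:
                raise  # expected user error: no fault row
            except Exception as exc:
                FAULTS.append((kwargs['instance']['uuid'], type(exc).__name__))
                raise
        return decorated_function
    return decorator


def _diagnostics(instance):
    """Shared body: only running instances have diagnostics."""
    if instance['power_state'] != RUNNING:
        raise InstanceInvalidState(
            "Instance %s in power_state %s. Cannot get_diagnostics while "
            "the instance is in this state."
            % (instance['uuid'], instance['power_state']))
    return {'state': 'running'}  # placeholder payload


class NoisyManager(object):
    @wrap_instance_fault
    def get_diagnostics(self, context, instance):
        return _diagnostics(instance)


class QuietManager(object):
    @wrap_instance_fault_skipping(InstanceInvalidState)
    def get_diagnostics(self, context, instance):
        return _diagnostics(instance)
```

With this sketch, calling
NoisyManager().get_diagnostics(None, instance=...) on a shut-down
instance appends a row to FAULTS every time, while QuietManager raises
the same exception to the caller and leaves FAULTS untouched. The
instance has to be passed as a keyword argument here because the
decorators look it up in kwargs, as the traceback suggests nova's
decorator does.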
If you could create a bug on LP for that, I'd very much appreciate it.
All the best,
-jay
> Thanks Jay!
>
> -Peter
>
> On Fri, Jul 7, 2017 at 12:50 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>
> On 07/07/2017 12:30 PM, Peter Doherty wrote:
>
> Hi,
>
> If I'm interpreting this correctly, nova compute is calling
> get_diagnostics on all instances, including ones currently in a
> shutdown state. And then it throws an exception, and adds an
> entry into the instance_faults table in the database.
>
> nova-compute logs this message:
>
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     executor_callback))
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     executor_callback)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 89, in wrapped
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     payload)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 72, in wrapped
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 378, in decorated_function
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 366, in decorated_function
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4089, in get_diagnostics
> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     method='get_diagnostics')
>
> 2017-07-07 16:30:10.017 23077 ERROR oslo_messaging.rpc.dispatcher InstanceInvalidState: Instance 6ab60005-ccbf-4bc2-95ac-7daf31716754 in power_state 4. Cannot get_diagnostics while the instance is in this state.
>
> I don't think it should be trying to gather diags on shutdown
> instances, and if it did, it shouldn't just create a
> never-ending stream of errors.
> If anyone has any info on whether this might be a bug that is fixed
> in the latest release, or whether I can turn off this behavior, it
> would be appreciated.
>
>
> get_diagnostics() doesn't run automatically. Something is triggering
> a call to get_diagnostics() for each instance on the box (the
> internal compute manager only has a get_diagnostics(instance) call
> that takes one instance at a time). Not sure what is triggering that...
>
> I agree with you that ERRORs shouldn't be spewed into the
> nova-compute logs like the above, though. That should be fixed.
> Would you mind submitting a bug for that on Launchpad, Peter?
>
> Thank you!
> -jay
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>
>
> --
> Peter Doherty
> Systems Engineer, Systems Engineering
> Brightcove Inc.
> 290 Congress St., 4th Floor, Boston, MA 02210
>