[Openstack] get_diagnostics runs on shutdown instances, and raises exception.

Peter Doherty pdoherty at brightcove.com
Fri Jul 7 17:37:19 UTC 2017


Thanks.  I had wrongly assumed it was being run automatically; with that
ruled out, it didn't take long to figure out what was triggering it.  I'm
running the Datadog agent, which is the source.  In a week it generated
enough noise that I ended up with a million rows in the
nova.instance_faults table, and the memory footprint of nova-api grew very
large, all of which resulted in multi-minute responses to instance list
queries.
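
For anyone hitting the same thing: the agent drives this through the
os-diagnostics API one instance at a time, and every call against a
non-running instance leaves a fault row behind.  Here's a rough sketch of
a poll that filters first -- the credentials, the all-tenants listing, and
the status == 'ACTIVE' check are my assumptions about how such an agent
might be written, not what Datadog actually does:

# Sketch only: python-novaclient with keystoneauth1 session auth and
# placeholder credentials.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://keystone:5000/v3',
                                username='monitoring',
                                password='secret',
                                project_name='admin',
                                user_domain_name='Default',
                                project_domain_name='Default')
nova = client.Client('2', session=session.Session(auth=auth))

for server in nova.servers.list(search_opts={'all_tenants': True}):
    # Only poll running instances; anything else makes nova-compute raise
    # InstanceInvalidState and record a row in instance_faults.
    if server.status != 'ACTIVE':
        continue
    # diagnostics() returns a (response, dict) pair in the novaclient
    # versions I've looked at.
    resp, diags = nova.servers.diagnostics(server)
    print(server.id, diags)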

I can open a bug report about the log messages.  I think the issue is in
nova/compute/manager.py, which doesn't gracefully handle get_diagnostics
being called on an instance that isn't running; instead it raises, which
ends up creating a lot of useless rows in the instance_faults table.

    @wrap_instance_fault
    def get_diagnostics(self, context, instance):
        """Retrieve diagnostics for an instance on this host."""
        current_power_state = self._get_power_state(context, instance)
        if current_power_state == power_state.RUNNING:
            LOG.info(_LI("Retrieving diagnostics"), context=context,
                     instance=instance)
            return self.driver.get_diagnostics(instance)
        else:
            raise exception.InstanceInvalidState(
                attr='power_state',
                instance_uuid=instance.uuid,
                state=instance.power_state,
                method='get_diagnostics')
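
Just to illustrate the kind of change I mean (a sketch only, not a
proposed patch, and it assumes the same class and imports as the snippet
above): the non-running branch could log a warning and return an empty
result instead of raising, so @wrap_instance_fault has nothing to record:

    @wrap_instance_fault
    def get_diagnostics(self, context, instance):
        """Retrieve diagnostics for an instance on this host."""
        current_power_state = self._get_power_state(context, instance)
        if current_power_state != power_state.RUNNING:
            # Sketch: skip quietly instead of raising InstanceInvalidState,
            # so no row lands in instance_faults for a stopped instance.
            LOG.warning("Skipping diagnostics; instance is not running",
                        context=context, instance=instance)
            return {}
        LOG.info(_LI("Retrieving diagnostics"), context=context,
                 instance=instance)
        return self.driver.get_diagnostics(instance)

Whether the right fix is something like that, or having the API layer
reject the request before it ever reaches the compute node, is probably a
question for the bug discussion.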


Thanks Jay!

-Peter

On Fri, Jul 7, 2017 at 12:50 PM, Jay Pipes <jaypipes at gmail.com> wrote:

> On 07/07/2017 12:30 PM, Peter Doherty wrote:
>
>> Hi,
>>
>> If I'm interpreting this correctly, nova compute is calling
>> get_diagnostics on all instances, including ones currently in a shutdown
>> state.  And then it throws an exception, and adds an entry into the
>> instance_faults table in the database.
>>
>> nova-compute logs this message:
>>
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>> Traceback (most recent call last):
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py",
>> line 142, in _dispatch_and_reply
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  executor_callback))
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py",
>> line 186, in _dispatch
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  executor_callback)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py",
>> line 129, in _do_dispatch
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  result = func(ctxt, **new_args)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/nova/exception.py", line 89, in wrapped
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  payload)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in
>> __exit__
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  six.reraise(self.type_, self.value, self.tb)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/nova/exception.py", line 72, in wrapped
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  return f(self, context, *args, **kw)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 378, in
>> decorated_function
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  kwargs['instance'], e, sys.exc_info())
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in
>> __exit__
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  six.reraise(self.type_, self.value, self.tb)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 366, in
>> decorated_function
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  return function(self, context, *args, **kwargs)
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File
>> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4089,
>> in get_diagnostics
>> 2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher
>>  method='get_diagnostics')
>>
>> 2017-07-07 16:30:10.017 23077 ERROR oslo_messaging.rpc.dispatcher
>> InstanceInvalidState: Instance 6ab60005-ccbf-4bc2-95ac-7daf31716754 in
>> power_state 4. Cannot get_diagnostics while the instance is in this state.
>>
>> I don't think it should be trying to gather diags on shutdown instances,
>> and even if it did, it shouldn't just create a never-ending stream of
>> errors.  If anyone has any info on whether this might be a bug fixed in
>> the latest release, or whether I can turn off this behavior, it would be
>> appreciated.
>>
>
> get_diagnostics() doesn't run automatically. Something is triggering a
> call to get_diagnostics() for each instance on the box (the internal
> compute manager only has a get_diagnostics(instance) call that takes one
> instance at a time). Not sure what is triggering that...
>
> I agree with you that ERRORs shouldn't be spewed into the nova-compute
> logs like the above, though. That should be fixed. Would you mind
> submitting a bug for that on Launchpad, Peter?
>
> Thank you!
> -jay
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>



-- 
Peter Doherty
Systems Engineer, Systems Engineering
Brightcove Inc.
290 Congress St., 4th Floor, Boston, MA 02210