[Openstack] get_diagnostics runs on shutdown instances, and raises exception.

Jay Pipes jaypipes at gmail.com
Fri Jul 7 17:58:14 UTC 2017


On 07/07/2017 01:37 PM, Peter Doherty wrote:
> Thanks.  I wrongly assumed it was being run automatically; with that 
> out of the way, it didn't take long to figure out what was triggering 
> it.  I'm running the Datadog agent, which is the source.  It generated 
> enough noise that within a week I ended up with a million rows in the 
> nova.instance_faults table, and the memory footprint of nova-api got 
> very large, all of which resulted in multi-minute responses to instance 
> list queries.

Heh, yes, the performance issue when listing instances with a large 
instance_faults table has come up before. We fixed that in Ocata, 
though:

https://bugs.launchpad.net/nova/+bug/1632247
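
If you want to see just how much of that table is noise before you 
clean it up, something along these lines works -- only a rough sketch, 
and the connection URL is a placeholder you'd need to adjust for your 
deployment:

    # Rough sketch: count fault rows per instance directly against the
    # nova database to confirm the bloat described above.
    from sqlalchemy import create_engine, text

    # Placeholder URL -- point this at your real nova database.
    engine = create_engine("mysql+pymysql://nova:secret@dbhost/nova")

    with engine.connect() as conn:
        result = conn.execute(text(
            "SELECT instance_uuid, COUNT(*) AS faults "
            "FROM instance_faults "
            "GROUP BY instance_uuid ORDER BY faults DESC LIMIT 10"))
        for instance_uuid, faults in result:
            print(instance_uuid, faults)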

> I can open a bug report about the log messages.  I think the problem is 
> in the nova/compute/manager.py code, which doesn't gracefully handle 
> get_diagnostics being called on an instance that isn't running, and ends 
> up producing a lot of useless rows in the instance_faults table.
> 
>     @wrap_instance_fault
>     def get_diagnostics(self, context, instance):
>         """Retrieve diagnostics for an instance on this host."""
>         current_power_state = self._get_power_state(context, instance)
>         if current_power_state == power_state.RUNNING:
>             LOG.info(_LI("Retrieving diagnostics"), context=context,
>                      instance=instance)
>             return self.driver.get_diagnostics(instance)
>         else:
>             raise exception.InstanceInvalidState(
>                 attr='power_state',
>                 instance_uuid=instance.uuid,
>                 state=instance.power_state,
>                 method='get_diagnostics')

Yep, that's the code that raises the exception. We should just be 
returning an error to the user instead of raising, and we should not be 
adding a record to the instance_faults table (which is what that 
@wrap_instance_fault decorator does when it sees an exception raised 
like that).
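
Something along these lines (untested, just a sketch of the idea rather 
than a proposed patch) would keep that expected "instance not running" 
case out of the faults table while still surfacing the error to the 
caller -- i.e., teach the fault-wrapping decorator to ignore that 
particular exception, reusing the helpers the existing decorator in 
manager.py already calls:

    # Sketch only: a wrap_instance_fault variant that re-raises the
    # expected InstanceInvalidState error without recording a fault, so
    # the API layer can turn it into an error response for the user,
    # while unexpected exceptions are still recorded as before.
    import functools
    import sys

    from oslo_utils import excutils

    from nova import exception
    from nova.compute import utils as compute_utils


    def wrap_instance_fault(function):
        @functools.wraps(function)
        def decorated_function(self, context, *args, **kwargs):
            try:
                return function(self, context, *args, **kwargs)
            except exception.InstanceInvalidState:
                # Expected state error (e.g. get_diagnostics on a stopped
                # instance): re-raise without writing an instance_faults row.
                raise
            except Exception as e:
                # Assumes 'instance' arrives as a keyword argument, as it
                # does on the get_diagnostics RPC path shown above.
                with excutils.save_and_reraise_exception():
                    compute_utils.add_instance_fault_from_exc(
                        context, kwargs['instance'], e, sys.exc_info())
        return decorated_function

Alternatively, the decorator could simply be dropped from 
get_diagnostics, since failing a power-state check isn't really an 
instance fault.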

If you could create a bug on LP for that, I'd very much appreciate it.

All the best,
-jay

> Thanks Jay!
> 
> -Peter
> 
> On Fri, Jul 7, 2017 at 12:50 PM, Jay Pipes <jaypipes at gmail.com> wrote:
> 
>     On 07/07/2017 12:30 PM, Peter Doherty wrote:
> 
>         Hi,
> 
>         If I'm interpreting this correctly, nova-compute is calling
>         get_diagnostics on all instances, including ones currently in a
>         shutdown state.  It then throws an exception and adds an entry
>         to the instance_faults table in the database.
> 
>         nova-compute logs this message:
> 
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     executor_callback))
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     executor_callback)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 89, in wrapped
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     payload)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 72, in wrapped
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 378, in decorated_function
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 366, in decorated_function
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4089, in get_diagnostics
>         2017-07-07 16:29:46.184 23077 ERROR oslo_messaging.rpc.dispatcher     method='get_diagnostics')
> 
>         2017-07-07 16:30:10.017 23077 ERROR oslo_messaging.rpc.dispatcher InstanceInvalidState: Instance 6ab60005-ccbf-4bc2-95ac-7daf31716754 in power_state 4. Cannot get_diagnostics while the instance is in this state.
> 
>         I don't think it should be trying to gather diagnostics on
>         shut-down instances, and even if it did, it shouldn't create a
>         never-ending stream of errors.
>         If anyone knows whether this is a bug that has been fixed in a
>         later release, or whether I can turn off this behavior, I'd
>         appreciate it.
> 
> 
>     get_diagnostics() doesn't run automatically. Something is triggering
>     a call to get_diagnostics() for each instance on the box (the
>     internal compute manager only has a get_diagnostics(instance) call
>     that takes one instance at a time). Not sure what is triggering that...
> 
>     I agree with you that ERRORs shouldn't be spewed into the
>     nova-compute logs like the above, though. That should be fixed.
>     Would you mind submitting a bug for that on Launchpad, Peter?
> 
>     Thank you!
>     -jay
> 
>     _______________________________________________
>     Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>     Post to     : openstack at lists.openstack.org
>     Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> 
> 
> 
> 
> -- 
> Peter Doherty
> Systems Engineer, Systems Engineering
> Brightcove Inc.
> 290 Congress St., 4th Floor, Boston, MA 02210
> 
> 
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> 


