[openstack-dev] [ceilometer] overuse of 'except Exception'

Doug Hellmann doug at doughellmann.com
Thu Jul 24 22:41:19 UTC 2014


On Jul 24, 2014, at 9:35 AM, Chris Dent <chdent at redhat.com> wrote:

> On Thu, 24 Jul 2014, Doug Hellmann wrote:
> 
>> I don’t claim any special status except that I was there and am
>> trying to provide background on why things are as they are. :-)
> 
> I think that counts and I very much appreciate the responses.
> 
>> Having a hard-fail error handler is useful in situations where
>> continuing operation would make the problem worse *and* the deployer
>> can fix the problem. Being unable to connect to a database might be an
>> example of this. However, we don’t want the ceilometer collector to
>> shutdown if it receives a notification it doesn’t understand because
>> discarding a malformed message isn’t going to make other ceilometer
>> operations any worse, and seeing such a message isn’t a situation
>> the deployer can fix. Catching KeyError, IndexError, AttributeError,
>> etc. for those cases would be useful if we were going to treat those
>> exception types differently somehow, but we don’t.
> 
> I guess what I'm asking is "shouldn't we treat them differently?” If

Possibly? :-)

> I've got a dict coming in over the bus and it is missing a key, big
> deal, the bus is still working. I _do_ want to know about it but it
> isn't a disaster so I can (and should) catch KeyError and log a
> short message (without traceback) that is specially encoded to say
> "how about that the notification payload was messed up".
> 
> Maybe such a thing is elsewhere in the stack, if so, great. In that
> case the thing code I pointed out as a bit of a compromise is in
> place, just not in the same place.

I believe any failures of the notifier/rpc stack are logged in that code, separately from the callback being invoked when a message is actually received. I know that the messaging drivers have retry logic in them to reconnect if the bus goes down, etc., and that part of the machinery is all separate from the ceilometer application logic.

> 
> What I don't want is a thing logged as _exceptional_ when it isn't.
> 
>> That said, it’s possible we could tighten up some of the error
>> handling outside of event processing loops, so as I said, if you have
>> a more specific proposal for places you think we can predict the
>> exception type reliably, we should talk about those directly instead
>> of the general case. You mention distinguishing between “the noise
>> and the nasty” — do you have some logs you can share?
> 
> Only vaguely at this point, based on my experiences in the past few days
> trying to chase down failures in the gate. There's just so much logged,
> a lot of which doesn't help, but at the same time a fair bit which looks
> like it ought to be a traceback and handled more aggressively. That
> experience drove me into the Ceilometer code in an effort to check the
> hygiene there and see if there was something I could do in that small
> environment (rather than the overwhelming context of the The Entire
> Project™).
> 
> I'll pay a bit closer attention to the specific relationship
> between the ceilometer exceptions (on the loops) and the logs and
> when I find something that particularly annoys me, I'll submit a
> patch for review and we'll see how it goes.

OK, that sounds like a good way to approach it. I’ll try to keep an eye out for those sorts of patches, though I’m mostly focusing on Oslo this cycle so some of the other ceilometer cores might get to them before I do.

Doug

> 
> -- 
> Chris Dent tw:@anticdent freenode:cdent
> https://tank.peermore.com/tanks/cdent_______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




More information about the OpenStack-dev mailing list