[openstack-dev] [ceilometer] overuse of 'except Exception'

Chris Dent chdent at redhat.com
Thu Jul 24 13:35:27 UTC 2014


On Thu, 24 Jul 2014, Doug Hellmann wrote:

> I don’t claim any special status except that I was there and am
> trying to provide background on why things are as they are. :-)

I think that counts and I very much appreciate the responses.

> Having a hard-fail error handler is useful in situations where
> continuing operation would make the problem worse *and* the deployer
> can fix the problem. Being unable to connect to a database might be an
> example of this. However, we don’t want the ceilometer collector to
> shutdown if it receives a notification it doesn’t understand because
> discarding a malformed message isn’t going to make other ceilometer
> operations any worse, and seeing such a message isn’t a situation
> the deployer can fix. Catching KeyError, IndexError, AttributeError,
> etc. for those cases would be useful if we were going to treat those
> exception types differently somehow, but we don’t.

I guess what I'm asking is "shouldn't we treat them differently?" If
I've got a dict coming in over the bus and it is missing a key, big
deal, the bus is still working. I _do_ want to know about it but it
isn't a disaster so I can (and should) catch KeyError and log a
short message (without traceback) that is specially encoded to say
"how about that the notification payload was messed up".

Maybe such a thing is elsewhere in the stack, if so, great. In that
case the thing code I pointed out as a bit of a compromise is in
place, just not in the same place.

What I don't want is a thing logged as _exceptional_ when it isn't.

> That said, it’s possible we could tighten up some of the error
> handling outside of event processing loops, so as I said, if you have
> a more specific proposal for places you think we can predict the
> exception type reliably, we should talk about those directly instead
> of the general case. You mention distinguishing between “the noise
> and the nasty” — do you have some logs you can share?

Only vaguely at this point, based on my experiences in the past few days
trying to chase down failures in the gate. There's just so much logged,
a lot of which doesn't help, but at the same time a fair bit which looks
like it ought to be a traceback and handled more aggressively. That
experience drove me into the Ceilometer code in an effort to check the
hygiene there and see if there was something I could do in that small
environment (rather than the overwhelming context of the The Entire
Project™).

I'll pay a bit closer attention to the specific relationship
between the ceilometer exceptions (on the loops) and the logs and
when I find something that particularly annoys me, I'll submit a
patch for review and we'll see how it goes.

-- 
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent


More information about the OpenStack-dev mailing list