[openstack-dev] [oslo] logging around olso lockutils
Christopher Yeoh
cbkyeoh at gmail.com
Fri Sep 26 04:03:02 UTC 2014
On Thu, 25 Sep 2014 08:49:12 -0400
Sean Dague <sean at dague.net> wrote:
>
> #1 - tried to get a lock, but someone else has it. Then we know we've
> got lock contention. .
> #2 - something is still holding a lock after some "long" amount of
> time.
+1 to both.
> #2 turned out to be a critical bit in understanding one of the worst
> recent gate impacting issues.
>
> You can write a tool today that analyzes the logs and shows you these
> things. However, I wonder if we could actually do something creative
> in the code itself to do this already. I'm curious if the creative
> use of Timers might let us emit log messages under the conditions
> above (someone with better understanding of python internals needs to
> speak up here). Maybe it's too much overhead, but I think it's worth
> at least asking the question.
Even a simple log at the end when its finally released if it has been
held for a long time would be handy. As matching up acquire/release by
eye is not easy.
I don't think we get a log message when an acquire is attempted but
fails. That might help get a measure of lock contention?
> The same issue exists when it comes to processutils I think, warning
> that a command is still running after 10s might be really handy,
> because it turns out that issue #2 was caused by this, and it took
> quite a bit of decoding to figure that out.
Also I think that perhaps a log message when a period task takes longer
than the interval that the task is meant to be run would be a useful
warning sign that something odd is going on.
Chris
More information about the OpenStack-dev
mailing list