[openstack-dev] [oslo] logging around oslo lockutils

Jay Pipes jaypipes at gmail.com
Thu Sep 25 17:57:06 UTC 2014


+1 for making those two changes. I have also been frustrated while 
debugging in the gate recently, and any operational-ease-of-debugging 
improvements like this would be appreciated.

-jay

On 09/25/2014 08:49 AM, Sean Dague wrote:
> I've been spending a ton of time reading logs, and oslo locking ends up
> creating a ton of output at DEBUG that you have to mentally filter to
> find problems:
>
> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Created new semaphore "iptables" internal_lock
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:206
> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Acquired semaphore "iptables" lock
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:229
> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Attempting to grab external lock "iptables" external_lock
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:178
> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Got file lock "/opt/stack/data/nova/nova-iptables" acquire
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:93
> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Got semaphore / lock "_do_refresh_provider_fw_rules" inner
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:271
> 2014-09-24 18:44:49.244 DEBUG nova.compute.manager
> [req-b91cb1c1-f211-43ef-9714-651eeb3b2302
> DeleteServersAdminTestXML-1408641898
> DeleteServersAdminTestXML-469708524] [instance:
> 98eb8e6e-088b-4dda-ada5-7b2b79f62506] terminating bdm
> BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2014-09-24T18:44:42Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/vda',device_type='disk',disk_bus=None,guest_format=None,id=43,image_id='262ab8a2-0790-49b3-a8d3-e8ed73e3ed71',instance=<?>,instance_uuid=98eb8e6e-088b-4dda-ada5-7b2b79f62506,no_device=False,snapshot_id=None,source_type='image',updated_at=2014-09-24T18:44:42Z,volume_id=None,volume_size=None)
> _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2407
> 2014-09-24 18:44:49.248 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Released file lock "/opt/stack/data/nova/nova-iptables" release
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:115
> 2014-09-24 18:44:49.248 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Releasing semaphore "iptables" lock
> /opt/stack/new/nova/nova/openstack/common/lockutils.py:238
> 2014-09-24 18:44:49.249 DEBUG nova.openstack.common.lockutils
> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
> Semaphore / lock released "_do_refresh_provider_fw_rules" inner
>
> Also readable here:
> http://logs.openstack.org/01/123801/1/check/check-tempest-dsvm-full/b5f8b37/logs/screen-n-cpu.txt.gz#_2014-09-24_18_44_49_240
>
> (Yes, it's kind of ugly)
>
> What occurred to me is that in debugging locking issues, what we actually
> care about are two things semantically:
>
> #1 - tried to get a lock, but someone else has it. Then we know we've
> got lock contention.
> #2 - something is still holding a lock after some "long" amount of time.
>
> #2 turned out to be a critical bit in understanding one of the worst
> recent gate impacting issues.
>
> You can write a tool today that analyzes the logs and shows you these
> things. However, I wonder if we could actually do something creative in
> the code itself so it does this for us. I'm curious whether creative use of
> Timers might let us emit log messages under the conditions above
> (someone with a better understanding of Python internals needs to speak up
> here). Maybe it's too much overhead, but I think it's worth at least
> asking the question.
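>
> As a rough illustration (monitored_lock and the threshold below are made
> up, not the real lockutils API), a threading.Timer could flag both
> conditions:
>
>     import contextlib
>     import logging
>     import threading
>
>     LOG = logging.getLogger(__name__)
>     _locks = {}
>
>     @contextlib.contextmanager
>     def monitored_lock(name, warn_after=10.0):
>         """Illustrative sketch only, not the oslo lockutils implementation."""
>         lock = _locks.setdefault(name, threading.Lock())
>         if not lock.acquire(blocking=False):
>             # Condition #1: someone else holds the lock -- log the contention.
>             LOG.debug('Lock "%s" is held elsewhere, waiting', name)
>             lock.acquire()
>         # Condition #2: warn if we are still holding the lock after warn_after.
>         timer = threading.Timer(
>             warn_after,
>             lambda: LOG.warning('Lock "%s" still held after %.0fs',
>                                 name, warn_after))
>         timer.daemon = True
>         timer.start()
>         try:
>             yield
>         finally:
>             timer.cancel()
>             lock.release()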
>
> The same issue exists with processutils, I think. A warning
> that a command is still running after 10s might be really handy, because
> it turns out that issue #2 was caused by exactly this, and it took quite
> a bit of decoding to figure that out.
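>
> The same sort of sketch for processutils (execute_with_warning is made up,
> not the real processutils API):
>
>     import logging
>     import subprocess
>     import threading
>
>     LOG = logging.getLogger(__name__)
>
>     def execute_with_warning(cmd, warn_after=10.0):
>         """Illustrative sketch: warn if a command runs longer than warn_after."""
>         proc = subprocess.Popen(cmd)
>         timer = threading.Timer(
>             warn_after,
>             lambda: LOG.warning('Command %s still running after %.0fs',
>                                 cmd, warn_after))
>         timer.daemon = True
>         timer.start()
>         try:
>             return proc.wait()
>         finally:
>             timer.cancel()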
>
> 	-Sean
>


