[openstack-dev] [oslo] logging around oslo lockutils

Davanum Srinivas davanum at gmail.com
Thu Sep 25 18:24:19 UTC 2014


Logged as high priority bug -
https://bugs.launchpad.net/oslo.concurrency/+bug/1374075

On Thu, Sep 25, 2014 at 1:57 PM, Jay Pipes <jaypipes at gmail.com> wrote:
> +1 for making those two changes. I also have been frustrated doing debugging
> in the gate recently, and any operational-ease-of-debugging things like this
> would be appreciated.
>
> -jay
>
> On 09/25/2014 08:49 AM, Sean Dague wrote:
>>
>> Spending a ton of time reading logs, oslo locking ends up basically
>> creating a ton of output at DEBUG that you have to mentally filter to
>> find problems:
>>
>> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Created new semaphore "iptables" internal_lock
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:206
>> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Acquired semaphore "iptables" lock
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:229
>> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Attempting to grab external lock "iptables" external_lock
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:178
>> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Got file lock "/opt/stack/data/nova/nova-iptables" acquire
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:93
>> 2014-09-24 18:44:49.240 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Got semaphore / lock "_do_refresh_provider_fw_rules" inner
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:271
>> 2014-09-24 18:44:49.244 DEBUG nova.compute.manager
>> [req-b91cb1c1-f211-43ef-9714-651eeb3b2302
>> DeleteServersAdminTestXML-1408641898
>> DeleteServersAdminTestXML-469708524] [instance:
>> 98eb8e6e-088b-4dda-ada5-7b2b79f62506] terminating bdm
>>
>> BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2014-09-24T18:44:42Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/vda',device_type='disk',disk_bus=None,guest_format=None,id=43,image_id='262ab8a2-0790-49b3-a8d3-e8ed73e3ed71',instance=<?>,instance_uuid=98eb8e6e-088b-4dda-ada5-7b2b79f62506,no_device=False,snapshot_id=None,source_type='image',updated_at=2014-09-24T18:44:42Z,volume_id=None,volume_size=None)
>> _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2407
>> 2014-09-24 18:44:49.248 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Released file lock "/opt/stack/data/nova/nova-iptables" release
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:115
>> 2014-09-24 18:44:49.248 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Releasing semaphore "iptables" lock
>> /opt/stack/new/nova/nova/openstack/common/lockutils.py:238
>> 2014-09-24 18:44:49.249 DEBUG nova.openstack.common.lockutils
>> [req-d7443a9c-1eb5-42c3-bf40-aadd20f0452f
>> ListImageFiltersTestXML-1316776531 ListImageFiltersTestXML-132181290]
>> Semaphore / lock released "_do_refresh_provider_fw_rules" inner
>>
>> Also readable here:
>>
>> http://logs.openstack.org/01/123801/1/check/check-tempest-dsvm-full/b5f8b37/logs/screen-n-cpu.txt.gz#_2014-09-24_18_44_49_240
>>
>> (Yes, it's kind of ugly)
>>
>> What occurred to me is that when debugging locking issues, what we
>> actually care about is two things semantically:
>>
>> #1 - we tried to get a lock, but someone else held it. Then we know
>> we've got lock contention.
>> #2 - something is still holding a lock after some "long" amount of time.
>>
>> #2 turned out to be a critical bit in understanding one of the worst
>> recent gate impacting issues.
>>
>> You can write a tool today that analyzes the logs and shows you these
>> things. However, I wonder if we could actually do something creative in
>> the code itself to do this already. I'm curious if the creative use of
>> Timers might let us emit log messages under the conditions above
>> (someone with better understanding of python internals needs to speak up
>> here). Maybe it's too much overhead, but I think it's worth at least
>> asking the question.
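
[Editor's note: the Timer idea Sean raises could be sketched roughly as
below. This is a hypothetical wrapper, not oslo code; the class name
`WatchedLock` and the 10s default are made up for illustration.]

```python
import logging
import threading

LOG = logging.getLogger(__name__)


class WatchedLock(object):
    """Sketch: log only on contention (#1) or on a long hold (#2)."""

    def __init__(self, name, hold_warn_after=10.0):
        self.name = name
        self.hold_warn_after = hold_warn_after
        self._lock = threading.Lock()
        self._timer = None

    def acquire(self):
        # Case #1: a non-blocking try tells us whether the lock is
        # contended; only then do we emit a DEBUG line and block.
        if not self._lock.acquire(False):
            LOG.debug('lock "%s" is contended, waiting', self.name)
            self._lock.acquire()
        # Case #2: arm a timer that fires (and logs a warning) only if
        # we are still holding the lock after hold_warn_after seconds.
        self._timer = threading.Timer(
            self.hold_warn_after, LOG.warning,
            ('lock "%s" still held after %.1fs',
             self.name, self.hold_warn_after))
        self._timer.daemon = True
        self._timer.start()

    def release(self):
        # A normal, quick acquire/release cycle logs nothing at all.
        self._timer.cancel()
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
```

The overhead question Sean asks is real: this arms one Timer thread per
acquire, so a high-frequency lock would want a cheaper scheme (e.g. a
timestamp checked by a single watchdog thread).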
>>
>> The same issue exists with processutils, I think. Warning that a
>> command is still running after 10s might be really handy, because it
>> turns out that issue #2 was caused by exactly this, and it took quite
>> a bit of decoding to figure that out.
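
[Editor's note: the same Timer trick works for the processutils case.
A minimal sketch, assuming a plain subprocess call; the function name
`execute_with_warning` is invented here and is not real oslo API.]

```python
import logging
import subprocess
import threading

LOG = logging.getLogger(__name__)


def execute_with_warning(cmd, warn_after=10.0):
    """Run cmd; log a warning if it is still running after warn_after s."""
    timer = threading.Timer(
        warn_after, LOG.warning,
        ('command %s still running after %.1fs', cmd, warn_after))
    timer.daemon = True
    timer.start()
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()
        return proc.returncode, out, err
    finally:
        # Fast commands cancel the timer before it fires, so the common
        # case produces no log output at all.
        timer.cancel()
```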
>>
>>         -Sean
>>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Davanum Srinivas :: https://twitter.com/dims
