[openstack-dev] [oslo][all] The lock files saga (and where we can go from here)
Robert Collins
robertc at robertcollins.net
Mon Nov 30 20:01:11 UTC 2015
On 1 December 2015 at 08:37, Ben Nemec <openstack at nemebean.com> wrote:
> On 11/30/2015 12:42 PM, Joshua Harlow wrote:
>> Hi all,
>>
>> I just wanted to bring up an issue and a possible solution, and get
>> feedback on it from folks, because it seems to be an ongoing problem
>> that shows up not when an application is initially deployed but as
>> that application keeps running (i.e. after it has been running for a
>> period of time).
>>
>> The gist of the problem is the following:
>>
>> A <<pick your favorite openstack project>> has a need to ensure that no
>> application on the same machine can manipulate a given resource on that
>> same machine, so it uses the lock file pattern (acquire a *local* lock
>> file for that resource, manipulate that resource, release that lock
>> file) to do actions on that resource in a safe manner (note this does
>> not ensure safety outside of that machine, lock files are *not*
>> distributed locks).
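For readers unfamiliar with the pattern described above, here is a minimal sketch of it in Python. It is not the oslo.concurrency or fasteners implementation; the helper name `local_file_lock` and the use of `fcntl.lockf` (POSIX only) are assumptions made for illustration.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def local_file_lock(path):
    """Hold an exclusive advisory lock on `path` while the block runs.

    Hypothetical helper: it only excludes other *processes* on the same
    machine that lock the same file; it is not a distributed lock.
    """
    # Create the lock file if it does not exist; its contents are never used.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other process holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock. The file itself is
        # deliberately left on disk.
        os.close(fd)
```

Note that the sketch never deletes the file it creates, which is exactly how lock directories accumulate files over time.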
>>
>> The API that we expose from oslo is typically accessed via the following:
>>
>> oslo_concurrency.lockutils.synchronized(name, lock_file_prefix=None,
>> external=False, lock_path=None, semaphores=None, delay=0.01)
>>
>> or via its underlying library (that I extracted from oslo.concurrency
>> and have improved to add more usefulness) @
>> http://fasteners.readthedocs.org/
>>
>> The issue though for <<your favorite openstack project>> is that each of
>> these projects now typically has a large number of lock files that exist
>> or have existed and no easy way to determine when those lock files can
>> be deleted (afaik no periodic task exists in said projects to clean up
>> lock files, or to delete them when they are no longer in use...) so what
>> happens is bugs like https://bugs.launchpad.net/cinder/+bug/1432387
>> appear and there is no simple solution to clean lock files up (since
>> oslo.concurrency is really not the right layer to know when a lock can
>> or cannot be deleted, only the application knows that...)
>>
>> So then we get a few creative solutions like the following:
>>
>> - https://review.openstack.org/#/c/241663/
>> - https://review.openstack.org/#/c/239678/
>> - (and others?)
>>
>> So I wanted to ask the question, how are people involved in <<your
>> favorite openstack project>> cleaning up these files (are they at all?)
>>
>> Another idea that I have been proposing also is to use offset locks.
>>
>> This would allow for not creating X lock files, but create a *single*
>> lock file per project and use offsets into it as the way to lock. For
>> example nova could/would create a 1MB (or larger/smaller) *empty* file
>> for locks, that would allow for 1,048,576 locks to be used at the same
>> time, which honestly should be way more than enough, and then there
>> would not need to be any lock cleanup at all... Is there any reason this
>> wasn't initially done way back when this lock file code was created?
>> (https://github.com/harlowja/fasteners/pull/10 adds this functionality
>> to the underlying library if people want to look it over)
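A rough sketch of what offset locking could look like, assuming POSIX byte-range locks via `fcntl.lockf`. This is not the code from the fasteners pull request; the class name `OffsetLockFile` and its methods are hypothetical. Since POSIX record locks may cover bytes past the end of the file, the sketch does not preallocate the 1MB.

```python
import fcntl
import os
from contextlib import contextmanager


class OffsetLockFile:
    """One shared lock file; each lock is a single byte at some offset.

    Hypothetical sketch. These locks exclude other processes only; threads
    inside one process would still need an ordinary threading lock.
    """

    def __init__(self, path, slots=1024 * 1024):
        self.slots = slots
        # Keep one descriptor open: closing any descriptor for the file
        # would drop every POSIX lock this process holds on it.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

    @contextmanager
    def locked(self, offset):
        if not 0 <= offset < self.slots:
            raise ValueError("offset out of range")
        # Lock exactly one byte: len=1, start=offset, relative to file start.
        fcntl.lockf(self._fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)
        try:
            yield
        finally:
            # Release only this byte, leaving other offsets locked.
            fcntl.lockf(self._fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)

    def close(self):
        os.close(self._fd)
```

Usage would be along the lines of `with locks.locked(42): ...`, where the caller still has to decide which offset belongs to which resource.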
>
> I think the main reason was that even with a million locks available,
> you'd have to find a way to hash the lock names to offsets in the file,
> and a million isn't a very large collision space for that. Having two
> differently named locks that hashed to the same offset would lead to
> incredibly confusing bugs.
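To put a number on the collision concern, here is a small birthday-problem calculation. It assumes lock names hash uniformly and independently onto the offsets, which is an idealisation; the function name `collision_probability` is made up for this sketch.

```python
def collision_probability(names, slots=1024 * 1024):
    """Chance that at least two of `names` distinct lock names share an
    offset, assuming a uniform hash onto `slots` offsets."""
    p_all_unique = 1.0
    for i in range(names):
        # The (i + 1)-th name must avoid the i offsets already taken.
        p_all_unique *= (slots - i) / slots
    return 1.0 - p_all_unique


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        print(n, round(collision_probability(n), 3))
```

Under that assumption, about 1,000 distinct lock names already give roughly a 38% chance of some collision in a 1,048,576-slot file, and 2,000 names give roughly 85%.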
>
> We could switch to requiring the projects to provide the offsets instead
> of hashing a string value, but that's just pushing the collision problem
> off onto every project that uses us.
>
> So that's the problem as I understand it, but where does that leave us
> for solutions? First, there's
> https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/lockutils.py#L151
> which allows consumers to delete lock files when they're done with them.
> Of course, in that case the onus is on the caller to make sure the lock
> couldn't possibly be in use anymore.
>
> Second, is this actually a problem? Modern filesystems have absurdly
> large limits on the number of files in a directory, so it's highly
> unlikely we would ever exhaust that, and we're creating all zero byte
> files so there shouldn't be a significant space impact either. In the
> past I believe our recommendation has been to simply create a cleanup
> job that runs on boot, before any of the OpenStack services start, that
> deletes all of the lock files. At that point you know it's safe to
> delete them, and it prevents your lock file directory from growing forever.
Not that high - ext3 (still the default for nova ephemeral
partitions!) has a limit of 64k in one directory.
That said, I don't disagree - my thinking is that we should advise
putting such files on a tmpfs.
-Rob