[openstack-dev] [oslo][all] The lock files saga (and where we can go from here)

Joshua Harlow harlowja at fastmail.com
Mon Nov 30 20:28:47 UTC 2015


Clint Byrum wrote:
> Excerpts from Joshua Harlow's message of 2015-11-30 10:42:53 -0800:
>> Hi all,
>>
>> I just wanted to bring up an issue (and a possible solution) and get
>> feedback on it from folks, because it seems to be an on-going problem
>> that shows up not when an application is initially deployed but as the
>> on-going operation of that application proceeds (i.e. after running for
>> a period of time).
>>
>> The gist of the problem is the following:
>>
>> A <<pick your favorite openstack project>> has a need to ensure that no
>> application on the same machine can manipulate a given resource on that
>> same machine, so it uses the lock file pattern (acquire a *local* lock
>> file for that resource, manipulate that resource, release that lock
>> file) to do actions on that resource in a safe manner (note this does
>> not ensure safety outside of that machine, lock files are *not*
>> distributed locks).
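For anyone not familiar with the pattern, it boils down to something like 
the following sketch (the lock directory, resource names and do_work_on() 
here are made up for illustration, not any project's actual code):

    import fcntl
    import os

    def manipulate_resource_safely(resource_name, lock_dir='/var/lock/myproject'):
        # One lock file per resource; opening with 'a' creates it if it
        # does not exist yet (which is also why stale files pile up).
        lock_path = os.path.join(lock_dir, resource_name + '.lock')
        with open(lock_path, 'a') as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until acquired
            try:
                do_work_on(resource_name)  # hypothetical resource manipulation
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)
        # Only protects against other processes on *this* machine; this is
        # not a distributed lock.
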
>>
>> The api that we expose from oslo is typically accessed via the following:
>>
>>     oslo_concurrency.lockutils.synchronized(name, lock_file_prefix=None,
>> external=False, lock_path=None, semaphores=None, delay=0.01)
>>
>> or via its underlying library (that I extracted from oslo.concurrency
>> and have improved to add more usefulness) @
>> http://fasteners.readthedocs.org/
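(For reference, typical usage is the decorator form below; the lock name, 
prefix and lock_path are purely illustrative:)

    from oslo_concurrency import lockutils

    # With external=True an inter-process (file based) lock is used, and
    # the lock file is created under lock_path (or the configured path).
    @lockutils.synchronized('volume-12345', lock_file_prefix='cinder-',
                            external=True, lock_path='/var/lock/cinder')
    def extend_volume():
        pass  # manipulate the resource while the file lock is held

    # A context-manager form also exists via lockutils.lock(...).
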
>>
>> The issue though for <<your favorite openstack project>> is that each of
>> these projects now typically has a large number of lock files that exist
>> or have existed and no easy way to determine when those lock files can
>> be deleted (afaik no periodic task exists in said projects to clean up
>> lock files, or to delete them when they are no longer in use...) so what
>> happens is bugs like https://bugs.launchpad.net/cinder/+bug/1432387
>> appear and there is not a simple solution to clean lock files up (since
>> oslo.concurrency is really not the right layer to know when a lock can
>> or can not be deleted, only the application knows that...)
>>
>> So then we get a few creative solutions like the following:
>>
>> - https://review.openstack.org/#/c/241663/
>> - https://review.openstack.org/#/c/239678/
>> - (and others?)
>>
>> So I wanted to ask the question, how are people involved in <<your
>> favorite openstack project>> cleaning up these files (are they at all?)
>>
>> Another idea that I have also been proposing is to use offset locks.
>>
>> This would avoid creating X lock files by instead creating a *single*
>> lock file per project and using offsets into it as the way to lock. For
>> example nova could/would create a 1MB (or larger/smaller) *empty* file
>> for locks, which would allow for 1,048,576 locks to be used at the same
>> time, which honestly should be way more than enough, and then there
>> would not need to be any lock cleanup at all... Is there any reason this
>> wasn't done back when this lock file code was originally created?
>> (https://github.com/harlowja/fasteners/pull/10 adds this functionality
>> to the underlying library if people want to look it over)
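To make that concrete, here is a rough sketch of what locking a byte range 
in one shared file could look like (the file name, hashing scheme and sizes 
are assumptions for illustration, not necessarily what that pull request 
does):

    import fcntl
    import hashlib

    LOCK_FILE = '/var/lock/nova/locks.img'  # single pre-allocated file
    NUM_SLOTS = 1024 * 1024                 # 1MB file -> ~1 million byte offsets

    def offset_for(name):
        # Map a lock name to a byte offset; two names can collide and then
        # share a lock, which is one trade-off of this approach.
        return int(hashlib.sha1(name.encode('utf-8')).hexdigest(), 16) % NUM_SLOTS

    def lock_resource(fd, name):
        # Lock exactly one byte at the computed offset; other offsets in
        # the same file remain usable by other lock names.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset_for(name))

    def unlock_resource(fd, name):
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset_for(name))
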
>
> This is really complicated, and basically just makes the directory of
> lock files _look_ clean. But it still leaves each offset stale, and has
> to be cleaned anyway.

What do you mean here (out of curiosity) by each offset being stale? The 
file would basically never change size after startup (pick a large enough 
number: 10 million, 1 trillion billion...) and it would be used 
appropriately from there on out...

>
> Fasteners already has process locks that use fcntl/flock.
>
> These locks provide enough to allow you to infer things about the owner
> of the lock file. If there's no process still holding the exclusive lock
> when you try to lock it, then YOU own it, and thus control the resource.

Well, not really; Python doesn't expose the ability to introspect who has 
the handle, afaik. I tried to look into that and it looks like fcntl (the 
C API) might have a way to get it, but you can't really introspect that 
without, as you stated, acquiring the lock yourself... I can try to recall 
more of this investigation from when I was trying to add an @owner_pid 
property onto fasteners' interprocess lock class, but from my simple 
memory the exposed API isn't there in Python.
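(The closest portable thing I found from Python is to just try a 
non-blocking acquire yourself, roughly like the sketch below; the obvious 
caveat is that it momentarily takes the lock when nobody holds it, so it 
only really makes sense inside something that is about to act on the file 
anyway:)

    import fcntl

    def is_lock_held(path):
        # 'a' creates the file if it vanished, which is fine for a sketch.
        with open(path, 'a') as f:
            try:
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except (IOError, OSError):
                return True   # some other process holds the lock
            fcntl.flock(f, fcntl.LOCK_UN)
            return False      # nobody was holding it (we briefly did)
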

>
> A cron job which tries to flock anything older than ${REASONABLE_TIME}
> and deletes them seems fine. Whatever process was trying to interact
> with the resource is gone at that point.

Yes, or a periodic thread in the application that can do this in a safe 
manner (using its ability to know exactly what its own app's internals 
are doing...)
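Something like the following rough sketch is what I'd imagine either the 
cron job or an in-application periodic task doing (the directory, prefix 
and age threshold are made up, and a real version would have to think 
harder about the race between locking, deleting and a concurrent opener):

    import fcntl
    import glob
    import os
    import time

    REASONABLE_TIME = 24 * 60 * 60  # 1 day; arbitrary threshold

    def cleanup_stale_locks(lock_dir='/var/lock/cinder', prefix='cinder-'):
        for path in glob.glob(os.path.join(lock_dir, prefix + '*')):
            try:
                if time.time() - os.path.getmtime(path) < REASONABLE_TIME:
                    continue  # recently touched, leave it alone
                with open(path, 'a') as f:
                    # If the non-blocking lock succeeds, no live process is
                    # using this file and it is old enough to be stale.
                    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    os.unlink(path)
                    fcntl.flock(f, fcntl.LOCK_UN)
            except (IOError, OSError):
                continue  # lock is held or the file vanished; skip it
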

>
> Now, anything that needs to safely manage a resource beyond the lifetime
> of a live process will need to keep track of its own state and be idempotent
> anyway. IMO this isn't something lock files alone solve well. I believe
> you're familiar with a library named taskflow that is supposed to help
> write code that does this better ;). Even without taskflow, if you are
> trying to do something exclusive without a single process that stays
> alive, you need to do _something_ to keep track of state and restart
> or revert that flow. That is a state management problem, not a locking
> problem.
>

Agreed. ;)



