[openstack-dev] [oslo][all] The lock files saga (and where we can go from here)

Ben Nemec openstack at nemebean.com
Mon Nov 30 20:46:40 UTC 2015


On 11/30/2015 01:57 PM, Joshua Harlow wrote:
> Ben Nemec wrote:
>> On 11/30/2015 12:42 PM, Joshua Harlow wrote:
>>> Hi all,
>>>
>>> I just wanted to bring up an issue, a possible solution, and get
>>> feedback on it from folks, because it seems to be an ongoing problem
>>> that shows up not when an application is initially deployed but as
>>> ongoing operation and running of that application proceeds (i.e.
>>> after it has been running for a period of time).
>>>
>>> The gist of the problem is the following:
>>>
>>> A <<pick your favorite openstack project>> has a need to ensure that
>>> no application on the same machine can manipulate a given resource on
>>> that same machine, so it uses the lock file pattern (acquire a *local*
>>> lock file for that resource, manipulate that resource, release that
>>> lock file) to do actions on that resource in a safe manner (note this
>>> does not ensure safety outside of that machine; lock files are *not*
>>> distributed locks).
>>>
>>> The API that we expose from oslo is typically accessed via the
>>> following:
>>>
>>>     oslo_concurrency.lockutils.synchronized(name, lock_file_prefix=None,
>>>         external=False, lock_path=None, semaphores=None, delay=0.01)
>>>
>>> or via its underlying library (that I extracted from oslo.concurrency
>>> and have improved to add more usefulness) @
>>> http://fasteners.readthedocs.org/
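>>>
>>> As a concrete illustration, usage typically looks something like the
>>> following (the lock name, prefix, and path here are made up for the
>>> example):
>>>
>>>     from oslo_concurrency import lockutils
>>>
>>>     @lockutils.synchronized('resize-volume-123',
>>>                             lock_file_prefix='cinder-',
>>>                             external=True,
>>>                             lock_path='/var/lib/cinder/tmp')
>>>     def resize_volume():
>>>         # Only one process on this host can be in here at a time, but
>>>         # the lock file backing this lock stays on disk after the lock
>>>         # is released.
>>>         ...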
>>>
>>> The issue though for <<your favorite openstack project>> is that each
>>> of these projects now typically has a large number of lock files that
>>> exist or have existed and no easy way to determine when those lock
>>> files can be deleted (afaik no periodic task exists in said projects
>>> to clean up lock files, or to delete them when they are no longer in
>>> use...), so what happens is that bugs like
>>> https://bugs.launchpad.net/cinder/+bug/1432387 appear and there is no
>>> simple solution for cleaning lock files up (since oslo.concurrency is
>>> really not the right layer to know when a lock can or cannot be
>>> deleted; only the application knows that...)
>>>
>>> So then we get a few creative solutions like the following:
>>>
>>> - https://review.openstack.org/#/c/241663/
>>> - https://review.openstack.org/#/c/239678/
>>> - (and others?)
>>>
>>> So I wanted to ask the question: how are people involved in <<your
>>> favorite openstack project>> cleaning up these files (are they at all?)
>>>
>>> Another idea that I have also been proposing is to use offset locks.
>>>
>>> This would avoid creating X lock files: instead a project would create
>>> a *single* lock file and use offsets into it as the way to lock. For
>>> example, nova could/would create a 1MB (or larger/smaller) *empty*
>>> file for locks; that would allow for 1,048,576 locks to be held at the
>>> same time, which honestly should be way more than enough, and then
>>> there would be no need for any lock cleanup at all... Is there any
>>> reason this wasn't done back when this lock file code was originally
>>> created? (https://github.com/harlowja/fasteners/pull/10 adds this
>>> functionality to the underlying library if people want to look it
>>> over)
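>>>
>>> A rough sketch of how the offset idea could work, using POSIX
>>> byte-range locks (the path here is made up, and the current internal
>>> semaphore layer would still be needed for threads inside one process):
>>>
>>>     import fcntl
>>>     import os
>>>
>>>     # A single pre-created file shared by the whole project.
>>>     fd = os.open('/var/lib/nova/nova.lock', os.O_RDWR | os.O_CREAT)
>>>
>>>     def lock(offset):
>>>         # Exclusively lock one byte at the given offset; every other
>>>         # offset in the same file remains independently lockable.
>>>         fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset)
>>>
>>>     def unlock(offset):
>>>         fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset)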
>>
>> I think the main reason was that even with a million locks available,
>> you'd have to find a way to hash the lock names to offsets in the file,
>> and a million isn't a very large collision space for that.  Having two
>> differently named locks that hashed to the same offset would lead to
>> incredibly confusing bugs.
>>
>> We could switch to requiring the projects to provide the offsets instead
>> of hashing a string value, but that's just pushing the collision problem
>> off onto every project that uses us.
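>>
>> To make that concrete: any such scheme would look roughly like the
>> sketch below, and by the usual birthday math you would expect a
>> better-than-even chance of two names colliding once a deployment has
>> accumulated on the order of ~1,200 distinct lock names.
>>
>>     import hashlib
>>
>>     def name_to_offset(name, slots=1024 * 1024):
>>         # Two different lock names can land on the same offset,
>>         # silently serializing unrelated critical sections.
>>         digest = hashlib.sha256(name.encode('utf-8')).digest()
>>         return int.from_bytes(digest[:8], 'big') % slots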
>>
>> So that's the problem as I understand it, but where does that leave us
>> for solutions?  First, there's
>> https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/lockutils.py#L151
>> which allows consumers to delete lock files when they're done with them.
>>   Of course, in that case the onus is on the caller to make sure the lock
>> couldn't possibly be in use anymore.
> 
> Ya, I wonder how many folks are actually doing this, because the
> exposed API of @synchronized doesn't seem to tell you what file to even
> delete in the first place :-/ Perhaps we should make that more
> accessible so that people/consumers of that code know what to delete...

I'm not opposed to allowing users to clean up lock files, although I
think the docstrings for the methods should be very clear that it isn't
strictly necessary and it must be done carefully to avoid deleting
in-use files (the existing docstring is actually insufficient IMHO, but
I'm pretty sure I reviewed it when it went in so I have no one else to
blame ;-).
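
For reference, the kind of usage I'd expect looks something like this --
assuming the helper linked above is remove_external_lock_file and that
it takes the same name/prefix arguments as synchronized(); the names
themselves are made up:

    from oslo_concurrency import lockutils

    @lockutils.synchronized('resize-volume-123', lock_file_prefix='cinder-',
                            external=True)
    def resize_volume():
        ...

    # Later, once the application *knows* nothing can be holding or about
    # to take this lock (which is the hard part):
    lockutils.remove_external_lock_file('resize-volume-123',
                                        lock_file_prefix='cinder-')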

> 
>>
>> Second, is this actually a problem?  Modern filesystems have absurdly
>> large limits on the number of files in a directory, so it's highly
>> unlikely we would ever exhaust that, and we're creating all zero byte
>> files so there shouldn't be a significant space impact either.  In the
>> past I believe our recommendation has been to simply create a cleanup
>> job that runs on boot, before any of the OpenStack services start, that
>> deletes all of the lock files.  At that point you know it's safe to
>> delete them, and it prevents your lock file directory from growing forever.
> 
> Except that as we move toward never shutting an app down (always-online
> and live upgrades and all that jazz), it will have to run more often
> than just at boot, but point taken.

Sure, but you're still occasionally going to have to reboot nodes for
kernel updates and such.  The live upgrades work is more about having no
user-visible downtime as I understand it.
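
For what it's worth, the cleanup job I'm describing doesn't need to be
anything fancy; something along these lines, run before any of the
services come up (the lock directory path is deployment-specific):

    import glob
    import os

    # Safe only because nothing that can take these locks is running yet.
    for path in glob.glob('/var/lib/cinder/tmp/cinder-*'):
        os.unlink(path)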

> 
>>
>> I know we've had this discussion in the past, but I don't think anyone
>> has ever told me that having lock files hang around was a functional
>> problem for them.  It seems to be largely cosmetic complaints about not
>> cleaning up the old files (which, as you noted, Oslo can't really solve
>> because we have no idea when consumers are finished with locks) and
>> given the amount of trouble we've had with interprocess locking in the
>> past I've never felt that a cosmetic issue was sufficient reason to
>> reopen that can of worms.  I'll just note again that every time we've
>> started messing with this stuff we've run into a bunch of sticky
>> problems and edge cases, so it would take a pretty compelling argument
>> to convince me that we should do so again.
>>
>> Of course, if someone wants to take another stab at changing this stuff
>> again I guess more power to them, but to my knowledge we've finally had
>> our interprocess locking in a good state for a while now so I'm not in
>> favor of messing with it.  That's my two cents -- don't spend it all in
>> one place. ;-)
> 
> Fair enough, I can't speak to how much priority consuming projects put
> on this; https://bugs.launchpad.net/cinder/+bug/1432387 got marked as
> 'high', so I assume at least one project doesn't like them, but is it
> tolerable? (unsure?)

Yeah, nothing about that bug description says "high priority" to me, so
I'm not sure why it was triaged that way.  The only complaint is that
there were files left in the cinder state directory, but not that it
actually caused any functional problems.  To me, that makes it a
cosmetic issue.

There's also an unrelated complaint about Cinder using too many locks,
but that's a separate issue having nothing to do with Oslo AFAICT.

> 
>>
>>> In general I would like to hear people's thoughts/ideas/complaints/other,
>>>
>>> -Josh
>>>



