On Tue, Feb 1 2022 at 11:07:07 AM -0600, Ben Nemec <openstack@nemebean.com> wrote:
General question: Are file locks even sufficient for this? If we're talking about multiple services coordinating access to a resource, do we know for certain that all of the processes accessing the resource are on the same system? If not, file locks can't solve this anyway.
If so, see my thoughts below.
On 2/1/22 09:57, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:50:28 AM +0100, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
An alternative to that would be that os-brick has specific lock path config option regardless of which service is importing the lib. Then the oslo_concurrency.lockutils.lock() can be called with that os-brick specific path from os-brick.
The lock_path config opt is owned by oslo.concurrency so I don't think this is possible today. The simplest and least error-prone solution is to configure all of the users of os-brick to use the same lock path.
The oslo_concurrency.lockutils.lock() call takes a lock_path kwargs that can be used to override the configured lock_path as far as I see[1][2][3].
Even if we provided a way to override the lock path for just os-brick, you'd still need to add a config opt to each service that points at a common location and the deployment tools would need to configure that and create the shared directory. You don't save the deployment tools any effort this way.
os_brick can expose a new config option with a sane default and document that if that default is changed then the config of every service using os_brick needs to be changed to have the same value.
Keeping a single lock path for everything running in a given service also eliminates the possibility that a lock gets used both in and out of os-brick, which would break if os-brick had a separate path. I don't know if that happens, but a single lock path means you never have to answer that question.
Good point. That is a clear advantage of this proposal.
One drawback I can see to a single lock path for multiple services is possible collisions of lock names. Previously each service could name locks whatever they wanted as long as there were no duplicates within the service. If they start sharing lock path, there can be no duplicates between any of the services sharing the lock path. I can't say how much of a problem this would be, but we could probably come up with some way to look at the lock names used in a service and make sure there's no overlap.
Also, I _think_ even if there were a duplicate lock name in two services, the worst that would happen is some unnecessary lock contention.* You might introduce a performance problem, but not a race. Personally, that's a tradeoff I'd happily make.
*: If you had multiple duplicate locks being acquired in different orders between the two services, it's possible you could introduce a deadlock. A lot has to go wrong for that to happen, but in the interest of completeness it should be considered. Have I mentioned that concurrency is hard? ;-)
[1] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [2] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [3] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11...