[os-brick][nova][cinder][glance] Question about lock_path
Hello Fellow OpenStackers! I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue? [1] https://review.opendev.org/c/openstack/os-brick/+/814139 Kind regards, -yoctozepto
On Tue, Feb 1 2022 at 11:50:28 AM +0100, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
An alternative to that would be that os-brick has specific lock path config option regardless of which service is importing the lib. Then the oslo_concurrency.lockutils.lock() can be called with that os-brick specific path from os-brick. Cheers, gibi
[1] https://review.opendev.org/c/openstack/os-brick/+/814139
Kind regards, -yoctozepto
General question: Are file locks even sufficient for this? If we're talking about multiple services coordinating access to a resource, do we know for certain that all of the processes accessing the resource are on the same system? If not, file locks can't solve this anyway. If so, see my thoughts below. On 2/1/22 09:57, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:50:28 AM +0100, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
An alternative to that would be that os-brick has specific lock path config option regardless of which service is importing the lib. Then the oslo_concurrency.lockutils.lock() can be called with that os-brick specific path from os-brick.
The lock_path config opt is owned by oslo.concurrency so I don't think this is possible today. The simplest and least error-prone solution is to configure all of the users of os-brick to use the same lock path. Even if we provided a way to override the lock path for just os-brick, you'd still need to add a config opt to each service that points at a common location and the deployment tools would need to configure that and create the shared directory. You don't save the deployment tools any effort this way. Keeping a single lock path for everything running in a given service also eliminates the possibility that a lock gets used both in and out of os-brick, which would break if os-brick had a separate path. I don't know if that happens, but a single lock path means you never have to answer that question. One drawback I can see to a single lock path for multiple services is possible collisions of lock names. Previously each service could name locks whatever they wanted as long as there were no duplicates within the service. If they start sharing lock path, there can be no duplicates between any of the services sharing the lock path. I can't say how much of a problem this would be, but we could probably come up with some way to look at the lock names used in a service and make sure there's no overlap. Also, I _think_ even if there were a duplicate lock name in two services, the worst that would happen is some unnecessary lock contention.* You might introduce a performance problem, but not a race. Personally, that's a tradeoff I'd happily make. *: If you had multiple duplicate locks being acquired in different orders between the two services, it's possible you could introduce a deadlock. A lot has to go wrong for that to happen, but in the interest of completeness it should be considered. Have I mentioned that concurrency is hard? ;-)
On Tue, Feb 1 2022 at 11:07:07 AM -0600, Ben Nemec <openstack@nemebean.com> wrote:
General question: Are file locks even sufficient for this? If we're talking about multiple services coordinating access to a resource, do we know for certain that all of the processes accessing the resource are on the same system? If not, file locks can't solve this anyway.
If so, see my thoughts below.
On 2/1/22 09:57, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:50:28 AM +0100, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
An alternative to that would be that os-brick has specific lock path config option regardless of which service is importing the lib. Then the oslo_concurrency.lockutils.lock() can be called with that os-brick specific path from os-brick.
The lock_path config opt is owned by oslo.concurrency so I don't think this is possible today. The simplest and least error-prone solution is to configure all of the users of os-brick to use the same lock path.
The oslo_concurrency.lockutils.lock() call takes a lock_path kwargs that can be used to override the configured lock_path as far as I see[1][2][3].
Even if we provided a way to override the lock path for just os-brick, you'd still need to add a config opt to each service that points at a common location and the deployment tools would need to configure that and create the shared directory. You don't save the deployment tools any effort this way.
os_brick can expose a new config option with a sane default and document that if that default is changed then the config of every service using os_brick needs to be changed to have the same value.
Keeping a single lock path for everything running in a given service also eliminates the possibility that a lock gets used both in and out of os-brick, which would break if os-brick had a separate path. I don't know if that happens, but a single lock path means you never have to answer that question.
Good point. That is a clear advantage of this proposal.
One drawback I can see to a single lock path for multiple services is possible collisions of lock names. Previously each service could name locks whatever they wanted as long as there were no duplicates within the service. If they start sharing lock path, there can be no duplicates between any of the services sharing the lock path. I can't say how much of a problem this would be, but we could probably come up with some way to look at the lock names used in a service and make sure there's no overlap.
Also, I _think_ even if there were a duplicate lock name in two services, the worst that would happen is some unnecessary lock contention.* You might introduce a performance problem, but not a race. Personally, that's a tradeoff I'd happily make.
*: If you had multiple duplicate locks being acquired in different orders between the two services, it's possible you could introduce a deadlock. A lot has to go wrong for that to happen, but in the interest of completeness it should be considered. Have I mentioned that concurrency is hard? ;-)
[1] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [2] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [3] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11...
On 01/02, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:07:07 AM -0600, Ben Nemec <openstack@nemebean.com> wrote:
General question: Are file locks even sufficient for this? If we're talking about multiple services coordinating access to a resource, do we know for certain that all of the processes accessing the resource are on the same system? If not, file locks can't solve this anyway.
Hi, File locks are sufficient and also necessary for some connector types. I don't want to get too much into it, but these file locks are only trying to create a critical section within a node, not deployment wide. For example, we know that the Linux Open-iSCSI initiator has lots of problems with concurrent access (due to the persistent configuration database), so we only want to make sure that there are no 2 concurrent calls to it.
If so, see my thoughts below.
On 2/1/22 09:57, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:50:28 AM +0100, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
An alternative to that would be that os-brick has specific lock path config option regardless of which service is importing the lib. Then the oslo_concurrency.lockutils.lock() can be called with that os-brick specific path from os-brick.
The lock_path config opt is owned by oslo.concurrency so I don't think this is possible today. The simplest and least error-prone solution is to configure all of the users of os-brick to use the same lock path.
The oslo_concurrency.lockutils.lock() call takes a lock_path kwargs that can be used to override the configured lock_path as far as I see[1][2][3].
Even if we provided a way to override the lock path for just os-brick, you'd still need to add a config opt to each service that points at a common location and the deployment tools would need to configure that and create the shared directory. You don't save the deployment tools any effort this way.
I agree, that's why I didn't add a new one when I developed that patch. In retrospective I should have added to the commit message and release notes the cases where admins/deployment tools need to use the same lock_path.
os_brick can expose a new config option with a sane default and document that if that default is changed then the config of every service using os_brick needs to be changed to have the same value.
Keeping a single lock path for everything running in a given service also eliminates the possibility that a lock gets used both in and out of os-brick, which would break if os-brick had a separate path. I don't know if that happens, but a single lock path means you never have to answer that question.
Good point. That is a clear advantage of this proposal.
One drawback I can see to a single lock path for multiple services is possible collisions of lock names. Previously each service could name locks whatever they wanted as long as there were no duplicates within the service. If they start sharing lock path, there can be no duplicates between any of the services sharing the lock path. I can't say how much of a problem this would be, but we could probably come up with some way to look at the lock names used in a service and make sure there's no overlap.
Also, I _think_ even if there were a duplicate lock name in two services, the worst that would happen is some unnecessary lock contention.* You might introduce a performance problem, but not a race. Personally, that's a tradeoff I'd happily make.
*: If you had multiple duplicate locks being acquired in different orders between the two services, it's possible you could introduce a deadlock. A lot has to go wrong for that to happen, but in the interest of completeness it should be considered. Have I mentioned that concurrency is hard? ;-)
There shouldn't be any unintended lock name collision or lock contention, because services use a service specific prefix for their own locks. At least Cinder [1] and Nova [2] do. OS-Brick doesn't use those prefixed locks but bare locks so they are shared between services, as intended. Cheers, Gorka. [1]: https://github.com/openstack/cinder/blob/5cf395b38544b2e96bdf625b3aae66f0b28... [2]: https://github.com/openstack/nova/blob/b0633ac49bfbe2e0e51ef0506f626d2dbf7c1...
[1] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [2] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [3] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11...
Thank you also for all these clarifications. Cheers, -yoctozepto On Thu, 3 Feb 2022 at 11:12, Gorka Eguileor <geguileo@redhat.com> wrote:
On 01/02, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:07:07 AM -0600, Ben Nemec <openstack@nemebean.com> wrote:
General question: Are file locks even sufficient for this? If we're talking about multiple services coordinating access to a resource, do we know for certain that all of the processes accessing the resource are on the same system? If not, file locks can't solve this anyway.
Hi,
File locks are sufficient and also necessary for some connector types.
I don't want to get too much into it, but these file locks are only trying to create a critical section within a node, not deployment wide.
For example, we know that the Linux Open-iSCSI initiator has lots of problems with concurrent access (due to the persistent configuration database), so we only want to make sure that there are no 2 concurrent calls to it.
If so, see my thoughts below.
On 2/1/22 09:57, Balazs Gibizer wrote:
On Tue, Feb 1 2022 at 11:50:28 AM +0100, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
An alternative to that would be that os-brick has specific lock path config option regardless of which service is importing the lib. Then the oslo_concurrency.lockutils.lock() can be called with that os-brick specific path from os-brick.
The lock_path config opt is owned by oslo.concurrency so I don't think this is possible today. The simplest and least error-prone solution is to configure all of the users of os-brick to use the same lock path.
The oslo_concurrency.lockutils.lock() call takes a lock_path kwargs that can be used to override the configured lock_path as far as I see[1][2][3].
Even if we provided a way to override the lock path for just os-brick, you'd still need to add a config opt to each service that points at a common location and the deployment tools would need to configure that and create the shared directory. You don't save the deployment tools any effort this way.
I agree, that's why I didn't add a new one when I developed that patch.
In retrospective I should have added to the commit message and release notes the cases where admins/deployment tools need to use the same lock_path.
os_brick can expose a new config option with a sane default and document that if that default is changed then the config of every service using os_brick needs to be changed to have the same value.
Keeping a single lock path for everything running in a given service also eliminates the possibility that a lock gets used both in and out of os-brick, which would break if os-brick had a separate path. I don't know if that happens, but a single lock path means you never have to answer that question.
Good point. That is a clear advantage of this proposal.
One drawback I can see to a single lock path for multiple services is possible collisions of lock names. Previously each service could name locks whatever they wanted as long as there were no duplicates within the service. If they start sharing lock path, there can be no duplicates between any of the services sharing the lock path. I can't say how much of a problem this would be, but we could probably come up with some way to look at the lock names used in a service and make sure there's no overlap.
Also, I _think_ even if there were a duplicate lock name in two services, the worst that would happen is some unnecessary lock contention.* You might introduce a performance problem, but not a race. Personally, that's a tradeoff I'd happily make.
*: If you had multiple duplicate locks being acquired in different orders between the two services, it's possible you could introduce a deadlock. A lot has to go wrong for that to happen, but in the interest of completeness it should be considered. Have I mentioned that concurrency is hard? ;-)
There shouldn't be any unintended lock name collision or lock contention, because services use a service specific prefix for their own locks. At least Cinder [1] and Nova [2] do. OS-Brick doesn't use those prefixed locks but bare locks so they are shared between services, as intended.
Cheers, Gorka.
[1]: https://github.com/openstack/cinder/blob/5cf395b38544b2e96bdf625b3aae66f0b28... [2]: https://github.com/openstack/nova/blob/b0633ac49bfbe2e0e51ef0506f626d2dbf7c1...
[1] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [2] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11... [3] https://github.com/openstack/oslo.concurrency/blob/95b9334cfab6849fbe47e2b11...
On 01/02, Radosław Piliszek wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
[1] https://review.opendev.org/c/openstack/os-brick/+/814139
Kind regards, -yoctozepto
Hi, Yes, it is the deployment tool's responsibility to set the right lock_path value on the following cases: - Hyperconverged deployment: Cinder-volume or cinder-backup are running on the same node as thee nova-compute service. - Glance is using Cinder as a backend. In all other cases it doesn't matter the value of the lock_path since there only 1 project that uses os-brick, even if it has multiple processes and/or services using it as is the case in cinder-volume and cinder-backup running on the same host or cinder-backup running multiple processes. Cheers, Gorka.
On Thu, 3 Feb 2022 at 10:53, Gorka Eguileor <geguileo@redhat.com> wrote:
On 01/02, Radosław Piliszek wrote:
Hello Fellow OpenStackers!
I have noticed os-brick started using the file locks (in lock_path) to avoid race conditions. I have a question regarding that lock_path - it seems all the cases I have found use separate lock_paths per service, yet the problematic cases listed in the bug report (and the commit message of [1]) suggest that, in case of co-hosted nova-compute, glance-api, cinder-volume and cinder-backup services (or any combination thereof), the lock_path should be shared. Should the deployment projects adapt to actually fix the race condition issue?
[1] https://review.opendev.org/c/openstack/os-brick/+/814139
Kind regards, -yoctozepto
Hi,
Yes, it is the deployment tool's responsibility to set the right lock_path value on the following cases:
- Hyperconverged deployment: Cinder-volume or cinder-backup are running on the same node as thee nova-compute service.
- Glance is using Cinder as a backend.
Thank you, Gorka, for confirming.
In all other cases it doesn't matter the value of the lock_path since there only 1 project that uses os-brick, even if it has multiple processes and/or services using it as is the case in cinder-volume and cinder-backup running on the same host or cinder-backup running multiple processes.
Yeah, that I figured out. :-) Cheers, -yoctozepto
Cheers, Gorka.
participants (4)
-
Balazs Gibizer
-
Ben Nemec
-
Gorka Eguileor
-
Radosław Piliszek