[openstack-dev] [cinder][oslo] Locks for create from volume/snapshot

Gorka Eguileor geguileo at redhat.com
Mon Jun 29 17:58:27 UTC 2015


On Mon, Jun 29, 2015 at 03:45:56PM +0300, Duncan Thomas wrote:
> On 29 June 2015 at 15:23, Dulko, Michal <michal.dulko at intel.com> wrote:
> 
> >  There are also similar situations where we actually don’t lock on
> > resources. For example – a cgsnapshot may get deleted while creating a
> > consistencygroup from it.
> >
> > From my perspective it seems best to have atomic state changes and
> > state-based exclusion in the API. We would need some kind of
> > currently_used_to_create_snapshot/volumes/consistencygroups states to
> > achieve that. Then we would also be able to return VolumeIsBusy exceptions
> > so retrying a request would be on the user side.
> >
> I'd agree, except that gives quite a big behaviour change in the
> tenant-facing API, which will break clients and scripts. Not sure how to
> square that circle... I'd say V3 API except Mike might kill me...

I'd prefer not to add another item to the list of things needed to get
HA, much less one on the scale of a new API version.

As far as I can see, we have 3 cases where we use or need to use locks:

1- Preventing multiple writers on the same resource
2- Preventing modification of a resource that is being read
3- Backend drivers

1- Preventing multiple writers on the same resource
These locks can most likely be avoided if we implement atomic state
changes (with compare-and-swap) and use the current state to prevent
multiple writes to the same resource, since writes change the status of
the resource.  There's already a spec proposing this [1].
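
As a rough illustration (not the spec's actual code; the model and
session handling below are simplified placeholders), a compare-and-swap
status change can be a single conditional UPDATE that only succeeds if
the expected status is still in place:

# Minimal, self-contained sketch of an atomic (compare-and-swap) status
# change using SQLAlchemy.  The Volume model below is a stand-in for
# Cinder's real model; the actual mechanism is what [1] proposes.
from sqlalchemy import Column, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Volume(Base):
    __tablename__ = 'volumes'
    id = Column(String(36), primary_key=True)
    status = Column(String(32))


def conditional_update(session, volume_id, expected, new):
    """Move a volume from `expected` to `new` status atomically.

    Returns True if this request won the race, False if some other
    request changed the status first (the "compare" part failed).
    """
    rows = (session.query(Volume)
            .filter_by(id=volume_id, status=expected)
            .update({'status': new}, synchronize_session=False))
    session.commit()
    return rows == 1


# Usage: the caller that wins the race proceeds; the loser can return
# VolumeIsBusy to the user instead of blocking on a lock.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Volume(id='vol-1', status='available'))
session.commit()
assert conditional_update(session, 'vol-1', 'available', 'deleting')
assert not conditional_update(session, 'vol-1', 'available', 'deleting')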


2- Preventing modification of a resource that is being read
I see only two options here:

- Limit the number of readers to 1 and use Tooz's locks as a DLM. This
  would be quite easy to implement, although it would not be very
  efficient.
- Implement shared locks in Tooz or in the DB.  One way to implement
  this in the DB would be to add a field with a counter of the tasks
  currently using the resource for reading.  Updates to this counter
  would use compare-and-swap, checking the status while increasing the
  counter and doing the increment in the DB instead of in the Cinder
  node.  Status changes would also use compare-and-swap and, besides
  checking that the current status allows the operation, would check
  that the counter is 0.

The drawback of the DB implementation is that an aborted operation would
leave the resource locked.  But this could be solved if we used TaskFlow
for operations and decremented the counter in the revert method.  One
big advantage is that we wouldn't need heartbeats to be sent
periodically to keep locks from being released, and it would be easy to
pass the lock from the API node to the Volume node.
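
As a rough sketch of that counter-based approach (building on the Volume
model from the previous snippet plus a hypothetical reader_count integer
column; none of these names are real Cinder schema), both the reader
registration and the exclusive status change become single conditional
UPDATEs:

from sqlalchemy import Integer  # plus the imports from the previous snippet
# Assume Volume has gained: reader_count = Column(Integer, default=0)


def acquire_read(session, volume_id, allowed_status='available'):
    """Register a reader: bump the counter only if the status allows it."""
    rows = (session.query(Volume)
            .filter_by(id=volume_id, status=allowed_status)
            .update({'reader_count': Volume.reader_count + 1},
                    synchronize_session=False))
    session.commit()
    return rows == 1


def release_read(session, volume_id):
    """Unregister a reader (what a TaskFlow revert method would call)."""
    session.query(Volume).filter_by(id=volume_id).update(
        {'reader_count': Volume.reader_count - 1},
        synchronize_session=False)
    session.commit()


def set_status_if_unused(session, volume_id, expected, new):
    """Exclusive status change: only succeeds when nobody is reading."""
    rows = (session.query(Volume)
            .filter_by(id=volume_id, status=expected, reader_count=0)
            .update({'status': new}, synchronize_session=False))
    session.commit()
    return rows == 1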

If we implement this in Tooz we could start by implementing it in only
one driver and recommend using only that driver until the rest are
available.


3- Backend drivers
Depending on the driver, locks may not be needed at all, file locks
local to the node may be enough (since Cinder would already be
preventing multiple write access to the same resource), or a DLM may be
needed, for example to prevent simultaneous operations on the same pool
from different nodes.

For this case Tooz would be the best solution, since drivers should not
access the DB, and Tooz allows using file locks as well as distributed
locks.
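
A small sketch of what that could look like in a driver (the backend
URLs, paths and lock names below are just made-up examples): the driver
code stays the same whether Tooz is configured with node-local file
locks or with a real DLM backend.

# Sketch: taking a per-pool lock through Tooz.  Only the coordinator
# URL decides whether this is a node-local file lock or a distributed
# lock (e.g. ZooKeeper); the paths/hosts here are illustrative.
from tooz import coordination

coordinator = coordination.get_coordinator(
    'file:///var/lib/cinder/locks',      # node-local file locks
    # 'zookeeper://zk1:2181',            # or a distributed backend
    b'cinder-volume-host1')
coordinator.start()

lock = coordinator.get_lock(b'backend-pool-42')
if lock.acquire(blocking=True):
    try:
        # ... perform the operation that must not run concurrently
        # on this pool ...
        pass
    finally:
        lock.release()
coordinator.stop()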

Cheers,
Gorka


[1]: https://review.openstack.org/#/c/149894/


