[openstack-dev] [nova][vmware] Locking: A can of worms

Matthew Booth mbooth at redhat.com
Fri Mar 7 16:53:26 UTC 2014


We need locking in the VMware driver. There are 2 questions:

1. How much locking do we need?
2. Do we need single-node or multi-node locking?

I believe these are quite separate issues, so I'm going to try not to
confuse them. I'm going to deal with the first question first.

In reviewing the image cache ageing patch, I came across a race
condition between cache ageing and spawn(). One example of the race is:

Cache Ageing                spawn()
* Check timestamps
                            * Delete timestamp
                            * Check for image cache directory
* Delete directory
                            * Use image cache directory

This will cause spawn() to explode. There are various permutations of
this. For example, the following are all possible:

* A simple failure of spawn() with no additional consequences.

* Calling volumeops.attach_disk_to_vm() with a vmdk_path that doesn't
exist. It's not 100% clear to me that ReconfigVM_Task will throw an
error in this case, btw, which would probably be bad.

* Leaving a partially deleted image cache directory which doesn't
contain the base image. This would be really bad.

The last comes about because recursive directory delete isn't atomic,
and may partially succeed, which is a tricky problem. However, in
discussion, Gary also pointed out that directory moves are not atomic
(see MoveDatastoreFile_Task). This is positively nasty. We already knew
that spawn() races with itself to create an image cache directory, and
we've hit this problem in practise. We haven't fixed the race, but we do
manage it. The management relies on the atomicity of a directory move.
Unfortunately it isn't atomic, which presents the potential problem of
spawn() attempting to use an incomplete image cache directory. We also
have the problem of 2 spawns using a linked clone image racing to create
the same resized copy.

We could go through all of the above very carefully to assure ourselves
that we've found all the possible failure paths, and that in every case
the problems are manageable and documented. However, I would place a
good bet that the above is far from a complete list, and we would have
to revisit it in its entirety every time we touched any affected code.
And that would be a lot of code.

We need something to manage concurrent access to resources. In all of
the above cases, if we make the rule that everything which uses an image
cache directory must hold a lock on it whilst using it, all of the above
problems go away. Reasoning about their correctness becomes the
comparatively simple matter of ensuring that the lock is used correctly.
Note that we need locking in both the single and multi node cases,
because even single node is multi-threaded.

The next question is whether that locking needs to be single node or
multi node. Specifically, do we currently, or do we plan to, allow an
architecture where multiple Nova nodes access the same datastore
concurrently. If we do, then we need to find a distributed locking
solution. Ideally this would use the datastore itself for lock
mediation. Failing that, apparently this tool is used elsewhere within
the project:

http://zookeeper.apache.org/doc/trunk/zookeeperOver.html

That would be an added layer of architecture and deployment complexity,
but if we need it, it's there.

If we can confidently say that 2 Nova instances should never be
accessing the same datastore (how about hot/warm/cold failover?), we can
use Nova's internal synchronisation tools. This would simplify matters
greatly!

I think this is one of those areas which is going to improve both the
quality of the driver, and the confidence of reviewers to merge changes.
Right now it takes a lot of brain cycles to work through all the various
paths of a race to work out if any of them are really bad, and it has to
be repeated every time you touch the code. A little up-front effort will
make a whole class of problems go away.

Matt
-- 
Matthew Booth, RHCA, RHCSS
Red Hat Engineering, Virtualisation Team

GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490



More information about the OpenStack-dev mailing list