[openstack-dev] [nova][vmware] Locking: A can of worms
Joshua Harlow
harlowja at yahoo-inc.com
Fri Mar 7 17:53:07 UTC 2014
The Tooz folks have been thinking about this problem (as have I) for a
little while.
I've started something like: https://review.openstack.org/#/c/71167/
Also: https://wiki.openstack.org/wiki/StructuredWorkflowLocks
Perhaps we can get more movement on that (sorry, I haven't had much time
to move that review forward).
Would something generic (i.e. a lock provider that can use different
locking backends to satisfy different lock 'requirements') be useful for
everyone in avoiding these problems? At the very least, it would allow
individual locking requirements to be managed by a well-supported library.
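
To make the "lock provider with different backends" idea a bit more
concrete, here is a minimal sketch using only the Python standard library.
It is not the Tooz API and not the code in the review above; the names
LockBackend, ThreadBackend, FileBackend and LockProvider, and the
"thread"/"host" requirement labels, are all made up for illustration. The
file backend assumes a POSIX host (it uses flock()).

```python
import abc
import fcntl
import os
import tempfile
import threading
from contextlib import contextmanager


class LockBackend(abc.ABC):
    """Hypothetical interface: one backend per kind of lock 'requirement'."""

    @abc.abstractmethod
    def lock(self, name):
        """Return a context manager that holds the lock called `name`."""


class ThreadBackend(LockBackend):
    """Serialises threads inside a single process."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    @contextmanager
    def lock(self, name):
        # Look up (or create) the named lock under a guard, then hold it.
        with self._guard:
            named = self._locks.setdefault(name, threading.Lock())
        with named:
            yield


class FileBackend(LockBackend):
    """Serialises processes on one host using flock() (POSIX only)."""

    def __init__(self, directory):
        self._directory = directory

    @contextmanager
    def lock(self, name):
        path = os.path.join(self._directory, name + ".lock")
        with open(path, "w") as handle:
            fcntl.flock(handle, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(handle, fcntl.LOCK_UN)


class LockProvider:
    """Picks a backend by requirement, so callers never name one directly."""

    def __init__(self):
        self._backends = {}

    def register(self, requirement, backend):
        self._backends[requirement] = backend

    def lock(self, name, requirement="thread"):
        return self._backends[requirement].lock(name)


provider = LockProvider()
provider.register("thread", ThreadBackend())
provider.register("host", FileBackend(tempfile.mkdtemp()))

held = []
with provider.lock("image-cache-abc", requirement="thread"):
    held.append("thread")
with provider.lock("image-cache-abc", requirement="host"):
    held.append("host")
```

The point of the indirection is that a driver would ask for a lock by name
and requirement, and a multi-node backend could be registered later without
touching the callers.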
-----Original Message-----
From: Matthew Booth <mbooth at redhat.com>
Organization: Red Hat
Reply-To: "OpenStack Development Mailing List (not for usage questions)"
<openstack-dev at lists.openstack.org>
Date: Friday, March 7, 2014 at 8:53 AM
To: "OpenStack Development Mailing List (not for usage questions)"
<openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [nova][vmware] Locking: A can of worms
>We need locking in the VMware driver. There are 2 questions:
>
>1. How much locking do we need?
>2. Do we need single-node or multi-node locking?
>
>I believe these are quite separate issues, so I'm going to try not to
>confuse them. I'm going to deal with the first question first.
>
>In reviewing the image cache ageing patch, I came across a race
>condition between cache ageing and spawn(). One example of the race is:
>
>Cache Ageing                spawn()
>* Check timestamps
>                            * Delete timestamp
>                            * Check for image cache directory
>* Delete directory
>                            * Use image cache directory
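
The interleaving above can be replayed deterministically on a local
filesystem, which may help when reasoning about it. This is only a sketch:
the directory layout, the file names (base.vmdk, ts-2014-03-01) and the use
of a local temporary directory in place of a datastore are assumptions for
illustration, not the driver's actual code.

```python
import os
import shutil
import tempfile

# Hypothetical layout: <cache>/<image-id>/ holds the base disk and a
# timestamp file that marks the image as unused.
cache_dir = os.path.join(tempfile.mkdtemp(), "image-abc")
os.makedirs(cache_dir)
open(os.path.join(cache_dir, "base.vmdk"), "w").close()
timestamp = os.path.join(cache_dir, "ts-2014-03-01")
open(timestamp, "w").close()

# Cache ageing: check timestamps and decide the image is stale.
stale = os.path.exists(timestamp)

# spawn(): delete the timestamp to mark the image as in use...
os.remove(timestamp)
# ...and check that the image cache directory is present.
spawn_saw_directory = os.path.isdir(cache_dir)

# Cache ageing: act on the earlier, now outdated, decision.
if stale:
    shutil.rmtree(cache_dir)

# spawn(): use the directory it believes is still there.
try:
    with open(os.path.join(cache_dir, "base.vmdk")) as disk:
        disk.read()
    outcome = "spawn succeeded"
except FileNotFoundError:
    outcome = "spawn failed: base image vanished"
```

Each step is individually reasonable; the failure comes only from the gap
between a check and the action that depends on it.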
>
>This will cause spawn() to explode. There are various permutations of
>this. For example, the following are all possible:
>
>* A simple failure of spawn() with no additional consequences.
>
>* Calling volumeops.attach_disk_to_vm() with a vmdk_path that doesn't
>exist. It's not 100% clear to me that ReconfigVM_Task will throw an
>error in this case, btw, which would probably be bad.
>
>* Leaving a partially deleted image cache directory which doesn't
>contain the base image. This would be really bad.
>
>The last comes about because recursive directory delete isn't atomic,
>and may partially succeed, which is a tricky problem. However, in
>discussion, Gary also pointed out that directory moves are not atomic
>(see MoveDatastoreFile_Task). This is positively nasty. We already knew
>that spawn() races with itself to create an image cache directory, and
>we've hit this problem in practice. We haven't fixed the race, but we do
>manage it. The management relies on the atomicity of a directory move.
>Unfortunately it isn't atomic, which presents the potential problem of
>spawn() attempting to use an incomplete image cache directory. We also
>have the problem of 2 spawns using a linked clone image racing to create
>the same resized copy.
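
For reference, the "build privately, then move into place" pattern that the
current race management depends on looks roughly like the sketch below. It
runs against a local POSIX filesystem, where renaming a directory is atomic
and fails if the target already exists and is not empty. As noted above,
that guarantee does not hold for datastore moves, so this shows the
assumption being relied on rather than a fix. publish_image and the file
names are hypothetical.

```python
import os
import shutil
import tempfile


def publish_image(cache_root, image_id, payload):
    """Build the cache entry in a private directory, then rename it into place.

    Returns True if this caller published the entry, False if another
    caller got there first.
    """
    final_dir = os.path.join(cache_root, image_id)
    staging = tempfile.mkdtemp(dir=cache_root, prefix=".tmp-")
    with open(os.path.join(staging, "base.vmdk"), "w") as disk:
        disk.write(payload)
    try:
        # On a local POSIX filesystem this rename is atomic and fails if
        # final_dir already exists and is not empty.
        os.rename(staging, final_dir)
        return True
    except OSError:
        shutil.rmtree(staging)  # lost the race; discard our copy
        return False


root = tempfile.mkdtemp()
first = publish_image(root, "image-abc", "first")
second = publish_image(root, "image-abc", "second")
```

If the move can be observed half-finished, a reader may see final_dir
before base.vmdk is complete, which is exactly the incomplete image cache
directory problem described above.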
>
>We could go through all of the above very carefully to assure ourselves
>that we've found all the possible failure paths, and that in every case
>the problems are manageable and documented. However, I would place a
>good bet that the above is far from a complete list, and we would have
>to revisit it in its entirety every time we touched any affected code.
>And that would be a lot of code.
>
>We need something to manage concurrent access to resources. In all of
>the above cases, if we make the rule that everything which uses an image
>cache directory must hold a lock on it whilst using it, all of the above
>problems go away. Reasoning about their correctness becomes the
>comparatively simple matter of ensuring that the lock is used correctly.
>Note that we need locking in both the single and multi node cases,
>because even single node is multi-threaded.
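
A minimal single-node sketch of that rule, assuming plain Python threads:
every operation that touches an image cache directory takes a lock keyed by
that directory first. The helper image_cache_lock, the in-memory cache
dictionary and the simplified age_cache/spawn functions are stand-ins
invented for illustration; they are not the driver's code or Nova's
synchronisation utilities.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_registry_guard = threading.Lock()
_directory_locks = defaultdict(threading.Lock)


@contextmanager
def image_cache_lock(directory):
    """Hold the (hypothetical) per-directory lock for the enclosed block."""
    with _registry_guard:
        lock = _directory_locks[directory]
    with lock:
        yield


# In-memory stand-in for the state of one image cache directory.
cache = {"image-abc": {"in_use": False, "present": True}}
events = []


def age_cache(directory):
    with image_cache_lock(directory):
        entry = cache[directory]
        # Check and delete happen under the same lock, so no spawn()
        # can slip in between them.
        if entry["present"] and not entry["in_use"]:
            entry["present"] = False
            events.append("aged")


def spawn(directory):
    with image_cache_lock(directory):
        entry = cache[directory]
        if not entry["present"]:
            entry["present"] = True  # stands in for re-fetching the image
            events.append("fetched")
        entry["in_use"] = True
        events.append("spawned")


threads = [threading.Thread(target=fn, args=("image-abc",))
           for fn in (age_cache, spawn)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Whichever thread wins, spawn() ends up with a present, in-use directory:
either ageing ran first and spawn() re-fetched, or spawn() ran first and
ageing left the directory alone.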
>
>The next question is whether that locking needs to be single node or
>multi node. Specifically, do we currently allow, or plan to allow, an
>architecture where multiple Nova nodes access the same datastore
>concurrently? If we do, then we need to find a distributed locking
>solution. Ideally this would use the datastore itself for lock
>mediation. Failing that, apparently this tool is used elsewhere within
>the project:
>
>http://zookeeper.apache.org/doc/trunk/zookeeperOver.html
>
>That would be an added layer of architecture and deployment complexity,
>but if we need it, it's there.
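
As a rough picture of what "using the datastore itself for lock mediation"
could mean, here is a lock-file sketch built on exclusive file creation. It
assumes the shared storage offers an atomic create-if-absent operation; a
local filesystem with O_EXCL stands in for that here, and whether a
datastore provides an equivalent primitive is an open question this sketch
does not answer. lock_file and its parameters are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock represented by the existence of `path`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one caller can create it.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + path)
            time.sleep(poll)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder
    finally:
        os.close(fd)
    try:
        yield
    finally:
        os.remove(path)


lock_path = os.path.join(tempfile.mkdtemp(), "image-abc.lock")
with lock_file(lock_path):
    held_inside = os.path.exists(lock_path)
    try:
        with lock_file(lock_path, timeout=0.1):
            second_acquired = True
    except TimeoutError:
        second_acquired = False
released = not os.path.exists(lock_path)
```

One known weakness: a holder that crashes leaves the lock file behind, so
anything real would need some way to expire stale locks. ZooKeeper's
ephemeral nodes address that case, which is part of the appeal of a
dedicated coordination service.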
>
>If we can confidently say that 2 Nova instances should never be
>accessing the same datastore (how about hot/warm/cold failover?), we can
>use Nova's internal synchronisation tools. This would simplify matters
>greatly!
>
>I think this is one of those areas which is going to improve both the
>quality of the driver, and the confidence of reviewers to merge changes.
>Right now it takes a lot of brain cycles to work through all the various
>paths of a race to work out if any of them are really bad, and it has to
>be repeated every time you touch the code. A little up-front effort will
>make a whole class of problems go away.
>
>Matt
>--
>Matthew Booth, RHCA, RHCSS
>Red Hat Engineering, Virtualisation Team
>
>GPG ID: D33C3490
>GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev