[openstack-dev] [nova][vmware] Locking: A can of worms

Gary Kotton gkotton at vmware.com
Sat Mar 8 11:07:45 UTC 2014

I have provided a solution for the single-node setup. That is, we
make use of the nova lock utilities to provide locking so that the
cached images will not be removed whilst an instance is being spawned. I
hope that this will unblock the feature and enable us to continue with the
FFE. The reasons for this are as follows:
1. The feature can be disabled if a user wishes.
2. If the user wants to make use of multiple compute nodes, we can
address the issues via configuration variables; that is, each compute node
can use a different cache directory on the datastore.
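
As a rough illustration of the single-node idea, here is a minimal sketch. It
does not use the real nova lock utilities: cache_lock() and ImageCache are
hypothetical stand-ins built on the standard library, and the cache directory
names are made up. The point is only that spawn and cache ageing take the same
named lock, and that the lock name includes the per-node cache directory.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical stand-in for a named-lock utility: one in-process lock per name.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


@contextmanager
def cache_lock(cache_dir, image_id):
    """Hold the lock for one cached image in one cache directory."""
    name = "%s/%s" % (cache_dir, image_id)
    with _locks_guard:          # protect the dictionary of locks itself
        lock = _locks[name]
    with lock:
        yield


class ImageCache(object):
    """Toy image cache; a set stands in for directories on the datastore."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir   # assumed to come from per-node config
        self.images = set()

    def spawn(self, image_id):
        # Fetch-if-missing and use happen under the same lock as ageing.
        with cache_lock(self.cache_dir, image_id):
            self.images.add(image_id)
            return image_id in self.images   # "use" the cached image

    def age(self, image_id):
        # Cache ageing may only delete while holding the image's lock.
        with cache_lock(self.cache_dir, image_id):
            self.images.discard(image_id)
```

With one ImageCache("node1_cache") and another ImageCache("node2_cache"), the
two nodes never touch the same directory, which is the workaround in point 2.
These locks only exist inside one process, so this does nothing for two nodes
sharing a cache directory.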

On 3/7/14 11:03 PM, "Vui Chiap Lam" <vuichiap at vmware.com> wrote:

>Thanks Matt, good points raised here.
>I think at minimum we should address the single-node case by serializing
>all access to the image cache data on the datastore, in a way that is
>easy to reason about. Spawn failures are not desirable but are acceptable
>if a subsequent retry on the same or a different node can succeed. What
>we absolutely need to avoid is a partially deleted image cache directory
>rendering future spawns impossible.
>There are some inelegant ways to work around the multi-node scenarios
>if one really has to (e.g. dedicating a different cache location for each
>node), but I think the multi-node case should get addressed at some point
>too.
>There are distributed locking provisions in a vSphere datastore for files
>in use (you cannot delete a disk used by a running VM, say). The image
>cache as implemented exists as a collection of files on a datastore, many
>of which may not be in use by any process. Either that picture changes or
>we need to find some other means to achieve distributed locking.
>----- Original Message -----
>| From: "Matthew Booth" <mbooth at redhat.com>
>| To: "OpenStack Development Mailing List (not for usage questions)"
><openstack-dev at lists.openstack.org>
>| Sent: Friday, March 7, 2014 8:53:26 AM
>| Subject: [openstack-dev] [nova][vmware] Locking: A can of worms
>| We need locking in the VMware driver. There are 2 questions:
>| 1. How much locking do we need?
>| 2. Do we need single-node or multi-node locking?
>| I believe these are quite separate issues, so I'm going to try not to
>| confuse them. I'm going to deal with the first question first.
>| In reviewing the image cache ageing patch, I came across a race
>| condition between cache ageing and spawn(). One example of the race is:
>| Cache Ageing                spawn()
>| * Check timestamps
>|                             * Delete timestamp
>|                             * Check for image cache directory
>| * Delete directory
>|                             * Use image cache directory
>| This will cause spawn() to explode. There are various permutations of
>| this. For example, the following are all possible:
>| * A simple failure of spawn() with no additional consequences.
>| * Calling volumeops.attach_disk_to_vm() with a vmdk_path that doesn't
>| exist. It's not 100% clear to me that ReconfigVM_Task will throw an
>| error in this case, btw, which would probably be bad.
>| * Leaving a partially deleted image cache directory which doesn't
>| contain the base image. This would be really bad.
>| The last comes about because recursive directory delete isn't atomic,
>| and may partially succeed, which is a tricky problem. However, in
>| discussion, Gary also pointed out that directory moves are not atomic
>| (see MoveDatastoreFile_Task). This is positively nasty. We already knew
>| that spawn() races with itself to create an image cache directory, and
>| we've hit this problem in practise. We haven't fixed the race, but we do
>| manage it. The management relies on the atomicity of a directory move.
>| Unfortunately it isn't atomic, which presents the potential problem of
>| spawn() attempting to use an incomplete image cache directory. We also
>| have the problem of 2 spawns using a linked clone image racing to create
>| the same resized copy.
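
The cache ageing / spawn() interleaving described above can be replayed in a
single thread to make the failure concrete. This is only a sketch under
assumptions: a local temporary directory stands in for the datastore, and the
directory and file names (image-1234, base.vmdk) are made up for illustration.
No VMware API is involved.

```python
import os
import shutil
import tempfile


def replay_race():
    """Replay the interleaving step by step; return (saw_dir, used_ok)."""
    datastore = tempfile.mkdtemp()                     # stand-in datastore
    image_dir = os.path.join(datastore, "image-1234")  # hypothetical entry
    base_image = os.path.join(image_dir, "base.vmdk")
    os.mkdir(image_dir)
    open(base_image, "w").close()
    try:
        # Cache ageing has already checked timestamps and decided to delete.
        # spawn(): check for image cache directory.
        saw_dir = os.path.isdir(image_dir)
        # Cache ageing: delete directory.
        shutil.rmtree(image_dir)
        # spawn(): use image cache directory, trusting the earlier check.
        try:
            with open(base_image):
                used_ok = True
        except OSError:
            used_ok = False
        return saw_dir, used_ok
    finally:
        shutil.rmtree(datastore, ignore_errors=True)
```

spawn() sees the directory and then fails to use it, because nothing ties the
check to the use. Holding one lock across both steps, and across the delete
in cache ageing, removes this particular interleaving.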
>| We could go through all of the above very carefully to assure ourselves
>| that we've found all the possible failure paths, and that in every case
>| the problems are manageable and documented. However, I would place a
>| good bet that the above is far from a complete list, and we would have
>| to revisit it in its entirety every time we touched any affected code.
>| And that would be a lot of code.
>| We need something to manage concurrent access to resources. In all of
>| the above cases, if we make the rule that everything which uses an image
>| cache directory must hold a lock on it whilst using it, all of the above
>| problems go away. Reasoning about their correctness becomes the
>| comparatively simple matter of ensuring that the lock is used correctly.
>| Note that we need locking in both the single and multi node cases,
>| because even single node is multi-threaded.
>| The next question is whether that locking needs to be single node or
>| multi node. Specifically, do we currently, or do we plan to, allow an
>| architecture where multiple Nova nodes access the same datastore
>| concurrently? If we do, then we need to find a distributed locking
>| solution. Ideally this would use the datastore itself for lock
>| mediation. Failing that, apparently this tool is used elsewhere within
>| the project:
>| That would be an added layer of architecture and deployment complexity,
>| but if we need it, it's there.
>| If we can confidently say that 2 Nova instances should never be
>| accessing the same datastore (how about hot/warm/cold failover?), we can
>| use Nova's internal synchronisation tools. This would simplify matters
>| greatly!
>| I think this is one of those areas which is going to improve both the
>| quality of the driver, and the confidence of reviewers to merge changes.
>| Right now it takes a lot of brain cycles to work through all the various
>| paths of a race to work out if any of them are really bad, and it has to
>| be repeated every time you touch the code. A little up-front effort will
>| make a whole class of problems go away.
>| Matt
>| --
>| Matthew Booth, RHCA, RHCSS
>| Red Hat Engineering, Virtualisation Team
>| GPG ID:  D33C3490
>| GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>| _______________________________________________
>| OpenStack-dev mailing list
>| OpenStack-dev at lists.openstack.org
