[Openstack] Locking problems with NFS v3 and shared /var/lib/nova/instances

Antonio Messina antonio.s.messina at gmail.com
Tue Feb 4 09:23:01 UTC 2014


Hi all,

I am trying to configure a few compute nodes (currently Folsom) with a
shared NFS filesystem to store the virtual machine images. The idea is
to enable live migration, but we also need it because some of the
compute nodes have a very small hard disk.

The NFS server we are using is an appliance, but it's actually based
on CentOS 6.2. However, I only have access to a rather limited web
interface, so I cannot edit /etc/exports directly or check the logs,
and the appliance only supports NFS v3.

The problem I am facing is that when I try to spawn multiple instances
on different nodes, all accessing the same NFS filesystem, many of
them will fail with the following error::

    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py",
line 756, in _spawn
        block_device_info)
      File "/usr/lib/python2.7/dist-packages/nova/exception.py", line
117, in wrapped
        temp_level, payload)
      File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
        self.gen.next()
      File "/usr/lib/python2.7/dist-packages/nova/exception.py", line
92, in wrapped
        return f(*args, **kw)
      File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py",
line 1099, in spawn
        admin_pass=admin_password)
      File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py",
line 1365, in _create_image
        project_id=instance['project_id'])
      File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py",
line 131, in cache
        *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py",
line 178, in create_image
        prepare_template(target=base, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 791, in inner
        with lock:
      File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 664,
in __enter__
        self.trylock()
      File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 699,
in trylock
        fcntl.lockf(self.lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    IOError: [Errno 37] No locks available

It seems that before downloading a new base image from Glance,
nova-compute creates a file in ``/var/lib/nova/instances/locks``
named after the image ID, and then calls ``fcntl()`` to ensure that
only one nova-compute process at a time downloads the image and
writes to ``/var/lib/nova/instances/_base``.
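
For reference, the core of that pattern can be reduced to a few
lines. The sketch below (the lock path and image ID are made-up
placeholders, not nova's actual helper) exercises the same
``fcntl.lockf()`` call that fails in the traceback::

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    # Open (creating if needed) the lock file and attempt a
    # non-blocking exclusive POSIX lock, as nova's trylock() does.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd  # caller closes fd to release the lock
    except IOError:
        os.close(fd)
        raise


# On a local filesystem this just works; on our NFS v3 mount the
# same call raises IOError with errno 37 (ENOLCK).
lock_path = os.path.join(tempfile.gettempdir(), "fake-image-id.lock")
fd = try_exclusive_lock(lock_path)
os.close(fd)
print("lock acquired and released")
```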

However, locking over NFS v3 is quite unreliable, especially when
many ``fcntl()`` calls are issued at the same time, as seems to
happen in ``/usr/lib/python2.7/dist-packages/nova/utils.py:670``::

        while True:
            try:
                # Using non-blocking locks since green threads are not
                # patched to deal with blocking locking calls.
                # Also upon reading the MSDN docs for locking(), it seems
                # to have a laughable 10 attempts "blocking" mechanism.
                self.trylock()
                return self
            except IOError, e:
                if e.errno in (errno.EACCES, errno.EAGAIN):
                    # external locks synchronise things like iptables
                    # updates - give it some time to prevent busy spinning
                    time.sleep(0.01)
                else:
                    raise
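
Note that this retry loop only swallows EACCES and EAGAIN, the
errnos that signal ordinary lock contention; ENOLCK (errno 37, "No
locks available", typically raised when the NFS lock manager cannot
be reached) is not in that tuple, so the failure from ``trylock()``
propagates straight up as the IOError in the traceback. A quick
check::

```python
import errno

# The errnos the retry loop treats as "contended, try again":
retried = (errno.EACCES, errno.EAGAIN)

# "No locks available" is ENOLCK (37 on Linux) and is not retried,
# so the IOError escapes the while loop and aborts the spawn.
print(errno.ENOLCK in retried)    # False
```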

The only solutions I can think of are:

* use NFS v4 instead of v3 (but I would like to keep using the
  current storage appliance, which doesn't support v4)

* mount an NFS v4 filesystem (or any other filesystem with reliable
  locking) on /var/lib/nova/instances/locks

* use a different shared filesystem (GlusterFS?)

So I am asking anyone who has already deployed OpenStack with a
shared filesystem for /var/lib/nova/instances: did you face the same
problem, and how did you solve it?

Thank you
Antonio Messina

-- 
antonio.s.messina at gmail.com
antonio.messina at uzh.ch                     +41 (0)44 635 42 22
GC3: Grid Computing Competence Center      http://www.gc3.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich Switzerland



