[openstack-dev] [tempest][nova][cinder] Tests that try to detach volumes in use
Peter Penchev
openstack-dev at storpool.com
Fri Apr 24 15:40:15 UTC 2015
Hi,
There are a couple of Tempest volume tests, like
test_rescued_vm_detach_volume or test_list_get_volume_attachments,
that either sometimes[0] or always attempt to detach a volume from a
running instance while the instance could still be keeping it open.
Unfortunately, this is not completely compatible with the StorPool
distributed storage driver - in a StorPool cluster, a volume may only
be detached from a client host (the Nova hypervisor) if there are no
processes running on the host (e.g. qemu) that keep the volume open.
This restriction came about as a result of a series of Linux kernel
crashes that we observed during our testing when a volume containing
a filesystem was detached while the kernel's filesystem driver still
had it in use and did not expect it to disappear.
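
Just to illustrate what "keeping the volume open" means on the host
side, here is a minimal hypothetical sketch (not the StorPool driver's
actual check, and the device path below is made up) of how one could
list the processes that still hold a block device open:

    import glob
    import os


    def holders_of(dev_path):
        """Return the PIDs of processes with an open descriptor on dev_path."""
        dev_real = os.path.realpath(dev_path)
        pids = set()
        for fd_link in glob.glob('/proc/[0-9]*/fd/*'):
            try:
                if os.path.realpath(fd_link) == dev_real:
                    pids.add(int(fd_link.split('/')[2]))
            except OSError:
                # The process or descriptor went away while scanning.
                continue
        return pids


    # e.g. refuse to detach while a qemu process still has it open:
    # if holders_of('/dev/storpool/example-volume'):
    #     raise RuntimeError('volume still in use on this host')

As long as nothing shows up in such a listing, the detach is safe;
when a qemu process does, that is exactly the case these tests hit.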
Right now, our driver for attaching StorPool volumes (defined in
Cinder) to Nova instances (proposed in
http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/libvirt-storpool-volume-attach.html
but not given enough +1/+2's in time for Kilo RC2) tries to detach
the volume, waits for a couple of seconds in the hope that any
processes keeping it open have been notified to let it go, tries
again, and then fails.  Of course, StorPool has a "force detach"
option that
could be used in that case; the problem there is that it might indeed
lead to some trouble for the instances that will have the volume
pulled out from under their tiny instance legs. This could go in the
"let the operator handle it" category - if we're detaching a volume,
this supposedly means that the filesystem has already been unmounted
within the instance... is this a sensible approach? Should we teach
our driver to forcibly detach the volume if the second polite attempt
still fails?
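
For clarity, here is a rough sketch of the retry flow described above;
spapi.detach_volume(), VolumeBusyError and the force=True flag are
placeholder names for this illustration, not the real StorPool or
os-brick API:

    import time


    class VolumeBusyError(Exception):
        """Placeholder for whatever error the backend raises when the
        volume is still open on the host."""


    DETACH_RETRY_DELAY = 2  # seconds between the two polite attempts


    def detach_with_retry(spapi, volume_name, allow_force=False):
        """Detach politely, wait, retry once, then fail or force-detach."""
        try:
            spapi.detach_volume(volume_name)
            return
        except VolumeBusyError:
            # Something on the host (e.g. a qemu process) still keeps
            # the volume open; give it a couple of seconds to let go.
            time.sleep(DETACH_RETRY_DELAY)
        try:
            spapi.detach_volume(volume_name)
        except VolumeBusyError:
            if not allow_force:
                raise
            # Last resort: pull the volume out from under whatever
            # still holds it open; the guest is assumed to have
            # unmounted any filesystem on it already.
            spapi.detach_volume(volume_name, force=True)

Whether allow_force should ever be turned on by default is exactly the
"let the operator handle it" question above.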
G'luck,
Peter
[0] The "sometimes" part: it seems that in some tests, like
test_list_get_volume_attachments, the order of the "detach volume" and
"stop the running instance" actions is random, dependent on the order
in which the Python test framework will execute the cleanup handlers.
Of course, it might be that I'm misunderstanding something and it is
completely deterministic and there is another issue at hand...
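
For what it's worth, if these cleanups are registered through plain
unittest addCleanup() calls, they should run in the reverse order of
registration (last in, first out), so the order would be deterministic
per test; a toy illustration, not the actual Tempest code:

    import unittest


    class CleanupOrderExample(unittest.TestCase):
        """Toy example: addCleanup() callbacks run last-in, first-out."""

        def test_cleanup_order(self):
            # Registered first, runs second - e.g. stop/delete the server.
            self.addCleanup(print, 'runs second: stop the instance')
            # Registered second, runs first - e.g. detach the volume,
            # while the server from the earlier cleanup is still running.
            self.addCleanup(print, 'runs first: detach the volume')


    if __name__ == '__main__':
        unittest.main()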