[openstack-dev] [cinder][nova] condition needed to detach a volume

thelo.gaultier at itri.org.tw thelo.gaultier at itri.org.tw
Thu Feb 26 09:46:51 UTC 2015


I am developing a Cinder driver and recently ran into a problem while running the Cinder driver certification test (devstack/driver_certs/cinder_driver_cert.sh).

Here is what happened, along with my understanding of the problem (please correct me if I am wrong).

From the test logs I can see the following information:

{2} tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments [238.283153s] ... FAILED
.....
Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "tempest/api/compute/volumes/test_attach_volume.py", line 46, in _detach
        self.volumes_client.wait_for_volume_status(volume_id, 'available')
      File "tempest/services/volume/json/volumes_client.py", line 182, in wait_for_volume_status
        raise exceptions.TimeoutException(message)
    TimeoutException: Request timed out
    Details: Volume bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a failed to reach available status (current: detaching) within the required time (196 s).
    Traceback (most recent call last):
    _StringException: Empty attachments:
      stderr
      stdout

When I check what the test case actually does, I see the following:

44         if self.attachment:
45             self.servers_client.detach_volume(server_id, volume_id)
46             self.volumes_client.wait_for_volume_status(volume_id, 'available')

So the volume failed to detach within the timeout period.

To keep it simple, the test that failed does the following (see the sketch after the list):

1. Attach a volume.
2. Wait for the volume to be attached (by polling until the volume status is 'in-use').
3. Detach the volume.
4. Wait for the volume to be detached (by polling until the volume status is 'available').
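
Condensed into a rough sketch (this is an illustration, not the verbatim test; servers_client and volumes_client are the Tempest REST clients seen in the traceback above):

    # Hypothetical condensation of the failing test flow.
    servers_client.attach_volume(server_id, volume_id, '/dev/vdb')   # step 1
    volumes_client.wait_for_volume_status(volume_id, 'in-use')       # step 2
    servers_client.detach_volume(server_id, volume_id)               # step 3
    volumes_client.wait_for_volume_status(volume_id, 'available')    # step 4 -- times out here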

In nova-compute's log I see an exception that seems to be the reason for this timeout. It happens when the detach request is received by nova-compute:

1226617609] Exception during message handling: <type 'NoneType'> can't be decoded
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 414, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/exception.py", line 88, in wrapped
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     payload)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/exception.py", line 71, in wrapped
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 298, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     pass
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 283, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 326, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 314, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 4730, in detach_volume
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     self._detach_volume(context, instance, bdm)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/manager.py", line 4671, in _detach_volume
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     connection_info = jsonutils.loads(bdm.connection_info)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 213, in loads
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in safe_decode
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     raise TypeError("%s can't be decoded" % type(text))
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher TypeError: <type 'NoneType'> can't be decoded
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
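
The failing call is jsonutils.loads(bdm.connection_info) with connection_info set to None; with the oslo versions shown in this traceback, that reproduces trivially:

    from oslo_serialization import jsonutils

    # safe_decode() only accepts str/bytes, so passing None raises
    # "TypeError: <type 'NoneType'> can't be decoded", exactly as in the trace above
    jsonutils.loads(None)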

When I run only this specific test case, the problem does not occur.

Hence I added some logs to nova-compute to find out the difference between the two executions. From those, I can see that the connection_info field is None when the error is triggered.

This log is printed when nova-compute receives the detach request (just before it parses the connection_info field):
2015-02-26 13:50:00.572 AUDIT nova.compute.manager [req-9930623e-c073-4c5b-bcaf-f4098245277f AttachVolumeTestJSON-1822846528 AttachVolumeTestJSON-1226617609] print connection info : BlockDeviceMapping(boot_index=None,connection_info=None,created_at=2015-02-26T05:49:36Z,delete_on_termination=False,deleted=False,deleted_at=None,destination_type='volume',device_name='/dev/vdb',device_type=None,disk_bus=None,guest_format=None,id=166,image_id=None,instance=<?>,instance_uuid=9aeea261-30af-46eb-9fbb-53d03b65b43f,no_device=False,snapshot_id=None,source_type='volume',updated_at=None,volume_id='bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a',volume_size=None)
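
For reference, the added line was roughly the following (a hypothetical sketch; I placed it in _detach_volume() in nova/compute/manager.py right before connection_info is parsed, and the exact logging call is my guess):

    # hypothetical debug line in nova/compute/manager.py, _detach_volume(),
    # logged just before the failing jsonutils.loads() call:
    LOG.audit("print connection info : %s", bdm)
    connection_info = jsonutils.loads(bdm.connection_info)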

Then I tried to figure out why this field was None and went back to the previous step: attach volume. From my basic understanding, when Nova attaches a volume, it updates two things (source: nova/virt/libvirt/block_device.py):

1. The volume status (in Cinder's database).
2. The connection_info (after the call to virt_driver.attach_volume).

From the Cinder logs (c-api.log), the first time the volume status is reported as 'in-use' is at 13:49:59.491:
2-26 13:49:59.491 31396 INFO cinder.api.v1.volumes [req-dc56cd34-8ae3-42b0-8c35-8c01cd142955 690795b5c8c74e4fb6d292eeebef12d5 b49a40aa157e460592f3e3ac9a9f16eb - - -] vol={'migration_status': None, 'provider_id': None, 'availability_zone': u'nova', 'terminated_at': None, 'updated_at': datetime.datetime(2015, 2, 26, 5, 49, 58), 'provider_geometry': None, 'replication_extended_status': None, 'replication_status': u'disabled', 'snapshot_id': None, 'ec2_id': None, 'mountpoint': u'/dev/vdb', 'deleted_at': None, 'id': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', 'size': 1L, 'user_id': u'690795b5c8c74e4fb6d292eeebef12d5', 'attach_time': u'2015-02-26T05:49:52.128113', 'attached_host': None, 'display_description': None, 'volume_admin_metadata': u'[<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ed8cd10>, <cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ef789d0>]', 'project_id': u'b49a40aa157e460592f3e3ac9a9f16eb', 'launched_at': datetime.datetime(2015, 2, 26, 5, 49, 35), 'scheduled_at': datetime.datetime(2015, 2, 26, 5, 49, 33), 'status': u'in-use', 'volume_type_id': None, 'deleted': False, 'provider_location': u'999993877', 'host': u'cwisr3s18#disco', 'consistencygroup_id': None, 'source_volid': None, 'provider_auth': None, 'display_name': u'test', 'instance_uuid': u'9aeea261-30af-46eb-9fbb-53d03b65b43f', 'bootable': False, 'created_at': datetime.datetime(2015, 2, 26, 5, 49, 32), 'attach_status': u'attached', 'volume_type': None, 'consistencygroup': None, 'volume_metadata': [], '_name_id': None, 'encryption_key_id': None, 'replication_driver_data': None, 'metadata': {u'readonly': u'False', u'attached_mode': u'rw'}}


This approximately matches the time at which the test script saw the 'in-use' status:

2015-02-26 13:49:59,558 31513 DEBUG    [tempest_lib.common.rest_client] Request (AttachVolumeTestJSON:test_list_get_volume_attachments): 200 GET http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a 0.099s
        Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
            Body: None
        Response - Headers: {'status': '200', 'content-length': '748', 'content-location': 'http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', 'x-compute-request-id': 'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955', 'connection': 'close', 'date': 'Thu, 26 Feb 2015 05:49:59 GMT', 'content-type': 'application/json', 'x-openstack-request-id': 'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955'}
            Body: {"volume": {"status": "in-use", "display_name": "test", "attachments": [{"host_name": null, "device": "/dev/vdb", "server_id": "9aeea261-30af-46eb-9fbb-53d03b65b43f", "id": "bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "volume_id": "bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a"}], "availability_zone": "nova", "bootable": "false", "encrypted": false, "created_at": "2015-02-26T05:49:32.000000", "os-vol-tenant-attr:tenant_id": "b49a40aa157e460592f3e3ac9a9f16eb", "os-volume-replication:driver_data": null, "display_description": null, "os-volume-replication:extended_status": null, "volume_type": null, "snapshot_id": null, "source_volid": null, "metadata": {"readonly": "False", "attached_mode": "rw"}, "id": "bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "size": 1}}

After that, the test script calls detach volume and we hit the error mentioned above.

In addition, I added a log at the very last line of the attach function in nova/virt/block_device.py in order to check the connection_info:

    @update_db
    def attach(self, context, instance, volume_api, virt_driver,
               do_check_attach=True, do_driver_attach=False):
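        # ... function body elided ...
        # Hypothetical sketch of where my debug line went -- at the very end of
        # attach(); the exact logging call is my guess, but the message matches
        # the nova-compute output quoted below:
        LOG.audit("test thelo 7 %s", connection_info)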

From the nova-compute (n-cpu) log, I can see that this line is printed at 13:50:02.003, i.e. after the detach request was received at 13:50:00.572:

2015-02-26 13:50:02.003 AUDIT nova.virt.block_device [req-fd8124ca-d608-44a0-a1b9-7cb933fd840f AttachVolumeTestJSON-1822846528 AttachVolumeTestJSON-1226617609] test thelo 7 {u'driver_volume_type': u'disco', 'serial': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', u'data': {'device_path': u'/dev/dms999993877', u'access_mode': u'rw', u'disco_id': u'999993877', u'qos_specs': None, u'name': u'volume-bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a'}}

This means that detach_volume is called before the attach function actually finishes and, potentially, before it updates the connection_info field in the database.
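
If that reading is right, the ordering looks roughly like this (pseudocode of my understanding of nova/virt/block_device.py, not the actual implementation):

    @update_db   # persists the BDM (including connection_info) only AFTER attach() returns
    def attach(self, context, instance, volume_api, virt_driver, ...):
        connection_info = volume_api.initialize_connection(context, volume_id, connector)
        virt_driver.attach_volume(context, connection_info, instance, mountpoint)
        self['connection_info'] = connection_info        # in memory only at this point
        volume_api.attach(context, volume_id, instance.uuid, mountpoint)  # status -> 'in-use'
        # a detach request arriving between volume_api.attach() and the decorator's
        # save() still sees connection_info=None on the BDM in nova's database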

To confirm this theory, I added a "sleep" before the volume is detached in the Tempest test and ran the test again. This time the test passed.
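
For reference, the change was roughly this (a hypothetical sketch against the _detach helper quoted at the top; the 10-second value is arbitrary):

    import time

    def _detach(self, server_id, volume_id):
        if self.attachment:
            # arbitrary delay so attach() has time to save connection_info to the BDM
            time.sleep(10)
            self.servers_client.detach_volume(server_id, volume_id)
            self.volumes_client.wait_for_volume_status(volume_id, 'available')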

Does this logic seem correct? If so, is this a bug, and shouldn't we guarantee that the connection_info is saved to the database before we update the volume status?

Thanks for taking the time to read this,

Thelo

