[Cinder] Cinder NetApp driver not working as expected on NVMe over TCP
Hello,

We have NetApp C800 arrays for our OpenStack. We have set up the Cinder configuration to use the "NVMe over TCP" protocol. We are currently seeing the following issue:

- the volumes (namespaces) are correctly mounted on the hypervisors
- the (native) multipath configuration is correctly handled by Cinder (this can be seen in the logs)
- the volumes are only attached through a single path

Looking at the NetApp driver code, and more specifically at the *nvme_library.py* file and the *initialize_connection* function, we find on line 724:

    portal = (target_portals[0], self.NVME_PORT, self.NVME_TRANSPORT)
    data = {
        "target_nqn": str(target_nqn),
        "host_nqn": host_nqn,
        "portals": [portal],
        "vol_uuid": namespace_uuid
    }
    conn_info = {"driver_volume_type": "nvmeof", "data": data}

If we look at the values of target_portals[], the array does indeed expose all the paths available for the targeted subsystem. However, the function only returns the first path: target_portals[0].

I'm not a developer, so I can't guarantee that this is the cause of the problem. What is certain is that manually connecting the namespace with the nvme command does indeed yield all four paths.

I'll leave below the existing discussion thread that led me to this observation.

Thank you.
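For illustration only, here is a minimal sketch of the same fragment, reusing the names from the quoted code and assuming the same surrounding initialize_connection context. It shows the idea of building the portal list from every entry of target_portals instead of only the first one; it is not the actual driver code or a proposed patch:

    # Sketch only: one (address, port, transport) entry per reported portal,
    # instead of only target_portals[0].
    portals = [
        (portal_address, self.NVME_PORT, self.NVME_TRANSPORT)
        for portal_address in target_portals
    ]
    data = {
        "target_nqn": str(target_nqn),
        "host_nqn": host_nqn,
        "portals": portals,
        "vol_uuid": namespace_uuid
    }
    conn_info = {"driver_volume_type": "nvmeof", "data": data}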
------------------------------------------------------------------------

Vincent Godin <vince.mlist@gmail.com>, 29 July 2025 15:53 (3 days ago), to Rajat, Sean, openstack-discuss

Hello,

Here are some of the results on the host. An instance has been launched by OpenStack on the compute node.

nvme list:

    Node          Generic     SN                    Model                    Namespace  Usage                Format       FW Rev
    ------------  ----------  --------------------  -----------------------  ---------  -------------------  -----------  --------
    /dev/nvme0n1  /dev/ng0n1  81O3QJXiLzBDAAAAAAAH  NetApp ONTAP Controller  0x2        16.11 GB / 16.11 GB  4 KiB + 0 B  FFFFFFFF

If we have a look at the subsystem:

    nvme list-subsys
    nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.ec2c63655c3d11f0a40ad039eaba99f2:subsystem.openstack-79f1de4a-6645-4b47-9377-f06db6c2e0b5
                   hostnqn=nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9
                   iopolicy=round-robin
    \
     +- nvme0 tcp traddr=10.10.184.3,trsvcid=4420,src_addr=10.10.184.33 live

I have only one path.

I disconnect the subsystem manually:

    nvme disconnect -n nqn.1992-08.com.netapp:sn.ec2c63655c3d11f0a40ad039eaba99f2:subsystem.openstack-79f1de4a-6645-4b47-9377-f06db6c2e0b5
    NQN:nqn.1992-08.com.netapp:sn.ec2c63655c3d11f0a40ad039eaba99f2:subsystem.openstack-79f1de4a-6645-4b47-9377-f06db6c2e0b5 disconnected 1 controller(s)

I reconnect to the subsystem with a manual command:

    nvme connect-all -t tcp -a 10.10.186.3

    nvme list
    Node          Generic     SN                    Model                    Namespace  Usage                Format       FW Rev
    ------------  ----------  --------------------  -----------------------  ---------  -------------------  -----------  --------
    /dev/nvme0n2  /dev/ng0n2  81O3QJXiLzBDAAAAAAAH  NetApp ONTAP Controller  0x2        16.11 GB / 16.11 GB  4 KiB + 0 B  FFFFFFFF

And if we look at the subsystem:

    nvme list-subsys
    nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.ec2c63655c3d11f0a40ad039eaba99f2:subsystem.openstack-79f1de4a-6645-4b47-9377-f06db6c2e0b5
                   hostnqn=nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9
                   iopolicy=round-robin
    \
     +- nvme5 tcp traddr=10.10.184.3,trsvcid=4420,src_addr=10.10.184.33 live
     +- nvme4 tcp traddr=10.10.186.3,trsvcid=4420,src_addr=10.10.186.33 live
     +- nvme3 tcp traddr=10.10.184.4,trsvcid=4420,src_addr=10.10.184.33 live
     +- nvme2 tcp traddr=10.10.186.4,trsvcid=4420,src_addr=10.10.186.33 live

As you can see, I have four paths.

Configuration details about multipath:

- in nova.conf:

    [libvirt]
    volume_use_multipath = True

- in cinder.conf:

    [DEFAULT]
    target_protocol = nvmet_tcp
    ...

    [netapp-backend]
    use_multipath_for_image_xfer = True
    netapp_storage_protocol = nvme
    ...

- /sys/module/nvme_core/parameters/multipath:

    cat /sys/module/nvme_core/parameters/multipath
    Y

- nova-compute.log:

    grep -i get_connector_properties /var/log/kolla/nova/nova-compute.log
    2025-07-29 14:09:51.553 7 DEBUG os_brick.initiator.connectors.lightos [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] LIGHTOS: finally hostnqn: nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9 dsc: get_connector_properties /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/initiator/connectors/lightos.py:115
    2025-07-29 14:09:51.553 7 DEBUG os_brick.utils [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] <== get_connector_properties: return (30ms) {'platform': 'x86_64', 'os_type': 'linux', 'ip': '10.10.52.161', 'host': 'pkc-dcp-cpt-03', *'multipath': True, 'enforce_multipath': True*, 'initiator': 'iqn.2004-10.com.ubuntu:01:d0bb7aa9bcf1', 'do_local_attach': False, 'nvme_hostid': '5ca8b6d2-aa7d-42d8-bf74-c18484fab68c', 'system uuid': '31343550-3939-5a43-4a44-305930304c48', 'nqn': 'nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9', *'nvme_native_multipath': True*, 'found_dsc': '', 'host_ips': ['10.20.128.33', '10.10.184.33', '10.10.186.33', '10.10.52.161', '10.10.22.161', '10.234.2.161', '10.10.50.161', '172.17.0.1', 'fe80::7864:3eff:fe13:5e1f', 'fe80::fc16:3eff:fe7f:3430', 'fe80::4c20:48ff:fe0f:2660']} trace_logging_wrapper /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/utils.py:204

- multipathd:

    systemctl status multipathd.service
    ○ multipathd.service
         Loaded: masked (Reason: Unit multipathd.service is masked.)
         Active: inactive (dead)

If you can see some reason to explain why OpenStack connects to the subsystem with only one path, please let me know!

Thanks

------------------------------------------------------------------------

Rajat Dhasmana, 30 July 2025 12:37 (2 days ago), to me, Sean, openstack-discuss

On Wed, Jul 30, 2025 at 3:15 PM Vincent Godin <vince.mlist@gmail.com> wrote:
Hello guys,
Some more information found in nova-compute.log:
- try iSCSI:
2025-07-29 14:09:51.523 1222 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): cat /etc/iscsi/initiatorname.iscsi execute /var/lib/kolla/venv/lib/python3.12/site-packages/oslo_concurrency/processutils.py:349
2025-07-29 14:09:51.528 1222 DEBUG oslo_concurrency.processutils [-] CMD "cat /etc/iscsi/initiatorname.iscsi" returned: 0 in 0.005s execute /var/lib/kolla/venv/lib/python3.12/site-packages/oslo_concurrency/processutils.py:372
2025-07-29 14:09:51.528 1222 DEBUG oslo.privsep.daemon [-] privsep: reply[90a51cdb-5701-4339-b059-fefb0b79b7a5]: (4, ('## DO NOT EDIT OR REMOVE THIS FILE!\n## If you remove this file, the iSCSI daemon will not start.\n## If you change the InitiatorName, existing access control lists\n## may reject this initiator. The InitiatorName must be unique\n## for each iSCSI initiator. Do NOT duplicate iSCSI InitiatorNames.\nInitiatorName=iqn.2004-10.com.ubuntu:01:d0bb7aa9bcf1\n', '')) _call_back /var/lib/kolla/venv/lib/python3.12/site-packages/oslo_privsep/daemon.py:503
- try lightos ???
2025-07-29 14:09:51.552 7 DEBUG os_brick.initiator.connectors.lightos [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] LIGHTOS: [Errno 111] ECONNREFUSED find_dsc /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/initiator/connectors/lightos.py:135
2025-07-29 14:09:51.553 7 INFO os_brick.initiator.connectors.lightos [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] Current host hostNQN nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9 and IP(s) are ['10.20.128.33', '10.10.184.33', '10.10.186.33', '10.10.52.161', '10.10.22.161', '10.234.2.161', '10.10.50.161', '172.17.0.1', 'fe80::7864:3eff:fe13:5e1f', 'fe80::fc16:3eff:fe7f:3430', 'fe80::4c20:48ff:fe0f:2660']
2025-07-29 14:09:51.553 7 DEBUG os_brick.initiator.connectors.lightos [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] LIGHTOS: did not find dsc, continuing anyway. get_connector_properties /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/initiator/connectors/lightos.py:112
2025-07-29 14:09:51.553 7 DEBUG os_brick.initiator.connectors.lightos [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] LIGHTOS: finally hostnqn: nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9 dsc: get_connector_properties /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/initiator/connectors/lightos.py:115
- then:
2025-07-29 14:09:51.553 7 DEBUG os_brick.utils [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] <== get_connector_properties: return (30ms) {'platform': 'x86_64', 'os_type': 'linux', 'ip': '10.10.52.161', 'host': 'pkc-dcp-cpt-03', 'multipath': True, 'enforce_multipath': True, 'initiator': 'iqn.2004-10.com.ubuntu:01:d0bb7aa9bcf1', 'do_local_attach': False, 'nvme_hostid': '5ca8b6d2-aa7d-42d8-bf74-c18484fab68c', 'system uuid': '31343550-3939-5a43-4a44-305930304c48', 'nqn': 'nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9', 'nvme_native_multipath': True, 'found_dsc': '', 'host_ips': ['10.20.128.33', '10.10.184.33', '10.10.186.33', '10.10.52.161', '10.10.22.161', '10.234.2.161', '10.10.50.161', '172.17.0.1', 'fe80::7864:3eff:fe13:5e1f', 'fe80::fc16:3eff:fe7f:3430', 'fe80::4c20:48ff:fe0f:2660']} trace_logging_wrapper /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/utils.py:204
'multipath': True, 'enforce_multipath': True

This shows that the multipath configuration is set correctly. It would be good to search for this log entry [1] in the cinder-volume logs and look at the *portals* field to verify how many portals the NetApp NVMe driver returns.

[1] Initialize connection info: https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/netapp...
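As an illustration, a small hypothetical helper along these lines could scan a cinder-volume log for the "Initialize connection info" entries and print how many portals each one carries. It assumes the connection info dict is logged in the same repr form as in the os-brick trace shown in this thread, e.g. 'portals': [['10.10.184.3', 4420, 'tcp']]:

    import ast
    import re
    import sys

    # Hypothetical helper: count the portals in each "Initialize connection info"
    # entry of a cinder-volume log. Assumes the dict is logged in Python repr form.
    PORTALS = re.compile(r"'portals': (\[\[.*?\]\])")

    def count_portals(log_path):
        with open(log_path) as log:
            for line in log:
                if "Initialize connection info" not in line:
                    continue
                match = PORTALS.search(line)
                if match:
                    portals = ast.literal_eval(match.group(1))
                    print(f"{len(portals)} portal(s): {portals}")

    if __name__ == "__main__":
        count_portals(sys.argv[1])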
2025-07-29 14:09:51.554 7 DEBUG nova.virt.block_device [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] [instance: 3fcb3e36-1890-44f7-9c3c-283c05e91910] Updating existing volume attachment record: b81aea6e-f2ae-4781-8c2e-3b7f1606ba0d _volume_attach /var/lib/kolla/venv/lib/python3.12/site-packages/nova/virt/block_device.py:666
2025-07-29 14:09:53.680 7 DEBUG os_brick.initiator.connectors.nvmeof [None req-faf2b0ca-0709-4a70-8302-fa90ad293fd3 4e2ddaf17ee747f2a1f03a392943f80a cb513debb0834ec5b6588356a960bad9 - - default default] ==> connect_volume: call "{'self': <os_brick.initiator.connectors.nvmeof.NVMeOFConnector object at 0x7cf65c576090>, 'connection_properties': {'target_nqn': 'nqn.1992-08.com.netapp:sn.ec2c63655c3d11f0a40ad039eaba99f2:subsystem.openstack-79f1de4a-6645-4b47-9377-f06db6c2e0b5', 'host_nqn': 'nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9', 'portals': [['10.10.184.3', 4420, 'tcp']], 'vol_uuid': '69da9918-7e84-4ee4-b7bb-9b50e3e6d739', 'qos_specs': None, 'access_mode': 'rw', 'encrypted': False, 'cacheable': False, 'enforce_multipath': True}}" trace_logging_wrapper /var/lib/kolla/venv/lib/python3.12/site-packages/os_brick/utils.py:177
'portals': [['10.10.184.3', 4420, 'tcp']]

Here we can see that only one portal is returned by the NetApp driver.
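For comparison, and based purely on the paths visible in the manual nvme list-subsys output earlier in the thread, the connection properties would presumably need to carry something like the following for all four paths to be attached. This is illustrative only, not actual driver output:

    # Illustrative only: a four-entry 'portals' list built from the addresses
    # reported by nvme list-subsys above (port 4420, transport tcp).
    expected_connection_properties = {
        'target_nqn': 'nqn.1992-08.com.netapp:sn.ec2c63655c3d11f0a40ad039eaba99f2:'
                      'subsystem.openstack-79f1de4a-6645-4b47-9377-f06db6c2e0b5',
        'host_nqn': 'nqn.2014-08.org.nvmexpress:uuid:629788a4-04c6-547c-9121-8d7a39c17fe9',
        'portals': [
            ['10.10.184.3', 4420, 'tcp'],
            ['10.10.186.3', 4420, 'tcp'],
            ['10.10.184.4', 4420, 'tcp'],
            ['10.10.186.4', 4420, 'tcp'],
        ],
        'vol_uuid': '69da9918-7e84-4ee4-b7bb-9b50e3e6d739',
    }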