[Openstack][cinder] dell unity iscsi faulty devices

Ignazio Cassano ignaziocassano at gmail.com
Fri Oct 30 11:44:48 UTC 2020


Many thanks.
Is there a way to clean this up without rebooting?


On Fri, Oct 30, 2020 at 11:52 Gorka Eguileor <geguileo at redhat.com> wrote:

> On 30/10, Ignazio Cassano wrote:
> > Please, see the last email where I upgraded to the latest openstack nova
> > on queens.
> > [root at compute-0 nova]# rpm -qa|grep queens
> > centos-release-openstack-queens-1-2.el7.centos.noarch
> > [root at compute-0 nova]# rpm -qa|grep nova
> > openstack-nova-compute-17.0.13-1.el7.noarch
> > openstack-nova-common-17.0.13-1.el7.noarch
> > python-nova-17.0.13-1.el7.noarch
> > python2-novaclient-10.1.0-1.el7.noarch
> >
> > I sent you the logs from the updated release.
> > I am not so skilled at reading the detailed log output for this issue.
> > Sorry
> > Ignazio
> >
>
> Hi,
>
> I missed that email and the attachment, sorry.
>
> The logs you sent me were missing most of the connect_volume call, and
> only the end of the call was present, but I think it doesn't matter as I
> see what the problem is.
>
> The problem is that some of the nodes and sessions are duplicated.
>
> An example of a duplicated node:
>
>   tcp: [3] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
>   tcp: [4] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
>
> An example of that node's duplicated session:
>
>   tcp: [3] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
>   tcp: [4] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
>
> And os-brick is not prepared to handle that, because it is programmed to
> reuse the nodes and sessions.
>
> So on disconnect it gets the first of each to look for the volumes
> provided by it.  In the duplicated node-session example above it sees
> that it provides /dev/sdd, but that is not one of the disks that belong
> to the multipath we are disconnecting, so it gets ignored.  The volume
> we are looking for is probably on the second session.
>
> So from this point forward (where we have duplicated node-sessions) it
> will not work again.
>
> I recommend you clean up that system so that you don't have duplicated
> nodes and sessions before trying to do a VM migration with a single
> volume attached.
>
> If that works, then try to attach 2 volumes to instances on the same
> host and see if the nodes and sessions are duplicated.
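>
> A rough sketch of what that cleanup could look like, assuming the
> duplicate session [4] from the listing above is not backing any volume
> that is still in use on this host (the SID, IQN and portal are the ones
> from the example; check iscsiadm -m session on your node first):
>
>   # list sessions and spot duplicated SIDs for the same IQN/portal
>   iscsiadm -m session
>   # log out only the duplicated session, selecting it by its SID
>   iscsiadm -m session -r 4 -u
>   # verify there is a single node record left per IQN/portal
>   iscsiadm -m node | sort | uniq -c
>
> If duplicated node records remain, they can be removed with
> iscsiadm -m node -o delete, but only after checking which iface the
> spare record belongs to, so the record still in use is kept.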
>
> Cheers,
> Gorka.
>
>
> > On Fri, Oct 30, 2020 at 09:27 Gorka Eguileor <geguileo at redhat.com> wrote:
> >
> > > On 30/10, Ignazio Cassano wrote:
> > > > Hello, these are versions we are using:
> > > > [root at podto2-kvm02 ansible]# rpm -qa|grep queens
> > > > centos-release-openstack-queens-1-2.el7.centos.noarch
> > > > [root at podto2-kvm02 ansible]# rpm -qa|grep nova
> > > > openstack-nova-common-17.0.11-1.el7.noarch
> > > > python2-novaclient-10.1.0-1.el7.noarch
> > > > openstack-nova-compute-17.0.11-1.el7.noarch
> > > > python-nova-17.0.11-1.el7.noarch
> > > >
> > > > Cheers, Gorka.
> > >
> > > Hi,
> > >
> > > That release has the Nova bug fix, so Nova should not be calling Cinder
> > > to do an initialize connection on the source on the post-migration step
> > > anymore.
> > >
> > > I recommend comparing the connection info passed to connect_volume when
> > > the volume is attached on the source host and when it's disconnected on
> > > the post-migration step on the source host.
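> > >
> > > A quick way to pull both calls out of the compute log, assuming DEBUG
> > > is enabled and the default log location (adjust the path to your
> > > deployment):
> > >
> > >   grep -E '(connect_volume|disconnect_volume): call' \
> > >       /var/log/nova/nova-compute.log
> > >
> > > and then compare the connection info dictionaries of the attach and
> > > the post-migration disconnect for the same volume (target_iqn(s),
> > > target_portal(s), target_lun(s)).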
> > >
> > > Cheers,
> > > Gorka.
> > >
> > > >
> > > > >
> > > > > [1]: https://bugs.launchpad.net/nova/+bug/1754716
> > > > > [2]:
> > > > >
> > >
> https://github.com/openstack/nova/commit/013f421bca4067bd430a9fac1e3b290cf1388ee4
> > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 29, 2020 at 09:12 Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > > >
> > > > > > > On 28/10, Ignazio Cassano wrote:
> > > > > > > > Hello Gorka, I would like to know if, with the unity iscsi driver, I
> > > > > > > > must configure the iscsi initiator on both compute and controller
> > > > > > > > nodes.
> > > > > > > > At this time I installed and configured the iscsi initiator only on
> > > > > > > > compute nodes and I got a lot of faulty devices when volumes are
> > > > > > > > detached.
> > > > > > > > Thanks
> > > > > > > > Ignazio
> > > > > > > >
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Both compute and controller nodes are in the data path.  Computes when
> > > > > > > instances use the volumes, and controllers when we create volumes from
> > > > > > > images, do generic volume migrations, create or restore backups, etc.
> > > > > > >
> > > > > > > Unless your deployment isn't doing any of the Cinder operations that
> > > > > > > involve the data plane, you'll have to configure iSCSI on the
> > > > > > > controller as well.
> > > > > > >
> > > > > > > Having said that, whether you configure the iSCSI initiator or not on
> > > > > > > the controller will have no effect on the paths used by the compute.
> > > > > > >
> > > > > > > I've seen the iSCSI initiator going crazy when the iscsid and the
> > > > > > > iscsiadm are from different versions.  I've seen this in containerized
> > > > > > > environments.
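> > > > > > >
> > > > > > > A quick sanity check for that mismatch (package name assumed to be the
> > > > > > > usual CentOS one):
> > > > > > >
> > > > > > >   iscsiadm --version
> > > > > > >   rpm -q iscsi-initiator-utils
> > > > > > >   # make sure the running iscsid was restarted after any package update
> > > > > > >   systemctl status iscsid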
> > > > > > >
> > > > > > > Faulty paths on multipathing is a tricky business, because there are
> > > > > > > different checkers, some generic (readsector0, tur, directio) and some
> > > > > > > vendor specific (emc_clariion, hp_sw, rdac), and each one behaves in a
> > > > > > > different way.
> > > > > > >
> > > > > > > If you have a multipath device with faulty paths that you think should
> > > > > > > not be faulty, you should look into what's going on with those paths
> > > > > > > (a command sketch for these checks follows the list):
> > > > > > >
> > > > > > > - Confirm that the device is still in the system under /dev/XYZ
> > > > > > > - Confirm in your storage array's console/webconsole that the volume is
> > > > > > >   still mapped on that target-portal to that host's iscsi initiator
> > > > > > >   name.
> > > > > > > - Confirm you can read the faulty devices with dd on the host
> > > > > > > - Confirm that the WWN of the device is the same in all the paths
> > > > > > >   (using /lib/udev/scsi_id)
> > > > > > > - Finally look into what checker multipath is using for your device
> > > > > > >   (sometimes checkers have bugs).
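> > > > > > >
> > > > > > > Roughly, those checks could look like this, assuming /dev/sdbg is one
> > > > > > > of the faulty paths from your multipath -ll output (substitute your own
> > > > > > > device; the vendor string DGC is the one shown by multipath -ll):
> > > > > > >
> > > > > > >   # is the block device still there?
> > > > > > >   ls -l /dev/sdbg
> > > > > > >   # can we read from it?
> > > > > > >   dd if=/dev/sdbg of=/dev/null bs=4096 count=1 iflag=direct
> > > > > > >   # WWN of this path; should match every other path of the same map
> > > > > > >   /lib/udev/scsi_id --page 0x83 --whitelisted /dev/sdbg
> > > > > > >   # which path checker is configured for DGC devices
> > > > > > >   multipathd show config | grep -B 5 -A 10 DGC
> > > > > > >
> > > > > > > The WWN reported by scsi_id should also match the multipath map name
> > > > > > > (the 36006016... identifiers in the multipath -ll output).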
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gorka.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Oct 20, 2020, 19:58 Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > > > > >
> > > > > > > > > On 20/10, Ignazio Cassano wrote:
> > > > > > > > > > This is the entre log from when the migration started:
> > > > > > > > > >
> > > > > > > > > > http://paste.openstack.org/show/799199/
> > > > > > > > > >
> > > > > > > > > > Ignazio
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > There are no os-brick calls in there. :-(
> > > > > > > > >
> > > > > > > > > You should look for the call to connect_volume that should have
> > > > > > > > > something like:
> > > > > > > > >
> > > > > > > > >   ==> disconnect_volume: call "{'args': (<os_brick.initiator.connectors.iscsi
> > > > > > > > >
> > > > > > > > > And the second parameter to that call is a dictionary where you can see
> > > > > > > > > the target_lun, target_luns, target_portals, target_portal, target_iqn,
> > > > > > > > > target_iqns...  This will allow us to check if we are actually connected
> > > > > > > > > to those targets-portals
> > > > > > > > >
> > > > > > > > > The third parameter should contain two things that are relevant, the
> > > > > > > > > scsi_wwn and the path.  You can check if the path exists and if that
> > > > > > > > > path actually has that wwn using /lib/udev/scsi_id --page 0x83
> > > > > > > > > --whitelisted $path
> > > > > > > > >
> > > > > > > > > Those are the things I would check, because the only reasons I can
> > > > > > > > > think of for os-brick not disconnecting any volumes are that the
> > > > > > > > > connection info is not right, or that the volume is no longer
> > > > > > > > > connected.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Gorka.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Oct 20, 2020 at 11:23 Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On 20/10, Ignazio Cassano wrote:
> > > > > > > > > > > > Hello Gorka, this is what happens on nova compute with debug enabled,
> > > > > > > > > > > > when I migrate an instance with iscsi volumes (note: "Disconnecting
> > > > > > > > > > > > from []" should be the issue):
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > The disconnect from [] is the right clue, not necessarily the issue.
> > > > > > > > > > >
> > > > > > > > > > > OS-Brick is saying that for the connection information that has been
> > > > > > > > > > > passed in the "disconnect_volume" call (which is not present in the
> > > > > > > > > > > emailed logs) there are no volumes present in the system.
> > > > > > > > > > >
> > > > > > > > > > > You should check the connection info that Nova is passing to
> > > > > > > > > > > disconnect_volume and confirm if that data is correct.  For example
> > > > > > > > > > > checking if the path present in the connection info dictionary is the
> > > > > > > > > > > same as the one in the instance's XML dump, or if the LUN from the
> > > > > > > > > > > connection info dict is actually present in the system.
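> > > > > > > > > > >
> > > > > > > > > > > Concretely, those checks could look something like this (the instance
> > > > > > > > > > > UUID and target IQN are just the ones from your logs; substitute your
> > > > > > > > > > > own):
> > > > > > > > > > >
> > > > > > > > > > >   # which device paths the instance is really using
> > > > > > > > > > >   virsh dumpxml 0c846f66-f194-40de-b31e-d53652570fa7 | grep -A 4 '<disk'
> > > > > > > > > > >   # whether a device for that target/LUN actually exists on the host
> > > > > > > > > > >   ls -l /dev/disk/by-path/ | grep iqn.1992-04.com.emc:cx.ckm00184400687
> > > > > > > > > > >
> > > > > > > > > > > and then compare both against the path and LUN in the connection info
> > > > > > > > > > > dictionary.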
> > > > > > > > > > >
> > > > > > > > > > > There are multiple reasons why Nova could be passing the wrong
> > > > > > > > > > > connection info to os-brick.  The ones that come to mind are:
> > > > > > > > > > >
> > > > > > > > > > > - There was a failed migration at some point, and Nova didn't roll
> > > > > > > > > > >   back the connection info on the BDM table (which you can inspect as
> > > > > > > > > > >   sketched below).
> > > > > > > > > > > - Nova is calling initialize_connection multiple times on Cinder for
> > > > > > > > > > >   the same host and the driver being used is not idempotent.
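> > > > > > > > > > >
> > > > > > > > > > > A way to look at what is stored in the BDM table, assuming direct
> > > > > > > > > > > read-only access to the Nova database (database name and schema are
> > > > > > > > > > > the standard ones; adjust to your deployment):
> > > > > > > > > > >
> > > > > > > > > > >   mysql nova -e "SELECT connection_info FROM block_device_mapping \
> > > > > > > > > > >     WHERE instance_uuid = '0c846f66-f194-40de-b31e-d53652570fa7' \
> > > > > > > > > > >     AND deleted = 0;"
> > > > > > > > > > >
> > > > > > > > > > > and compare the target_iqn/target_portal/target_lun stored there with
> > > > > > > > > > > what was logged in the disconnect_volume call.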
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Gorka.
> > > > > > > > > > >
> > > > > > > > > > > > stderr= _run_iscsiadm_bare
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1122
> > > > > > > > > > > > 2020-10-20 09:52:33.066 132171 DEBUG
> > > > > > > > > os_brick.initiator.connectors.iscsi
> > > > > > > > > > > > [-] iscsi session list stdout=tcp: [10]
> > > 10.138.209.48:3260,9
> > > > > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a3 (non-flash)
> > > > > > > > > > > > tcp: [11] 10.138.215.17:3260,8
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a2
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > > tcp: [12] 10.138.215.17:3260,8
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a2
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > > tcp: [13] 10.138.215.18:3260,7
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b2
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > > tcp: [14] 10.138.215.18:3260,7
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b2
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > > tcp: [15] 10.138.209.47:3260,6
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b3
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > > tcp: [16] 10.138.209.47:3260,6
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b3
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > > tcp: [9] 10.138.209.48:3260,9
> > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a3
> > > > > > > > > > > > (non-flash)
> > > > > > > > > > > >  stderr= _run_iscsi_session
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1111
> > > > > > > > > > > > 2020-10-20 09:52:33.078 132171 DEBUG
> > > > > > > > > os_brick.initiator.connectors.iscsi
> > > > > > > > > > > > [-] Resulting device map defaultdict(<function
> <lambda>
> > > at
> > > > > > > > > > > 0x7f4f1b1f7cf8>,
> > > > > > > > > > > > {(u'10.138.215.17:3260',
> > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.a2'):
> > > > > > > > > > > > (set([]), set([u'sdg', u'sdi'])), (u'
> 10.138.209.47:3260
> > > ',
> > > > > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.b3'):
> (set([]),
> > > > > > > set([u'sdo',
> > > > > > > > > > > > u'sdq'])), (u'10.138.209.48:3260',
> > > > > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.a3'):
> (set([]),
> > > > > > > set([u'sdd',
> > > > > > > > > > > > u'sdb'])), (u'10.138.215.18:3260',
> > > > > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.b2'):
> (set([]),
> > > > > > > set([u'sdm',
> > > > > > > > > > > > u'sdk']))}) _get_connection_devices
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:844
> > > > > > > > > > > > 2020-10-20 09:52:33.078 132171 DEBUG
> > > > > > > > > os_brick.initiator.connectors.iscsi
> > > > > > > > > > > > [-] Disconnecting from: [] _disconnect_connection
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1099
> > > > > > > > > > > > 2020-10-20 09:52:33.079 132171 DEBUG
> > > > > oslo_concurrency.lockutils
> > > > > > > [-]
> > > > > > > > > Lock
> > > > > > > > > > > > "connect_volume" released by
> > > > > > > > > > > >
> "os_brick.initiator.connectors.iscsi.disconnect_volume"
> > > ::
> > > > > held
> > > > > > > > > 1.058s
> > > > > > > > > > > > inner
> > > > > > > > >
> > > /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:339
> > > > > > > > > > > > 2020-10-20 09:52:33.079 132171 DEBUG
> > > > > > > > > os_brick.initiator.connectors.iscsi
> > > > > > > > > > > > [-] <== disconnect_volume: return (1057ms) None
> > > > > > > trace_logging_wrapper
> > > > > > > > > > > >
> /usr/lib/python2.7/site-packages/os_brick/utils.py:170
> > > > > > > > > > > > 2020-10-20 09:52:33.079 132171 DEBUG
> > > > > > > nova.virt.libvirt.volume.iscsi
> > > > > > > > > [-]
> > > > > > > > > > > > [instance: 0c846f66-f194-40de-b31e-d53652570fa7]
> > > Disconnected
> > > > > > > iSCSI
> > > > > > > > > > > Volume
> > > > > > > > > > > > disconnect_volume
> > > > > > > > > > > >
> > > > > > >
> > > /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py:78
> > > > > > > > > > > > 2020-10-20 09:52:33.080 132171 DEBUG os_brick.utils
> [-]
> > > ==>
> > > > > > > > > > > > get_connector_properties: call u"{'execute': None,
> > > 'my_ip':
> > > > > > > > > > > > '10.138.208.178', 'enforce_multipath': True, 'host':
> > > > > > > > > 'podiscsivc-kvm02',
> > > > > > > > > > > > 'root_helper': 'sudo nova-rootwrap
> > > /etc/nova/rootwrap.conf',
> > > > > > > > > 'multipath':
> > > > > > > > > > > > True}" trace_logging_wrapper
> > > > > > > > > > > >
> /usr/lib/python2.7/site-packages/os_brick/utils.py:146
> > > > > > > > > > > > 2020-10-20 09:52:33.125 132171 DEBUG
> > > > > os_brick.initiator.linuxfc
> > > > > > > [-]
> > > > > > > > > No
> > > > > > > > > > > > Fibre Channel support detected on system. get_fc_hbas
> > > > > > > > > > > >
> > > > > > >
> /usr/lib/python2.7/site-packages/os_brick/initiator/linuxfc.py:157
> > > > > > > > > > > > 2020-10-20 09:52:33.126 132171 DEBUG
> > > > > os_brick.initiator.linuxfc
> > > > > > > [-]
> > > > > > > > > No
> > > > > > > > > > > > Fibre Channel support detected on system. get_fc_hbas
> > > > > > > > > > > >
> > > > > > >
> /usr/lib/python2.7/site-packages/os_brick/initiator/linuxfc.py:157
> > > > > > > > > > > > 2020-10-20 09:52:33.145 132171 DEBUG os_brick.utils
> [-]
> > > <==
> > > > > > > > > > > > get_connector_properties: return (61ms) {'initiator':
> > > > > > > > > > > > u'iqn.1994-05.com.redhat:fbfdc37eed4c', 'ip':
> > > > > u'10.138.208.178',
> > > > > > > > > 'system
> > > > > > > > > > > > uuid': u'4C4C4544-0051-4E10-8057-B6C04F425932',
> > > 'platform':
> > > > > > > > > u'x86_64',
> > > > > > > > > > > > 'host': u'podiscsivc-kvm02', 'do_local_attach':
> False,
> > > > > 'os_type':
> > > > > > > > > > > > u'linux2', 'multipath': True} trace_logging_wrapper
> > > > > > > > > > > >
> /usr/lib/python2.7/site-packages/os_brick/utils.py:170
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards
> > > > > > > > > > > > Ignazio
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 15, 2020 at 10:57 Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On 14/10, Ignazio Cassano wrote:
> > > > > > > > > > > > > > Hello, thank you for the answer.
> > > > > > > > > > > > > > I am using os-brick 2.3.8 but I got the same issues on Stein with
> > > > > > > > > > > > > > os-brick 2.8.
> > > > > > > > > > > > > > To explain the situation better I send you the output of multipath -ll
> > > > > > > > > > > > > > on a compute node:
> > > > > > > > > > > > > > [root at podvc-kvm01 ansible]# multipath -ll
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdbg: alua not supported
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdbe: alua not supported
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdbd: alua not supported
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdbf: alua not supported
> > > > > > > > > > > > > > 360060160f0d049007ab7275f743d0286 dm-11 DGC
> > >  ,VRAID
> > > > > > > > > > > > > > size=30G features='1 retain_attached_hw_handler'
> > > > > hwhandler='1
> > > > > > > > > alua'
> > > > > > > > > > > wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=0 status=enabled
> > > > > > > > > > > > > > | |- 15:0:0:71  sdbg 67:160 failed faulty running
> > > > > > > > > > > > > > | `- 12:0:0:71  sdbe 67:128 failed faulty running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=0 status=enabled
> > > > > > > > > > > > > >   |- 11:0:0:71  sdbd 67:112 failed faulty running
> > > > > > > > > > > > > >   `- 13:0:0:71  sdbf 67:144 failed faulty running
> > > > > > > > > > > > > > 360060160f0d049004cdb615f52343fdb dm-8 DGC
>  ,VRAID
> > > > > > > > > > > > > > size=80G features='2 queue_if_no_path
> > > > > > > retain_attached_hw_handler'
> > > > > > > > > > > > > > hwhandler='1 alua' wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active
> > > > > > > > > > > > > > | |- 15:0:0:210 sdau 66:224 active ready running
> > > > > > > > > > > > > > | `- 12:0:0:210 sdas 66:192 active ready running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10
> status=enabled
> > > > > > > > > > > > > >   |- 11:0:0:210 sdar 66:176 active ready running
> > > > > > > > > > > > > >   `- 13:0:0:210 sdat 66:208 active ready running
> > > > > > > > > > > > > > 360060160f0d0490034aa645fe52265eb dm-12 DGC
> > >  ,VRAID
> > > > > > > > > > > > > > size=100G features='2 queue_if_no_path
> > > > > > > > > retain_attached_hw_handler'
> > > > > > > > > > > > > > hwhandler='1 alua' wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active
> > > > > > > > > > > > > > | |- 12:0:0:177 sdbi 67:192 active ready running
> > > > > > > > > > > > > > | `- 15:0:0:177 sdbk 67:224 active ready running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10
> status=enabled
> > > > > > > > > > > > > >   |- 11:0:0:177 sdbh 67:176 active ready running
> > > > > > > > > > > > > >   `- 13:0:0:177 sdbj 67:208 active ready running
> > > > > > > > > > > > > > 360060160f0d04900159f225fd6126db9 dm-6 DGC
>  ,VRAID
> > > > > > > > > > > > > > size=40G features='2 queue_if_no_path
> > > > > > > retain_attached_hw_handler'
> > > > > > > > > > > > > > hwhandler='1 alua' wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active
> > > > > > > > > > > > > > | |- 11:0:0:26  sdaf 65:240 active ready running
> > > > > > > > > > > > > > | `- 13:0:0:26  sdah 66:16  active ready running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10
> status=enabled
> > > > > > > > > > > > > >   |- 12:0:0:26  sdag 66:0   active ready running
> > > > > > > > > > > > > >   `- 15:0:0:26  sdai 66:32  active ready running
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdba: alua not supported
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdbc: alua not supported
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdaz: alua not supported
> > > > > > > > > > > > > > Oct 14 18:50:01 | sdbb: alua not supported
> > > > > > > > > > > > > > 360060160f0d049007eb7275f93937511 dm-10 DGC
> > >  ,VRAID
> > > > > > > > > > > > > > size=40G features='1 retain_attached_hw_handler'
> > > > > hwhandler='1
> > > > > > > > > alua'
> > > > > > > > > > > wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=0 status=enabled
> > > > > > > > > > > > > > | |- 12:0:0:242 sdba 67:64  failed faulty running
> > > > > > > > > > > > > > | `- 15:0:0:242 sdbc 67:96  failed faulty running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=0 status=enabled
> > > > > > > > > > > > > >   |- 11:0:0:242 sdaz 67:48  failed faulty running
> > > > > > > > > > > > > >   `- 13:0:0:242 sdbb 67:80  failed faulty running
> > > > > > > > > > > > > > 360060160f0d049003a567c5fb72201e8 dm-7 DGC
>  ,VRAID
> > > > > > > > > > > > > > size=40G features='2 queue_if_no_path
> > > > > > > retain_attached_hw_handler'
> > > > > > > > > > > > > > hwhandler='1 alua' wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active
> > > > > > > > > > > > > > | |- 12:0:0:57  sdbq 68:64  active ready running
> > > > > > > > > > > > > > | `- 15:0:0:57  sdbs 68:96  active ready running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10
> status=enabled
> > > > > > > > > > > > > >   |- 11:0:0:57  sdbp 68:48  active ready running
> > > > > > > > > > > > > >   `- 13:0:0:57  sdbr 68:80  active ready running
> > > > > > > > > > > > > > 360060160f0d04900c120625f802ea1fa dm-9 DGC
>  ,VRAID
> > > > > > > > > > > > > > size=25G features='2 queue_if_no_path
> > > > > > > retain_attached_hw_handler'
> > > > > > > > > > > > > > hwhandler='1 alua' wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active
> > > > > > > > > > > > > > | |- 11:0:0:234 sdav 66:240 active ready running
> > > > > > > > > > > > > > | `- 13:0:0:234 sdax 67:16  active ready running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10
> status=enabled
> > > > > > > > > > > > > >   |- 15:0:0:234 sday 67:32  active ready running
> > > > > > > > > > > > > >   `- 12:0:0:234 sdaw 67:0   active ready running
> > > > > > > > > > > > > > 360060160f0d04900b8b0615fb14ef1bd dm-3 DGC
>  ,VRAID
> > > > > > > > > > > > > > size=50G features='2 queue_if_no_path
> > > > > > > retain_attached_hw_handler'
> > > > > > > > > > > > > > hwhandler='1 alua' wp=rw
> > > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active
> > > > > > > > > > > > > > | |- 11:0:0:11  sdan 66:112 active ready running
> > > > > > > > > > > > > > | `- 13:0:0:11  sdap 66:144 active ready running
> > > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10
> status=enabled
> > > > > > > > > > > > > >   |- 12:0:0:11  sdao 66:128 active ready running
> > > > > > > > > > > > > >   `- 15:0:0:11  sdaq 66:160 active ready running
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The active running paths are related to running virtual machines.
> > > > > > > > > > > > > > The faulty ones are related to virtual machines migrated to other kvm
> > > > > > > > > > > > > > nodes.
> > > > > > > > > > > > > > Every volume has 4 paths because iscsi on unity needs two different
> > > > > > > > > > > > > > vlans, each one with 2 addresses.
> > > > > > > > > > > > > > I think this issue can be related to os-brick because when I migrate a
> > > > > > > > > > > > > > virtual machine from host A to host B, in the nova compute log on host
> > > > > > > > > > > > > > A I read:
> > > > > > > > > > > > > > 2020-10-13 10:31:02.769 118727 DEBUG
> > > > > > > > > > > os_brick.initiator.connectors.iscsi
> > > > > > > > > > > > > > [req-771ede8c-6e1b-4f3f-ad4a-1f6ed820a55c
> > > > > > > > > > > > > 66adb965bef64eaaab2af93ade87e2ca
> > > > > > > > > > > > > > 85cace94dcc7484c85ff9337eb1d0c4c - default
> default]
> > > > > > > > > *Disconnecting
> > > > > > > > > > > from:
> > > > > > > > > > > > > []*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ignazio
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's definitely the right clue!!  Though I don't fully agree with
> > > > > > > > > > > > > this being an os-brick issue just yet.  ;-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Like I mentioned before, RCA is usually non-trivial, and explaining how
> > > > > > > > > > > > > to debug these issues over email is close to impossible, but if this
> > > > > > > > > > > > > were my system, and assuming you have tested the normal attach/detach
> > > > > > > > > > > > > procedure and it is working fine, this is what I would do:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Enable DEBUG logs on the Nova compute node (I believe you already
> > > > > > > > > > > > >   have)
> > > > > > > > > > > > > - Attach a new device to an instance on that node with --debug to get
> > > > > > > > > > > > >   the request id (see the command sketch after this list)
> > > > > > > > > > > > > - Get the connection information dictionary that os-brick receives on
> > > > > > > > > > > > >   the call to connect_volume for that request, and the data that
> > > > > > > > > > > > >   os-brick returns to Nova on that method call completion.
> > > > > > > > > > > > > - Check if the returned data to Nova is a multipathed device or not (in
> > > > > > > > > > > > >   'path'), and whether we have the wwn or not (in 'scsi_wwn').  It
> > > > > > > > > > > > >   should be a multipath device, and then I would check its status in
> > > > > > > > > > > > >   the multipath daemon.
> > > > > > > > > > > > > - Now do the live migration (with --debug to get the request id) and
> > > > > > > > > > > > >   see what information Nova passes in that request to os-brick's
> > > > > > > > > > > > >   disconnect_volume.
> > > > > > > > > > > > >   - Is it the same? Then it's likely an os-brick issue, and I can have
> > > > > > > > > > > > >     a look at the logs if you put the logs for that os-brick detach
> > > > > > > > > > > > >     process in a pastebin [1].
> > > > > > > > > > > > >   - Is it different? Then it's either a Nova bug or a Cinder driver
> > > > > > > > > > > > >     specific bug.
> > > > > > > > > > > > >     - Is there a call from Nova to Cinder, in the migration request,
> > > > > > > > > > > > >       for that same volume to initialize_connection passing the source
> > > > > > > > > > > > >       host connector info (info from the host that is currently
> > > > > > > > > > > > >       attached)?  If there is a call, check if the returned data is
> > > > > > > > > > > > >       different from the one we used to do the attach; if that's the
> > > > > > > > > > > > >       case then it's a Nova and Cinder driver bug that was solved on
> > > > > > > > > > > > >       the Nova side in 17.0.10 [2].
> > > > > > > > > > > > >     - If there's no call to Cinder's initialize_connection, then it's
> > > > > > > > > > > > >       most likely a Nova bug. Try to find out if this connection info
> > > > > > > > > > > > >       makes any sense for that host (LUN, target, etc.) or if this is
> > > > > > > > > > > > >       the one from the destination volume.
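> > > > > > > > > > > > >
> > > > > > > > > > > > > The two --debug calls could look roughly like this with the nova
> > > > > > > > > > > > > client you already have installed (server, volume and host names are
> > > > > > > > > > > > > placeholders, and the log path may differ in your deployment):
> > > > > > > > > > > > >
> > > > > > > > > > > > >   # attach, noting the req-... id printed by --debug
> > > > > > > > > > > > >   nova --debug volume-attach <server> <volume-id>
> > > > > > > > > > > > >   # live migrate, again noting the request id
> > > > > > > > > > > > >   nova --debug live-migration <server> <target-host>
> > > > > > > > > > > > >   # with REQUEST_ID set to one of the req-... ids noted above
> > > > > > > > > > > > >   grep $REQUEST_ID /var/log/nova/nova-compute.log | \
> > > > > > > > > > > > >       grep -E 'connect_volume|disconnect_volume'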
> > > > > > > > > > > > >
> > > > > > > > > > > > > I hope this somehow helps.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Gorka.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]: http://paste.openstack.org/
> > > > > > > > > > > > > [2]: https://review.opendev.org/#/c/637827/
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Oct 14, 2020 at 13:41 Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 09/10, Ignazio Cassano wrote:
> > > > > > > > > > > > > > > > Hello Stackers, I am using the dell emc iscsi driver on my centos 7
> > > > > > > > > > > > > > > > queens openstack. It works and instances work as well, but on compute
> > > > > > > > > > > > > > > > nodes I got a lot of faulty devices reported by the multipath command.
> > > > > > > > > > > > > > > > I think I know why this happens: probably attaching and detaching
> > > > > > > > > > > > > > > > volumes and live migrating instances do not close something well.
> > > > > > > > > > > > > > > > I read this can cause serious performance problems on compute nodes.
> > > > > > > > > > > > > > > > Please, is there any workaround and/or patch suggested?
> > > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > > Ignazio
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > There are many, many, many things that could be happening there, and
> > > > > > > > > > > > > > > it's not usually trivial doing the RCA, so the following questions are
> > > > > > > > > > > > > > > just me hoping this is something "easy" to find out.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What os-brick version from Queens are you running? Latest (2.3.9), or
> > > > > > > > > > > > > > > maybe one older than 2.3.3?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > When you say you have faulty devices reported, are these faulty
> > > > > > > > > > > > > > > devices alone in the multipath DM? Or do you have some faulty ones
> > > > > > > > > > > > > > > with some that are ok?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If there are some OK and some that aren't, are they consecutive
> > > > > > > > > > > > > > > devices? (as in /dev/sda /dev/sdb etc).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Gorka.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
>
>