<div dir="ltr"><div>Many thanks.</div><div>Is there a way to clean up without rebooting?</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 30 Oct 2020 at 11:52, Gorka Eguileor <<a href="mailto:geguileo@redhat.com">geguileo@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 30/10, Ignazio Cassano wrote:<br>

> Please see the last email, where I upgraded to the latest OpenStack Nova on<br>

> Queens.<br>

> [root@compute-0 nova]# rpm -qa|grep queens<br>

> centos-release-openstack-queens-1-2.el7.centos.noarch<br>

> [root@compute-0 nova]# rpm -qa|grep nova<br>

> openstack-nova-compute-17.0.13-1.el7.noarch<br>

> openstack-nova-common-17.0.13-1.el7.noarch<br>

> python-nova-17.0.13-1.el7.noarch<br>

> python2-novaclient-10.1.0-1.el7.noarch<br>

><br>

> I sent you the logs from the updated release.<br>

> I am not very skilled at reading the detailed log output for this issue.<br>

> Sorry<br>

> Ignazio<br>

><br>

<br>

Hi,<br>

<br>

I missed that email and the attachment, sorry.<br>

<br>

The logs you sent me were missing most of the connect_volume call, and<br>

only the end of the call was present, but I think it doesn't matter as I<br>

see what the problem is.<br>

<br>

The problem is that some of the nodes and sessions are duplicated.<br>

<br>

An example of a duplicated node:<br>

<br>

  tcp: [3] <a href="http://10.102.189.156:3260" rel="noreferrer" target="_blank">10.102.189.156:3260</a>,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)<br>

  tcp: [4] <a href="http://10.102.189.156:3260" rel="noreferrer" target="_blank">10.102.189.156:3260</a>,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)<br>

<br>

An example of that node's duplicated session:<br>

<br>

  tcp: [3] <a href="http://10.102.189.156:3260" rel="noreferrer" target="_blank">10.102.189.156:3260</a>,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)<br>

  tcp: [4] <a href="http://10.102.189.156:3260" rel="noreferrer" target="_blank">10.102.189.156:3260</a>,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)<br>

<br>

And os-brick is not prepared to handle that, because it is programmed to<br>

reuse the nodes and sessions.<br>

<br>

So on disconnect it gets the first of each to look for the volumes<br>

provided by it.  In the example of the duplicated node-session above, it<br>

sees that it provides /dev/sdd, but that is not one of the disks that<br>

belong to the multipath that we are disconnecting, so it gets ignored.<br>

The volume we are looking for is probably on the second session.<br>

<br>

So from this point forward (where we have duplicated node-sessions) it<br>

will not work again.<br>

<br>

I recommend you clean up that system so that you don't have duplicated<br>

nodes and sessions before trying to do a VM migration with a single<br>

volume attached.<br>

<br>

If that works, then try to attach 2 volumes on instances on the same<br>

host and see if the nodes and sessions are duplicated.<br>
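<br>
To check for duplicates before and after each test, the session list can be<br>
scanned for targets that appear more than once.  This is only a minimal<br>
sketch, assuming the `iscsiadm -m session` output has the format shown above;<br>
the function name and the regular expression are illustrative and are not<br>
part of os-brick.<br>

```python
import re
from collections import defaultdict

# One line of session output, for example:
#   tcp: [3] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
SESSION_RE = re.compile(r"^\s*\w+:\s+\[(\d+)\]\s+(\S+?),(\d+)\s+(\S+)")


def find_duplicated_sessions(session_output):
    """Return {(portal, iqn): [session ids]} for targets with several sessions."""
    sessions = defaultdict(list)
    for line in session_output.splitlines():
        match = SESSION_RE.match(line)
        if match:
            session_id, portal, _tpgt, iqn = match.groups()
            sessions[(portal, iqn)].append(int(session_id))
    # Keep only the portal/IQN pairs that have more than one session.
    return {key: ids for key, ids in sessions.items() if len(ids) > 1}


sample = """\
tcp: [3] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
tcp: [4] 10.102.189.156:3260,15 iqn.1992-04.com.emc:cx.ckm00200502005.b11 (non-flash)
"""
duplicates = find_duplicated_sessions(sample)
for (portal, iqn), ids in duplicates.items():
    print(portal, iqn, ids)
```

An empty result means that each portal and IQN pair has a single session.<br>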

<br>

Cheers,<br>

Gorka.<br>

<br>

<br>

> On Fri, 30 Oct 2020 at 09:27, Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>><br>

> wrote:<br>

><br>

> > On 30/10, Ignazio Cassano wrote:<br>

> > > Hello, these are versions we are using:<br>

> > > [root@podto2-kvm02 ansible]# rpm -qa|grep queens<br>

> > > centos-release-openstack-queens-1-2.el7.centos.noarch<br>

> > > [root@podto2-kvm02 ansible]# rpm -qa|grep nova<br>

> > > openstack-nova-common-17.0.11-1.el7.noarch<br>

> > > python2-novaclient-10.1.0-1.el7.noarch<br>

> > > openstack-nova-compute-17.0.11-1.el7.noarch<br>

> > > python-nova-17.0.11-1.el7.noarch<br>

> > ><br>

> > > Cheers,Gorka.<br>

> ><br>

> > Hi,<br>

> ><br>

> > That release has the Nova bug fix, so Nova should not be calling Cinder<br>

> > to do an initialize connection on the source on the post-migration step<br>

> > anymore.<br>

> ><br>

> > I recommend comparing the connection info passed to connect_volume when<br>

> > the volume is attached on the source host and when it's disconnected on<br>

> > the post-migration step on the source host.<br>
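<br>
The comparison described above can be done by copying the two dictionaries<br>
from the debug logs and diffing them.  This is a minimal sketch; the helper<br>
name is made up, and the sample values are hypothetical placeholders rather<br>
than data taken from the real calls.<br>

```python
def diff_connection_info(attach_info, detach_info):
    """Return {key: (attach value, detach value)} for keys whose values differ."""
    missing = object()  # sentinel to tell "key absent" apart from a None value
    diff = {}
    for key in sorted(set(attach_info) | set(detach_info)):
        before = attach_info.get(key, missing)
        after = detach_info.get(key, missing)
        if before != after:
            diff[key] = (
                None if before is missing else before,
                None if after is missing else after,
            )
    return diff


# Hypothetical connection info, only to show the shape of the result.
attach_info = {
    "target_portal": "10.138.215.17:3260",
    "target_iqn": "iqn.1992-04.com.emc:cx.ckm00184400687.a2",
    "target_lun": 71,
}
detach_info = dict(attach_info, target_lun=242)

changed = diff_connection_info(attach_info, detach_info)
print(changed)
```

An empty dictionary means that both calls received the same connection info.<br>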

> ><br>

> > Cheers,<br>

> > Gorka.<br>

> ><br>

> > ><br>

> > > ><br>

> > > > [1]: <a href="https://bugs.launchpad.net/nova/+bug/1754716" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1754716</a><br>

> > > > [2]:<br>

> > > ><br>

> > <a href="https://github.com/openstack/nova/commit/013f421bca4067bd430a9fac1e3b290cf1388ee4" rel="noreferrer" target="_blank">https://github.com/openstack/nova/commit/013f421bca4067bd430a9fac1e3b290cf1388ee4</a><br>

> > > ><br>

> > > > ><br>

> > > > ><br>

> > > > > On Thu, 29 Oct 2020 at 09:12, Gorka Eguileor <<br>

> > > > <a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>><br>

> > > > > wrote:<br>

> > > > ><br>

> > > > > > On 28/10, Ignazio Cassano wrote:<br>

> > > > > > > Hello Gorka, I would like to know if with unity iscsi driver,  I<br>

> > must<br>

> > > > > > > configure iscsi initiator on both compute and controller nodes.<br>

> > > > > > > At this time I installed and configured the iscsi initiator only on<br>

> > > > compute<br>

> > > > > > > nodes and I got a lot of faulty devices when volumes are<br>

> > detached.<br>

> > > > > > > Thanks<br>

> > > > > > > Ignazio<br>

> > > > > > ><br>

> > > > > ><br>

> > > > > > Hi,<br>

> > > > > ><br>

> > > > > > Both compute and controller nodes are in the data path.  Computes<br>

> > when<br>

> > > > > > instances use the volumes, and controllers when we create volume<br>

> > from<br>

> > > > > > images, do generic volume migrations, create or restore backups,<br>

> > etc.<br>

> > > > > ><br>

> > > > > > Unless your deployment isn't doing any of the Cinder operations<br>

> > that<br>

> > > > > > involve the data plane, you'll have to configure iSCSI on the<br>

> > > > controller<br>

> > > > > > as well.<br>

> > > > > ><br>

> > > > > > Having said that, whether you configure the iSCSI initiator or not<br>

> > on<br>

> > > > > > the controller will have no effect on the paths used by the<br>

> > compute.<br>

> > > > > ><br>

> > > > > > I've seen the iSCSI initiator going crazy when the iscsid and the<br>

> > > > > > iscsiadm are from different versions.  I've seen this in<br>

> > containerized<br>

> > > > > > environments.<br>

> > > > > ><br>

> > > > > > Faulty paths on multipathing is a tricky business, because there<br>

> > are<br>

> > > > > > different checkers, some generic (readsector0, tur, directio) and<br>

> > some<br>

> > > > > > vendor specific (emc_clariion, hp_sw, rdac), and each one behaves<br>

> > in a<br>

> > > > > > different way.<br>

> > > > > ><br>

> > > > > > If you have a multipath device with faulty paths, that you think<br>

> > should<br>

> > > > > > not be faulty, you should look into what's going on with those<br>

> > paths:<br>

> > > > > ><br>

> > > > > > - Confirm that the device is still in the system under /dev/XYZ<br>

> > > > > > - Confirm in your storage array's console/webconsole that the<br>

> > volume is<br>

> > > > > >   still mapped on that target-portal to that host's iscsi initiator<br>

> > > > > >   name.<br>

> > > > > > - Confirm you can read the faulty devices with dd on the host<br>

> > > > > > - Confirm that the WWN of the device is the same in all the paths<br>

> > > > (using<br>

> > > > > >   /lib/udev/scsi_id)<br>

> > > > > > - Finally look into what checker is multipath using for your device<br>

> > > > > >   (sometimes checkers have bugs).<br>
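<br>
To know which devices to run these checks against, the faulty paths can be<br>
pulled out of the `multipath -ll` output.  This is a minimal sketch, assuming<br>
the output layout quoted in this thread (WWID first on the map line, no<br>
user-friendly alias); the function name and regular expressions are<br>
illustrative only.<br>

```python
import re
from collections import defaultdict

# Map line, for example: 360060160f0d049007ab7275f743d0286 dm-11 DGC     ,VRAID
MAP_RE = re.compile(r"^(\S+)\s+(dm-\d+)\s")
# Path line, for example: | |- 15:0:0:71  sdbg 67:160 failed faulty running
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(\w+)\s+\d+:\d+\s+(\w+)\s+(\w+)\s+(\w+)")


def faulty_paths(multipath_output):
    """Return {wwid: [device names]} for paths reported as failed or faulty."""
    result = defaultdict(list)
    current_wwid = None
    for line in multipath_output.splitlines():
        header = MAP_RE.match(line)
        if header:
            current_wwid = header.group(1)
            continue
        path = PATH_RE.search(line)
        if path and current_wwid is not None:
            _hctl, device, dm_state, checker_state, _online = path.groups()
            if dm_state == "failed" or checker_state == "faulty":
                result[current_wwid].append(device)
    return dict(result)


sample = """\
Oct 14 18:50:01 | sdbg: alua not supported
360060160f0d049007ab7275f743d0286 dm-11 DGC     ,VRAID
size=30G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 15:0:0:71  sdbg 67:160 failed faulty running
| `- 12:0:0:71  sdbe 67:128 failed faulty running
360060160f0d049004cdb615f52343fdb dm-8 DGC     ,VRAID
size=80G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 15:0:0:210 sdau 66:224 active ready running
| `- 12:0:0:210 sdas 66:192 active ready running
"""
faulty = faulty_paths(sample)
for wwid, devices in faulty.items():
    print(wwid, devices)
```

Each device listed can then be checked under /dev, read with dd, and compared<br>
by WWN as described in the list above.<br>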

> > > > > ><br>

> > > > > > Cheers,<br>

> > > > > > Gorka.<br>

> > > > > ><br>

> > > > > ><br>

> > > > > > ><br>

> > > > > > > On Tue, 20 Oct 2020, 19:58 Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>><br>

> > > > > > wrote:<br>

> > > > > > ><br>

> > > > > > > > On 20/10, Ignazio Cassano wrote:<br>

> > > > > > > > > This is the entire log from when the migration started:<br>

> > > > > > > > ><br>

> > > > > > > > > <a href="http://paste.openstack.org/show/799199/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/799199/</a><br>

> > > > > > > > ><br>

> > > > > > > > > Ignazio<br>

> > > > > > > ><br>

> > > > > > > > Hi,<br>

> > > > > > > ><br>

> > > > > > > > There are no os-brick calls in there. :-(<br>

> > > > > > > ><br>

> > > > > > > > You should look for the call to disconnect_volume that should have<br>

> > > > > > > > something like:<br>

> > > > > > > ><br>

> > > > > > > >   ==> disconnect_volume: call "{'args':<br>

> > > > > > > > (<os_brick.initiator.connectors.iscsi<br>

> > > > > > > ><br>

> > > > > > > > And the second parameter to that call is a dictionary where you<br>

> > > > can see<br>

> > > > > > > > the target_lun, target_luns, target_portals, target_portal,<br>

> > > > target_iqn,<br>

> > > > > > > > target_iqns...  This will allow us to check if we are actually<br>

> > > > > > connected<br>

> > > > > > > > to those targets-portals<br>

> > > > > > > ><br>

> > > > > > > > The third parameter should contain two things that are<br>

> > relevant,<br>

> > > > the<br>

> > > > > > > > scsi_wwn and the path.  You can check if the path exists and if<br>

> > > > that<br>

> > > > > > > > path actually has that wwn using /lib/udev/scsi_id --page 0x83<br>

> > > > > > > > --whitelisted $path<br>

> > > > > > > ><br>

> > > > > > > > Those are the things I would check, because the only reasons I<br>

> > can<br>

> > > > think<br>

> > > > > > > > of for os-brick not disconnecting any volumes are that the<br>

> > > > connection<br>

> > > > > > > > info is not right, or that the volume is no longer connected.<br>
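<br>
The WWN check can be scripted around the scsi_id command quoted above.  This<br>
is a minimal sketch: the function names are made up, the `run` parameter only<br>
exists so the command runner can be swapped out, and it assumes scsi_id<br>
prints the WWN on a single line and is run with enough privileges to read<br>
the device.<br>

```python
import subprocess


def read_wwn(path, run=subprocess.check_output):
    """Return the WWN that scsi_id reports for a device path."""
    output = run(["/lib/udev/scsi_id", "--page", "0x83", "--whitelisted", path])
    return output.decode().strip()


def path_matches_wwn(path, expected_wwn, run=subprocess.check_output):
    """Tell whether the device at path reports the expected scsi_wwn."""
    return read_wwn(path, run) == expected_wwn


# Usage on the compute host, with the 'path' and 'scsi_wwn' values taken from
# the third parameter of the logged call (placeholders, not real values):
#   path_matches_wwn(logged_path, logged_scsi_wwn)
```

If the path does not exist, or the WWN does not match, the connection info<br>
passed to os-brick does not describe what is actually attached on the host.<br>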

> > > > > > > ><br>

> > > > > > > > Cheers,<br>

> > > > > > > > Gorka.<br>

> > > > > > > ><br>

> > > > > > > > ><br>

> > > > > > > > > On Tue, 20 Oct 2020 at 11:23, Gorka Eguileor <<br>

> > > > > > > > <a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>><br>

> > > > > > > > > wrote:<br>

> > > > > > > > ><br>

> > > > > > > > > > On 20/10, Ignazio Cassano wrote:<br>

> > > > > > > > > > > Hello Gorka, this is what happens on nova compute with debug enabled<br>

> > > > > > > > > > > when I migrate an instance with iscsi volumes (note that "Disconnecting<br>

> > > > > > > > > > > from: []" should be the issue):<br>

> > > > > > > > > > ><br>

> > > > > > > > > ><br>

> > > > > > > > > > Hi,<br>

> > > > > > > > > ><br>

> > > > > > > > > > The disconnect from [] is the right clue, not necessarily<br>

> > the<br>

> > > > > > issue.<br>

> > > > > > > > > ><br>

> > > > > > > > > > OS-Brick is saying that for the connection information<br>

> > that has<br>

> > > > > > been<br>

> > > > > > > > > > passed in the "disconnect_volume" call (which is not<br>

> > present<br>

> > > > in the<br>

> > > > > > > > > > emailed logs) there are no volumes present in the system.<br>

> > > > > > > > > ><br>

> > > > > > > > > > You should check the connection info that Nova is passing<br>

> > to<br>

> > > > > > > > > > disconnect_volume and confirm if that data is correct.  For<br>

> > > > example<br>

> > > > > > > > > > checking if the path present in the connection info<br>

> > dictionary<br>

> > > > is<br>

> > > > > > the<br>

> > > > > > > > > > same as the one in the instance's XML dump, or if the LUN<br>

> > from<br>

> > > > the<br>

> > > > > > > > > > connection info dict is actually present in the system.<br>

> > > > > > > > > ><br>

> > > > > > > > > > There are multiple reasons why Nova could be passing the<br>

> > wrong<br>

> > > > > > > > > > connection info to os-brick.  The ones that come to mind<br>

> > are:<br>

> > > > > > > > > ><br>

> > > > > > > > > > - There was a failed migration at some point, and Nova<br>

> > didn't<br>

> > > > > > rollback<br>

> > > > > > > > > >   the connection info on the BDM table.<br>

> > > > > > > > > > - Nova is calling multiple times initialize_connection on<br>

> > > > Cinder<br>

> > > > > > for<br>

> > > > > > > > the<br>

> > > > > > > > > >   same host and the driver being used is not idempotent.<br>

> > > > > > > > > ><br>

> > > > > > > > > > Cheers,<br>

> > > > > > > > > > Gorka.<br>

> > > > > > > > > ><br>

> > > > > > > > > > > stderr= _run_iscsiadm_bare<br>

> > > > > > > > > > ><br>

> > > > > > > > > ><br>

> > > > > > > ><br>

> > > > > ><br>

> > > ><br>

> > /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1122<br>

> > > > > > > > > > > 2020-10-20 09:52:33.066 132171 DEBUG<br>

> > > > > > > > os_brick.initiator.connectors.iscsi<br>

> > > > > > > > > > > [-] iscsi session list stdout=tcp: [10]<br>

> > <a href="http://10.138.209.48:3260" rel="noreferrer" target="_blank">10.138.209.48:3260</a>,9<br>

> > > > > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a3 (non-flash)<br>

> > > > > > > > > > > tcp: [11] <a href="http://10.138.215.17:3260" rel="noreferrer" target="_blank">10.138.215.17:3260</a>,8<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a2<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > > tcp: [12] <a href="http://10.138.215.17:3260" rel="noreferrer" target="_blank">10.138.215.17:3260</a>,8<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a2<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > > tcp: [13] <a href="http://10.138.215.18:3260" rel="noreferrer" target="_blank">10.138.215.18:3260</a>,7<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b2<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > > tcp: [14] <a href="http://10.138.215.18:3260" rel="noreferrer" target="_blank">10.138.215.18:3260</a>,7<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b2<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > > tcp: [15] <a href="http://10.138.209.47:3260" rel="noreferrer" target="_blank">10.138.209.47:3260</a>,6<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b3<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > > tcp: [16] <a href="http://10.138.209.47:3260" rel="noreferrer" target="_blank">10.138.209.47:3260</a>,6<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.b3<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > > tcp: [9] <a href="http://10.138.209.48:3260" rel="noreferrer" target="_blank">10.138.209.48:3260</a>,9<br>

> > > > > > > > iqn.1992-04.com.emc:cx.ckm00184400687.a3<br>

> > > > > > > > > > > (non-flash)<br>

> > > > > > > > > > >  stderr= _run_iscsi_session<br>

> > > > > > > > > > ><br>

> > > > > > > > > ><br>

> > > > > > > ><br>

> > > > > ><br>

> > > ><br>

> > /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1111<br>

> > > > > > > > > > > 2020-10-20 09:52:33.078 132171 DEBUG<br>

> > > > > > > > os_brick.initiator.connectors.iscsi<br>

> > > > > > > > > > > [-] Resulting device map defaultdict(<function <lambda><br>

> > at<br>

> > > > > > > > > > 0x7f4f1b1f7cf8>,<br>

> > > > > > > > > > > {(u'<a href="http://10.138.215.17:3260" rel="noreferrer" target="_blank">10.138.215.17:3260</a>',<br>

> > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.a2'):<br>

> > > > > > > > > > > (set([]), set([u'sdg', u'sdi'])), (u'<a href="http://10.138.209.47:3260" rel="noreferrer" target="_blank">10.138.209.47:3260</a><br>

> > ',<br>

> > > > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.b3'): (set([]),<br>

> > > > > > set([u'sdo',<br>

> > > > > > > > > > > u'sdq'])), (u'<a href="http://10.138.209.48:3260" rel="noreferrer" target="_blank">10.138.209.48:3260</a>',<br>

> > > > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.a3'): (set([]),<br>

> > > > > > set([u'sdd',<br>

> > > > > > > > > > > u'sdb'])), (u'<a href="http://10.138.215.18:3260" rel="noreferrer" target="_blank">10.138.215.18:3260</a>',<br>

> > > > > > > > > > > u'iqn.1992-04.com.emc:cx.ckm00184400687.b2'): (set([]),<br>

> > > > > > set([u'sdm',<br>

> > > > > > > > > > > u'sdk']))}) _get_connection_devices<br>

> > > > > > > > > > ><br>

> > > > > > > > > ><br>

> > > > > > > ><br>

> > > > > ><br>

> > > ><br>

> > /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:844<br>

> > > > > > > > > > > 2020-10-20 09:52:33.078 132171 DEBUG<br>

> > > > > > > > os_brick.initiator.connectors.iscsi<br>

> > > > > > > > > > > [-] Disconnecting from: [] _disconnect_connection<br>

> > > > > > > > > > ><br>

> > > > > > > > > ><br>

> > > > > > > ><br>

> > > > > ><br>

> > > ><br>

> > /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1099<br>

> > > > > > > > > > > 2020-10-20 09:52:33.079 132171 DEBUG<br>

> > > > oslo_concurrency.lockutils<br>

> > > > > > [-]<br>

> > > > > > > > Lock<br>

> > > > > > > > > > > "connect_volume" released by<br>

> > > > > > > > > > > "os_brick.initiator.connectors.iscsi.disconnect_volume"<br>

> > ::<br>

> > > > held<br>

> > > > > > > > 1.058s<br>

> > > > > > > > > > > inner<br>

> > > > > > > ><br>

> > /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:339<br>

> > > > > > > > > > > 2020-10-20 09:52:33.079 132171 DEBUG<br>

> > > > > > > > os_brick.initiator.connectors.iscsi<br>

> > > > > > > > > > > [-] <== disconnect_volume: return (1057ms) None<br>

> > > > > > trace_logging_wrapper<br>

> > > > > > > > > > > /usr/lib/python2.7/site-packages/os_brick/utils.py:170<br>

> > > > > > > > > > > 2020-10-20 09:52:33.079 132171 DEBUG<br>

> > > > > > nova.virt.libvirt.volume.iscsi<br>

> > > > > > > > [-]<br>

> > > > > > > > > > > [instance: 0c846f66-f194-40de-b31e-d53652570fa7]<br>

> > Disconnected<br>

> > > > > > iSCSI<br>

> > > > > > > > > > Volume<br>

> > > > > > > > > > > disconnect_volume<br>

> > > > > > > > > > ><br>

> > > > > ><br>

> > /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py:78<br>

> > > > > > > > > > > 2020-10-20 09:52:33.080 132171 DEBUG os_brick.utils [-]<br>

> > ==><br>

> > > > > > > > > > > get_connector_properties: call u"{'execute': None,<br>

> > 'my_ip':<br>

> > > > > > > > > > > '10.138.208.178', 'enforce_multipath': True, 'host':<br>

> > > > > > > > 'podiscsivc-kvm02',<br>

> > > > > > > > > > > 'root_helper': 'sudo nova-rootwrap<br>

> > /etc/nova/rootwrap.conf',<br>

> > > > > > > > 'multipath':<br>

> > > > > > > > > > > True}" trace_logging_wrapper<br>

> > > > > > > > > > > /usr/lib/python2.7/site-packages/os_brick/utils.py:146<br>

> > > > > > > > > > > 2020-10-20 09:52:33.125 132171 DEBUG<br>

> > > > os_brick.initiator.linuxfc<br>

> > > > > > [-]<br>

> > > > > > > > No<br>

> > > > > > > > > > > Fibre Channel support detected on system. get_fc_hbas<br>

> > > > > > > > > > ><br>

> > > > > > /usr/lib/python2.7/site-packages/os_brick/initiator/linuxfc.py:157<br>

> > > > > > > > > > > 2020-10-20 09:52:33.126 132171 DEBUG<br>

> > > > os_brick.initiator.linuxfc<br>

> > > > > > [-]<br>

> > > > > > > > No<br>

> > > > > > > > > > > Fibre Channel support detected on system. get_fc_hbas<br>

> > > > > > > > > > ><br>

> > > > > > /usr/lib/python2.7/site-packages/os_brick/initiator/linuxfc.py:157<br>

> > > > > > > > > > > 2020-10-20 09:52:33.145 132171 DEBUG os_brick.utils [-]<br>

> > <==<br>

> > > > > > > > > > > get_connector_properties: return (61ms) {'initiator':<br>

> > > > > > > > > > > u'iqn.1994-05.com.redhat:fbfdc37eed4c', 'ip':<br>

> > > > u'10.138.208.178',<br>

> > > > > > > > 'system<br>

> > > > > > > > > > > uuid': u'4C4C4544-0051-4E10-8057-B6C04F425932',<br>

> > 'platform':<br>

> > > > > > > > u'x86_64',<br>

> > > > > > > > > > > 'host': u'podiscsivc-kvm02', 'do_local_attach': False,<br>

> > > > 'os_type':<br>

> > > > > > > > > > > u'linux2', 'multipath': True} trace_logging_wrapper<br>

> > > > > > > > > > > /usr/lib/python2.7/site-packages/os_brick/utils.py:170<br>

> > > > > > > > > > ><br>

> > > > > > > > > > ><br>

> > > > > > > > > > > Best regards<br>

> > > > > > > > > > > Ignazio<br>

> > > > > > > > > > ><br>

> > > > > > > > > > > On Thu, 15 Oct 2020 at 10:57, Gorka Eguileor <<br>

> > > > > > > > > > <a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>><br>

> > > > > > > > > > > wrote:<br>

> > > > > > > > > > ><br>

> > > > > > > > > > > > On 14/10, Ignazio Cassano wrote:<br>

> > > > > > > > > > > > > Hello, thank you for the answer.<br>

> > > > > > > > > > > > > > I am using os-brick 2.3.8, but I got the same issues on Stein with<br>

> > > > > > > > > > > > > > os-brick 2.8.<br>

> > > > > > > > > > > > > > To better explain the situation, here is the output of multipath -ll<br>

> > > > > > > > > > > > > > on a compute node:<br>

> > > > > > > > > > > > > > [root@podvc-kvm01 ansible]# multipath -ll<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdbg: alua not supported<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdbe: alua not supported<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdbd: alua not supported<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdbf: alua not supported<br>

> > > > > > > > > > > > > 360060160f0d049007ab7275f743d0286 dm-11 DGC<br>

> >  ,VRAID<br>

> > > > > > > > > > > > > size=30G features='1 retain_attached_hw_handler'<br>

> > > > hwhandler='1<br>

> > > > > > > > alua'<br>

> > > > > > > > > > wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=0 status=enabled<br>

> > > > > > > > > > > > > | |- 15:0:0:71  sdbg 67:160 failed faulty running<br>

> > > > > > > > > > > > > | `- 12:0:0:71  sdbe 67:128 failed faulty running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=0 status=enabled<br>

> > > > > > > > > > > > >   |- 11:0:0:71  sdbd 67:112 failed faulty running<br>

> > > > > > > > > > > > >   `- 13:0:0:71  sdbf 67:144 failed faulty running<br>

> > > > > > > > > > > > > 360060160f0d049004cdb615f52343fdb dm-8 DGC     ,VRAID<br>

> > > > > > > > > > > > > size=80G features='2 queue_if_no_path<br>

> > > > > > retain_attached_hw_handler'<br>

> > > > > > > > > > > > > hwhandler='1 alua' wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active<br>

> > > > > > > > > > > > > | |- 15:0:0:210 sdau 66:224 active ready running<br>

> > > > > > > > > > > > > | `- 12:0:0:210 sdas 66:192 active ready running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10 status=enabled<br>

> > > > > > > > > > > > >   |- 11:0:0:210 sdar 66:176 active ready running<br>

> > > > > > > > > > > > >   `- 13:0:0:210 sdat 66:208 active ready running<br>

> > > > > > > > > > > > > 360060160f0d0490034aa645fe52265eb dm-12 DGC<br>

> >  ,VRAID<br>

> > > > > > > > > > > > > size=100G features='2 queue_if_no_path<br>

> > > > > > > > retain_attached_hw_handler'<br>

> > > > > > > > > > > > > hwhandler='1 alua' wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active<br>

> > > > > > > > > > > > > | |- 12:0:0:177 sdbi 67:192 active ready running<br>

> > > > > > > > > > > > > | `- 15:0:0:177 sdbk 67:224 active ready running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10 status=enabled<br>

> > > > > > > > > > > > >   |- 11:0:0:177 sdbh 67:176 active ready running<br>

> > > > > > > > > > > > >   `- 13:0:0:177 sdbj 67:208 active ready running<br>

> > > > > > > > > > > > > 360060160f0d04900159f225fd6126db9 dm-6 DGC     ,VRAID<br>

> > > > > > > > > > > > > size=40G features='2 queue_if_no_path<br>

> > > > > > retain_attached_hw_handler'<br>

> > > > > > > > > > > > > hwhandler='1 alua' wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active<br>

> > > > > > > > > > > > > | |- 11:0:0:26  sdaf 65:240 active ready running<br>

> > > > > > > > > > > > > | `- 13:0:0:26  sdah 66:16  active ready running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10 status=enabled<br>

> > > > > > > > > > > > >   |- 12:0:0:26  sdag 66:0   active ready running<br>

> > > > > > > > > > > > >   `- 15:0:0:26  sdai 66:32  active ready running<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdba: alua not supported<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdbc: alua not supported<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdaz: alua not supported<br>

> > > > > > > > > > > > > Oct 14 18:50:01 | sdbb: alua not supported<br>

> > > > > > > > > > > > > 360060160f0d049007eb7275f93937511 dm-10 DGC<br>

> >  ,VRAID<br>

> > > > > > > > > > > > > size=40G features='1 retain_attached_hw_handler'<br>

> > > > hwhandler='1<br>

> > > > > > > > alua'<br>

> > > > > > > > > > wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=0 status=enabled<br>

> > > > > > > > > > > > > | |- 12:0:0:242 sdba 67:64  failed faulty running<br>

> > > > > > > > > > > > > | `- 15:0:0:242 sdbc 67:96  failed faulty running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=0 status=enabled<br>

> > > > > > > > > > > > >   |- 11:0:0:242 sdaz 67:48  failed faulty running<br>

> > > > > > > > > > > > >   `- 13:0:0:242 sdbb 67:80  failed faulty running<br>

> > > > > > > > > > > > > 360060160f0d049003a567c5fb72201e8 dm-7 DGC     ,VRAID<br>

> > > > > > > > > > > > > size=40G features='2 queue_if_no_path<br>

> > > > > > retain_attached_hw_handler'<br>

> > > > > > > > > > > > > hwhandler='1 alua' wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active<br>

> > > > > > > > > > > > > | |- 12:0:0:57  sdbq 68:64  active ready running<br>

> > > > > > > > > > > > > | `- 15:0:0:57  sdbs 68:96  active ready running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10 status=enabled<br>

> > > > > > > > > > > > >   |- 11:0:0:57  sdbp 68:48  active ready running<br>

> > > > > > > > > > > > >   `- 13:0:0:57  sdbr 68:80  active ready running<br>

> > > > > > > > > > > > > 360060160f0d04900c120625f802ea1fa dm-9 DGC     ,VRAID<br>

> > > > > > > > > > > > > size=25G features='2 queue_if_no_path<br>

> > > > > > retain_attached_hw_handler'<br>

> > > > > > > > > > > > > hwhandler='1 alua' wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active<br>

> > > > > > > > > > > > > | |- 11:0:0:234 sdav 66:240 active ready running<br>

> > > > > > > > > > > > > | `- 13:0:0:234 sdax 67:16  active ready running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10 status=enabled<br>

> > > > > > > > > > > > >   |- 15:0:0:234 sday 67:32  active ready running<br>

> > > > > > > > > > > > >   `- 12:0:0:234 sdaw 67:0   active ready running<br>

> > > > > > > > > > > > > 360060160f0d04900b8b0615fb14ef1bd dm-3 DGC     ,VRAID<br>

> > > > > > > > > > > > > size=50G features='2 queue_if_no_path<br>

> > > > > > retain_attached_hw_handler'<br>

> > > > > > > > > > > > > hwhandler='1 alua' wp=rw<br>

> > > > > > > > > > > > > |-+- policy='round-robin 0' prio=50 status=active<br>

> > > > > > > > > > > > > | |- 11:0:0:11  sdan 66:112 active ready running<br>

> > > > > > > > > > > > > | `- 13:0:0:11  sdap 66:144 active ready running<br>

> > > > > > > > > > > > > `-+- policy='round-robin 0' prio=10 status=enabled<br>

> > > > > > > > > > > > >   |- 12:0:0:11  sdao 66:128 active ready running<br>

> > > > > > > > > > > > >   `- 15:0:0:11  sdaq 66:160 active ready running<br>

> > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > The active, running paths are related to running virtual machines.<br>

> > > > > > > > > > > > > > The faulty ones are related to virtual machines migrated to other KVM nodes.<br>

> > > > > > > > > > > > > > Every volume has 4 paths because iSCSI on Unity needs two different VLANs,<br>

> > > > > > > > > > > > > > each one with 2 addresses.<br>

> > > > > > > > > > > > > > I think this issue can be related to os-brick because when I migrate a<br>

> > > > > > > > > > > > > > virtual machine from host A to host B, in the nova compute log on host A I read:<br>

> > > > > > > > > > > > > 2020-10-13 10:31:02.769 118727 DEBUG<br>

> > > > > > > > > > os_brick.initiator.connectors.iscsi<br>

> > > > > > > > > > > > > [req-771ede8c-6e1b-4f3f-ad4a-1f6ed820a55c<br>

> > > > > > > > > > > > 66adb965bef64eaaab2af93ade87e2ca<br>

> > > > > > > > > > > > > 85cace94dcc7484c85ff9337eb1d0c4c - default default]<br>

> > > > > > > > *Disconnecting<br>

> > > > > > > > > > from:<br>

> > > > > > > > > > > > []*<br>

> > > > > > > > > > > > ><br>

> > > > > > > > > > > > > Ignazio<br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > Hi,<br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > That's definitely the right clue!!  Though I don't<br>

> > fully<br>

> > > > agree<br>

> > > > > > with<br>

> > > > > > > > > > this<br>

> > > > > > > > > > > > being an os-brick issue just yet.  ;-)<br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > Like I mentioned before, RCA is usually non-trivial,<br>

> > and<br>

> > > > > > > > explaining how<br>

> > > > > > > > > > > > to debug these issues over email is close to<br>

> > impossible,<br>

> > > > but if<br>

> > > > > > > > this<br>

> > > > > > > > > > > > were my system, and assuming you have tested normal<br>

> > > > > > attach/detach<br>

> > > > > > > > > > > > procedure and it is working fine, this is what I would do:<br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > - Enable DEBUG logs on Nova compute node (I believe you<br>

> > > > already<br>

> > > > > > > > have)<br>

> > > > > > > > > > > > - Attach a new device to an instance on that node with<br>

> > > > --debug<br>

> > > > > > to<br>

> > > > > > > > get<br>

> > > > > > > > > > > >   the request id<br>

> > > > > > > > > > > > - Get the connection information dictionary that<br>

> > os-brick<br>

> > > > > > receives<br>

> > > > > > > > on<br>

> > > > > > > > > > > >   the call to connect_volume for that request, and the<br>

> > data<br>

> > > > > > that<br>

> > > > > > > > > > > >   os-brick returns to Nova on that method call<br>

> > completion.<br>

> > > > > > > > > > > > - Check if the returned data to Nova is a multipathed<br>

> > > > device or<br>

> > > > > > > > not (in<br>

> > > > > > > > > > > >   'path'), and whether we have the wwn or not (in<br>

> > > > > > 'scsi_wwn').  It<br>

> > > > > > > > > > > >   should be a multipath device, and then I would check<br>

> > its<br>

> > > > > > status<br>

> > > > > > > > in<br>

> > > > > > > > > > the<br>

> > > > > > > > > > > >   multipath daemon.<br>

> > > > > > > > > > > > - Now do the live migration (with --debug to get the<br>

> > > > request<br>

> > > > > > id)<br>

> > > > > > > > and<br>

> > > > > > > > > > see<br>

> > > > > > > > > > > >   what information Nova passes in that request to<br>

> > > > os-brick's<br>

> > > > > > > > > > > >   disconnect_volume.<br>

> > > > > > > > > > > >   - Is it the same? Then it's likely an os-brick issue,<br>

> > > > and I<br>

> > > > > > can<br>

> > > > > > > > have<br>

> > > > > > > > > > a<br>

> > > > > > > > > > > >     look at the logs if you put the logs for that<br>

> > os-brick<br>

> > > > > > detach<br>

> > > > > > > > > > > >     process in a pastebin [1].<br>

> > > > > > > > > > > >   - Is it different? Then it's either a Nova bug or a<br>

> > > > Cinder<br>

> > > > > > driver<br>

> > > > > > > > > > > >     specific bug.<br>

> > > > > > > > > > > >     - Is there a call from Nova to Cinder, in the migration request, for<br>

> > > > > > > > > > > >       that same volume to initialize_connection passing the source host<br>

> > > > > > > > > > > >       connector info (info from the host that is currently attached)?<br>

> > > > > > > > > > > >       If there is a call, check if the returned data is different from<br>

> > > > > > > > > > > >       the one we used to do the attach. If that's the case, then it's a<br>

> > > > > > > > > > > >       Nova and Cinder driver bug that was solved on the Nova side in<br>

> > > > > > > > > > > >       17.0.10 [2].<br>

> > > > > > > > > > > >     - If there's no call to Cinder's initialize_connection, then it's<br>

> > > > > > > > > > > >       most likely a Nova bug. Try to find out if this connection info<br>

> > > > > > > > > > > >       makes any sense for that host (LUN, target, etc.) or if this is<br>

> > > > > > > > > > > >       the one from the destination volume.<br>

> > > > > > > > > > > ><br>
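Editor's sketch (not part of the original message): the comparison described in the list above could be done by hand-copying the dictionaries from the debug logs into a small script like this one. The key names (target_portal, target_iqn, target_lun and their plural forms) are an assumption about what an iSCSI connection info dictionary contains, the device-path check is only a heuristic, and both function names are hypothetical. Nothing here calls os-brick, Nova, or Cinder.<br>

```python
# Hypothetical helpers for comparing what os-brick saw at attach time with
# what it saw at detach time. The dictionaries are pasted in from the logs.

# Assumption: these keys identify the iSCSI target in the connection info.
ISCSI_KEYS = (
    "target_portal", "target_iqn", "target_lun",
    "target_portals", "target_iqns", "target_luns",
)


def diff_connection_info(attach_info, detach_info, keys=ISCSI_KEYS):
    """Return {key: (attach_value, detach_value)} for every key that differs."""
    differences = {}
    for key in keys:
        before = attach_info.get(key)
        after = detach_info.get(key)
        if before != after:
            differences[key] = (before, after)
    return differences


def describe_returned_data(returned_data):
    """Report whether 'path' looks like a multipath device and whether
    'scsi_wwn' is present in the data returned by connect_volume."""
    path = returned_data.get("path") or ""
    # Heuristic (assumption): device-mapper paths indicate a multipath device.
    multipath = (
        path.startswith("/dev/dm-")
        or path.startswith("/dev/mapper/")
        or "dm-uuid-mpath-" in path
    )
    return {"multipath": multipath, "has_wwn": bool(returned_data.get("scsi_wwn"))}


if __name__ == "__main__":
    # Made-up example values, not taken from the thread.
    attach = {"target_portal": "192.0.2.10:3260",
              "target_iqn": "iqn.2001-04.com.example:t1",
              "target_lun": 1}
    detach = dict(attach, target_lun=2)
    print(diff_connection_info(attach, detach))
    print(describe_returned_data({"path": "/dev/dm-3", "scsi_wwn": "example"}))
```

An empty result from diff_connection_info corresponds to the "Is it the same?" branch; any entry in it corresponds to the "Is it different?" branch.<br>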

> > > > > > > > > > > > I hope this somehow helps.<br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > Cheers,<br>

> > > > > > > > > > > > Gorka.<br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > [1]: <a href="http://paste.openstack.org/" rel="noreferrer" target="_blank">http://paste.openstack.org/</a><br>

> > > > > > > > > > > > [2]: <a href="https://review.opendev.org/#/c/637827/" rel="noreferrer" target="_blank">https://review.opendev.org/#/c/637827/</a><br>

> > > > > > > > > > > ><br>

> > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > On Wed, 14 Oct 2020 at 13:41, Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>> wrote:<br>

> > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > On 09/10, Ignazio Cassano wrote:<br>

> > > > > > > > > > > > > > > > Hello Stackers, I am using the Dell EMC iSCSI driver on my CentOS 7<br>

> > > > > > > > > > > > > > > > Queens OpenStack. It works, and instances work as well, but on compute<br>

> > > > > > > > > > > > > > > > nodes I get a lot of faulty devices reported by the multipath -ll command.<br>

> > > > > > > > > > > > > > > > I do not know why this happens; probably attaching and detaching<br>

> > > > > > > > > > > > > > > > volumes and live migrating instances does not close something properly.<br>

> > > > > > > > > > > > > > > > I read this can cause serious performance problems on compute nodes.<br>

> > > > > > > > > > > > > > > > Is there any suggested workaround and/or patch?<br>

> > > > > > > > > > > > > > > > Regards<br>

> > > > > > > > > > > > > > > > Ignazio<br>

> > > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > Hi,<br>

> > > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > > There are many, many, many things that could be happening there, and<br>

> > > > > > > > > > > > > > > it's not usually trivial doing the RCA, so the following questions are<br>

> > > > > > > > > > > > > > > just me hoping this is something "easy" to find out.<br>

> > > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > > What os-brick version from Queens are you running?  Latest (2.3.9), or<br>

> > > > > > > > > > > > > > > maybe one older than 2.3.3?<br>

> > > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > > When you say you have faulty devices reported, are these faulty devices<br>

> > > > > > > > > > > > > > > alone in the multipath DM? Or do you have some faulty ones with some<br>

> > > > > > > > > > > > > > > that are ok?<br>

> > > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > > If there are some OK and some that aren't, are they consecutive devices?<br>

> > > > > > > > > > > > > > > (as in /dev/sda /dev/sdb etc).<br>
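Editor's sketch (not part of the original message): one way to answer the "alone or mixed" question above is to group the path lines of `multipath -ll` by map and count the faulty ones. The parsing assumes the usual layout of that output (a header line naming the dm-N device, then path lines ending in three state columns); the sample text is made up for illustration, and classify_maps is a hypothetical name.<br>

```python
import re

# Path line, e.g.: "| `- 3:0:0:1 sdb 8:16 active ready running"
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(\S+)\s+\d+:\d+\s+(\S+)\s+(\S+)\s+(\S+)")
# Map header line: first token is the map name, and the line names a dm-N device.
MAP_RE = re.compile(r"^(\S+).*\bdm-\d+\b")

# Made-up sample output, not taken from the thread.
SAMPLE = """\
36006016000000000000000000000000a dm-2 DGC,VRAID
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 3:0:0:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 4:0:0:1 sdc 8:32 failed faulty running
36006016000000000000000000000000b dm-3 DGC,VRAID
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 3:0:0:2 sdd 8:48 failed faulty running
  `- 4:0:0:2 sde 8:64 failed faulty running
"""


def classify_maps(multipath_ll_output):
    """Return {map_name: (verdict, faulty_devices, other_devices)}."""
    maps = {}
    current = None
    for line in multipath_ll_output.splitlines():
        path_match = PATH_RE.search(line)
        if path_match and current is not None:
            # group(2) is the sdX device, group(4) the path checker state.
            maps[current].append((path_match.group(2), path_match.group(4)))
            continue
        map_match = MAP_RE.match(line)
        if map_match:
            current = map_match.group(1)
            maps[current] = []

    report = {}
    for name, paths in maps.items():
        faulty = sorted(dev for dev, state in paths if state == "faulty")
        others = sorted(dev for dev, state in paths if state != "faulty")
        if faulty and not others:
            verdict = "all faulty"
        elif faulty:
            verdict = "mixed"
        else:
            verdict = "ok"
        report[name] = (verdict, faulty, others)
    return report


if __name__ == "__main__":
    for name, result in classify_maps(SAMPLE).items():
        print(name, result)
```

The sorted device lists in each result can then be checked by eye for consecutive names such as sdd and sde.<br>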

> > > > > > > > > > > > > ><br>

> > > > > > > > > > > > > > Cheers,<br>

> > > > > > > > > > > > > > Gorka.<br>

> > > > > > > > > > > > > ><br>


<br>

</blockquote></div>