Hello, thank you for the answer.
I am using os-brick 2.3.8, but I got the same issues on Stein with os-brick 2.8.
To explain the situation better, I am sending you the output of multipath -ll on a compute node:
[root@podvc-kvm01 ansible]# multipath -ll
Oct 14 18:50:01 | sdbg: alua not supported
Oct 14 18:50:01 | sdbe: alua not supported
Oct 14 18:50:01 | sdbd: alua not supported
Oct 14 18:50:01 | sdbf: alua not supported
360060160f0d049007ab7275f743d0286 dm-11 DGC     ,VRAID          
size=30G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 15:0:0:71  sdbg 67:160 failed faulty running
| `- 12:0:0:71  sdbe 67:128 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 11:0:0:71  sdbd 67:112 failed faulty running
  `- 13:0:0:71  sdbf 67:144 failed faulty running
360060160f0d049004cdb615f52343fdb dm-8 DGC     ,VRAID          
size=80G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 15:0:0:210 sdau 66:224 active ready running
| `- 12:0:0:210 sdas 66:192 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 11:0:0:210 sdar 66:176 active ready running
  `- 13:0:0:210 sdat 66:208 active ready running
360060160f0d0490034aa645fe52265eb dm-12 DGC     ,VRAID          
size=100G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 12:0:0:177 sdbi 67:192 active ready running
| `- 15:0:0:177 sdbk 67:224 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 11:0:0:177 sdbh 67:176 active ready running
  `- 13:0:0:177 sdbj 67:208 active ready running
360060160f0d04900159f225fd6126db9 dm-6 DGC     ,VRAID          
size=40G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 11:0:0:26  sdaf 65:240 active ready running
| `- 13:0:0:26  sdah 66:16  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 12:0:0:26  sdag 66:0   active ready running
  `- 15:0:0:26  sdai 66:32  active ready running
Oct 14 18:50:01 | sdba: alua not supported
Oct 14 18:50:01 | sdbc: alua not supported
Oct 14 18:50:01 | sdaz: alua not supported
Oct 14 18:50:01 | sdbb: alua not supported
360060160f0d049007eb7275f93937511 dm-10 DGC     ,VRAID          
size=40G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 12:0:0:242 sdba 67:64  failed faulty running
| `- 15:0:0:242 sdbc 67:96  failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 11:0:0:242 sdaz 67:48  failed faulty running
  `- 13:0:0:242 sdbb 67:80  failed faulty running
360060160f0d049003a567c5fb72201e8 dm-7 DGC     ,VRAID          
size=40G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 12:0:0:57  sdbq 68:64  active ready running
| `- 15:0:0:57  sdbs 68:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 11:0:0:57  sdbp 68:48  active ready running
  `- 13:0:0:57  sdbr 68:80  active ready running
360060160f0d04900c120625f802ea1fa dm-9 DGC     ,VRAID          
size=25G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 11:0:0:234 sdav 66:240 active ready running
| `- 13:0:0:234 sdax 67:16  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 15:0:0:234 sday 67:32  active ready running
  `- 12:0:0:234 sdaw 67:0   active ready running
360060160f0d04900b8b0615fb14ef1bd dm-3 DGC     ,VRAID          
size=50G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 11:0:0:11  sdan 66:112 active ready running
| `- 13:0:0:11  sdap 66:144 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 12:0:0:11  sdao 66:128 active ready running
  `- 15:0:0:11  sdaq 66:160 active ready running

The "active ready" paths belong to running virtual machines.
The "failed faulty" ones belong to virtual machines that were migrated to other KVM nodes.
Every volume has 4 paths because iSCSI on Unity needs two different VLANs, each one with 2 addresses.
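For what it's worth, a stale map like dm-11 above can be cleaned up by hand once you are sure nothing uses it anymore; a rough sketch (the WWID and device names are the dm-11 ones from the output above):

# verify that every path of the map is failed/faulty
multipath -ll 360060160f0d049007ab7275f743d0286
# flush the stale multipath map
multipath -f 360060160f0d049007ab7275f743d0286
# then drop the leftover SCSI devices (repeat for sdbe, sdbd, sdbf)
echo 1 > /sys/block/sdbg/device/delete

Of course this is only a manual workaround, not a fix for whatever leaves the devices behind.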
I think this issue can be related to os-brick, because when I migrate a virtual machine from host A to host B, in the nova-compute log on host A I read:
2020-10-13 10:31:02.769 118727 DEBUG os_brick.initiator.connectors.iscsi [req-771ede8c-6e1b-4f3f-ad4a-1f6ed820a55c 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] Disconnecting from: []
In other words, os-brick seems to come up with an empty list of devices to disconnect, so the paths are left behind on host A.
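For reference, the leftovers on host A after the migration can be inspected with standard tools; a rough sketch (lsscsi may need to be installed separately):

# iSCSI sessions still open towards the array portals
iscsiadm -m session
# SCSI devices exported by the array (DGC is the vendor id shown above)
lsscsi | grep -i dgc
# count of failed paths across all multipath maps
multipath -ll 2>/dev/null | grep -c 'failed faulty'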

Ignazio

On Wed, Oct 14, 2020 at 1:41 PM Gorka Eguileor <geguileo@redhat.com> wrote:
On 09/10, Ignazio Cassano wrote:
> Hello Stackers, I am using the Dell EMC iSCSI driver on my CentOS 7 Queens
> OpenStack. It works, and instances work as well, but on the compute nodes I get a
> lot of faulty devices reported by the multipath -ll command.
> I do not know why this happens; probably attaching and detaching volumes and
> live migrating instances do not clean something up properly.
> I read this can cause serious performance problems on compute nodes.
> Please, can you suggest any workaround and/or patch?
> Regards
> Ignazio

Hi,

There are many, many, many things that could be happening there, and
doing the RCA is usually not trivial, so the following questions are
just me hoping this is something "easy" to find out.

What os-brick version from Queens are you running?  Latest (2.3.9), or
maybe one older than 2.3.3?
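For example, something like this shows the exact installed version, assuming a pip-based install (on RPM-based systems "rpm -qa | grep os-brick" should work instead):

pip show os-brick | grep -i version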

When you say you have faulty devices reported, are these faulty devices
alone in the multipath DM? Or do you have some faulty ones with some
that are ok?

If there are some OK and some that aren't, are they consecutive devices?
(as in /dev/sda, /dev/sdb, etc.)
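Something like this should list just the failed devices sorted by name, to see at a glance whether they are consecutive (a rough sketch):

multipath -ll 2>/dev/null | grep failed | grep -oE 'sd[a-z]+' | sort -u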

Cheers,
Gorka.