<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">

Hello 

<div class=""><br class="">

</div>

<div class="">We have a new Mitaka cloud that uses Fibre Channel storage (EMC XtremIO) through Cinder.  Provisioning and deleting instances works fine, but at the last stage of a live migration the VM instance is left running on the new host with no Cinder volume because all of its paths have failed.  I’m not certain how much of what I’m seeing comes from core Nova and Cinder functionality versus the XtremIO-specific Cinder driver, and I would love some insight on that and on the possible causes.  We use this same combination on our current Kilo-based cloud without issue.</div>

<div class=""><br class="">

</div>

<div class="">Here’s what happens:</div>

<div class="">

<div class="">

<div class=""><br class="">

</div>

<div class="">- Create a VM booted from volume.  In this case, it ended up on openstack-compute04 and runs successfully.</div>

<div class=""><br class="">

</div>

<div class="">- Multipath status looks good:</div>

<div class="">[root@openstack-compute04] # multipath -ll<br class="">

3514f0c5c0860003d dm-2 XtremIO ,XtremApp        <br class="">

size=20G features='0' hwhandler='0' wp=rw<br class="">

`-+- policy='queue-length 0' prio=1 status=active<br class="">

  |- 1:0:0:1  sdb 8:16 active ready running<br class="">

  |- 1:0:1:1  sdc 8:32 active ready running<br class="">

  |- 12:0:0:1 sdd 8:48 active ready running<br class="">

  `- 12:0:1:1 sde 8:64 active ready running<br class="">

<div class=""><br class="">

</div>

<div class="">- The /lib/udev/scsi_id command (called via nova-rootwrap, as we’ll see later) is able to determine the SCSI IDs for these paths in /dev/disk/by-path:</div>

<div class="">[root@openstack-compute04.a.pc.ostk.com by-path] # for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo; done<br class="">

pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1<br class="">

3514f0c5c0860003d<br class="">

<br class="">

pci-0000:03:00.0-fc-0x514f0c503187c704-lun-1<br class="">

3514f0c5c0860003d<br class="">

<br class="">

pci-0000:03:00.1-fc-0x514f0c503187c701-lun-1<br class="">

3514f0c5c0860003d<br class="">

<br class="">

pci-0000:03:00.1-fc-0x514f0c503187c705-lun-1<br class="">

3514f0c5c0860003d</div>

<div class=""><br class="webkit-block-placeholder">

</div>

<div class="">- Now perform live-migration.  In this case, the instance moves to openstack-compute03:</div>

<div class="">[root@openstack-controller01] # nova live-migration 13c82fa9-828c-4289-8bfc-e36e42f79388</div>

<div class=""><br class="">

</div>

<div class="">This fails.  The VM is left 'running' on the target host but has no disk, because all of the paths on the target host have failed.  The paths are properly removed from the original host.</div>

<div class="">[root@openstack-compute03] # virsh list --all<br class="">

 Id    Name                           State<br class="">

----------------------------------------------------<br class="">

 1     instance-000000b5              running<br class="">

<br class="">

- Failed paths also confirmed by multipath output:</div>

<div class="">[root@openstack-compute03] # multipath -ll<br class="">

3514f0c5c0860003d dm-2 XtremIO ,XtremApp        <br class="">

size=20G features='0' hwhandler='0' wp=rw<br class="">

`-+- policy='queue-length 0' prio=0 status=enabled<br class="">

  |- 1:0:0:1  sdb 8:16 failed faulty running<br class="">

  |- 1:0:1:1  sdc 8:32 failed faulty running<br class="">

  |- 12:0:0:1 sdd 8:48 failed faulty running<br class="">

  `- 12:0:1:1 sde 8:64 failed faulty running</div>
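<div class="">To spot this state quickly on other hosts, the failed paths can be pulled out of the multipath output with a small shell function.  This is only a sketch: failed_paths is a hypothetical helper name, and it assumes the single-path-group layout shown above, where the H:C:T:L tuple is the second field of each path line and the path state is the fifth.</div>

```shell
# Hypothetical helper: list the paths that multipath reports as failed.
# Reads "multipath -ll" output on stdin and prints "H:C:T:L device" per path.
# Assumes the H:C:T:L tuple is field 2 and the path state is field 5.
failed_paths() {
    awk '
        # Only path lines carry an H:C:T:L tuple in the second field.
        $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
            if ($5 == "failed") print $2, $3
        }
    '
}
```

<div class="">It would be run as "multipath -ll | failed_paths".  Hosts with several path groups indent their path lines differently, so the field positions would need adjusting there.</div>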

<div class=""><br class="">

</div>

<div class="">- The error in the nova-compute log of the target host (openstack-compute03 in this case) points to the scsi_id call made through nova-rootwrap, which returns a non-zero exit code and no output:</div>

<div class=""><br class="">

</div>

<div class="">2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Command: sudo nova-rootwrap /etc/nova/rootwrap.conf scsi_id --page 0x83 --whitelisted /dev/disk/by-path/pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1<br class="">

2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Exit code: 1<br class="">

2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Stdout: u''<br class="">

2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Stderr: u''<br class="">

2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher</div>
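<div class="">For anyone wanting to pull the same details out of their own logs, a rough sketch follows.  The function name rootwrap_failures is hypothetical, and the pattern assumes the log lines keep the "ERROR oslo_messaging.rpc.dispatcher" prefix shown above.</div>

```shell
# Hypothetical helper: print the "Command:" and "Exit code:" fields of
# oslo_messaging dispatcher errors.
# Reads the files given as arguments, or stdin when there are none.
rootwrap_failures() {
    sed -n \
        -e 's/^.*ERROR oslo_messaging\.rpc\.dispatcher \(Command: .*\)$/\1/p' \
        -e 's/^.*ERROR oslo_messaging\.rpc\.dispatcher \(Exit code: .*\)$/\1/p' \
        "$@"
}
```

<div class="">Usage would be along the lines of "rootwrap_failures /var/log/nova/nova-compute.log", where the log path is an assumption about a typical install and not something confirmed for this deployment.</div>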

<div class=""><br class="">

</div>

<div class="">...and in fact, running the scsi_id command directly on the target host (the same command that succeeded on the original host) returns no SCSI ID and exits with status 1:</div>

<div class=""><br class="">

</div>

<div class="">[root@openstack-compute03] # for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo $?; echo; done<br class="">

pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1<br class="">

1<br class="">

<br class="">

pci-0000:03:00.0-fc-0x514f0c503187c704-lun-1<br class="">

1<br class="">

<br class="">

pci-0000:03:00.1-fc-0x514f0c503187c701-lun-1<br class="">

1<br class="">

<br class="">

pci-0000:03:00.1-fc-0x514f0c503187c705-lun-1<br class="">

1<br class="">

<br class="">

</div>
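<div class="">The manual check above can be wrapped in a function that reports the exit status per path and returns non-zero if any lookup fails.  This is a sketch under assumptions: check_scsi_ids and the SCSI_ID_CMD variable are hypothetical names introduced here (the variable only exists so the loop can be exercised without real hardware), while the scsi_id path and flags are the ones used in the commands above.</div>

```shell
# Hypothetical helper: run scsi_id against every FC LUN link in a by-path
# directory and report the result for each one.
# SCSI_ID_CMD is an illustrative override, not a standard variable.
check_scsi_ids() {
    dir=${1:-/dev/disk/by-path}
    cmd=${SCSI_ID_CMD:-/lib/udev/scsi_id}
    rc=0
    for link in "$dir"/*-fc-*-lun-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        if id=$("$cmd" --page 0x83 --whitelisted "$link"); then
            printf '%s OK %s\n' "${link##*/}" "$id"
        else
            printf '%s FAILED exit=%s\n' "${link##*/}" "$?"
            rc=1
        fi
    done
    return "$rc"
}
```

<div class="">Running "check_scsi_ids" with no arguments checks /dev/disk/by-path, and the return status makes it easy to call from a monitoring script before and after a migration.</div>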

<div class="">My assumption is that Nova expects those storage paths to be fully functional when it tries to determine the SCSI IDs, and that the lookup fails because the paths are faulty.  I will of course be reaching out to EMC’s support about this, but I would also like to get the group’s thoughts.  I believe the XtremIO Cinder driver is responsible for making sure the storage paths are properly presented, but I don’t fully understand the relationship between what Nova does and what the Cinder driver does.</div>

<div class=""><br class="">

</div>

</div>

<div class="">Any insight would be appreciated!</div>

</div>

<div class="">

<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<div class=""><br class="Apple-interchange-newline">

Mike Smith</div>

<div class="">Lead Cloud Systems Architect</div>

<div class=""><a href="http://overstock.com" class="">Overstock.com</a></div>

</div>

<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<br class="">

</div>

<br class="Apple-interchange-newline">

</div>

<br class="">

</div>

</body>

</html>