[Openstack] [cinder][nova] Issue with live-migration on new Mitaka cloud using FC XIO storage

Mike Smith mismith at overstock.com
Fri Mar 10 07:23:36 UTC 2017


We have a new Mitaka cloud that uses Fibre Channel storage (EMC XtremIO) with Cinder.  Provisioning and deleting instances works fine, but at the last stage of a live migration the VM is left running on the new host with no Cinder volume because all of its storage paths are failed.  I'm not certain how much of this is core Nova/Cinder behavior versus the XtremIO Cinder driver, and I would appreciate insight on that and on the possible causes.  We use this same combination on our current Kilo-based cloud without issue.

Here’s what happens:

- Create a VM booted from volume.  In this case it ended up on openstack-compute04 and runs successfully.

- Multipath status looks good:
[root@openstack-compute04] # multipath -ll
3514f0c5c0860003d dm-2 XtremIO ,XtremApp
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 1:0:0:1  sdb 8:16 active ready running
  |- 1:0:1:1  sdc 8:32 active ready running
  |- 12:0:0:1 sdd 8:48 active ready running
  `- 12:0:1:1 sde 8:64 active ready running

- The /lib/udev/scsi_id command (called via nova-rootwrap, as we'll see later) is able to determine SCSI IDs for these paths in /dev/disk/by-path:
[root@openstack-compute04.a.pc.ostk.com by-path] # for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo; done

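For reference, the by-path links themselves can be resolved to confirm they point at the same sdb-sde devices multipath lists (just a sketch, matching the same lun pattern as the loop above):

# resolve each FC by-path link to its underlying sd device
for i in /dev/disk/by-path/*lun*; do echo -n "$i -> "; readlink -f "$i"; done
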
- Now perform live-migration.  In this case, the instance moves to openstack-compute03:
[root@openstack-controller01] # nova live-migration 13c82fa9-828c-4289-8bfc-e36e42f79388

This fails.  The VM is left 'running' on the new target host but has no usable disk because all of its storage paths there are failed.  The paths are properly removed from the original host.
[root@openstack-compute03] # virsh list --all
 Id    Name                           State
 1     instance-000000b5              running
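
Since the guest is still listed as running, virsh can also show which block device the domain's disk is (still) pointing at; just a sketch, with the instance name taken from the listing above:

# on openstack-compute03: the device path the domain XML references for its disk
virsh domblklist instance-000000b5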

- Failed paths also confirmed by multipath output:
[root@openstack-compute03] # multipath -ll
3514f0c5c0860003d dm-2 XtremIO ,XtremApp
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 1:0:0:1  sdb 8:16 failed faulty running
  |- 1:0:1:1  sdc 8:32 failed faulty running
  |- 12:0:0:1 sdd 8:48 failed faulty running
  `- 12:0:1:1 sde 8:64 failed faulty running
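
The kernel's own view of those path devices is worth a look too (a sketch; sdb-sde taken from the multipath output above):

# SCSI device state for each failed path ('running', 'offline', 'blocked', ...)
for d in sdb sdc sdd sde; do echo -n "$d: "; cat /sys/block/$d/device/state; done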

- The error in the nova-compute log on the target host (openstack-compute03 in this case) points to the scsi_id call made via nova-rootwrap, which returns a non-zero exit code when trying to read the SCSI ID:

2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Command: sudo nova-rootwrap /etc/nova/rootwrap.conf scsi_id --page 0x83 --whitelisted /dev/disk/by-path/pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Exit code: 1
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Stdout: u''
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Stderr: u''
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher
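
The exact rootwrap command from that log line can be replayed by hand to rule out a rootwrap/filter problem (copied verbatim from the traceback above):

sudo nova-rootwrap /etc/nova/rootwrap.conf scsi_id --page 0x83 --whitelisted /dev/disk/by-path/pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1

Given that the direct scsi_id run below also exits 1, a rootwrap filter issue seems unlikely, but it is a quick check.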

...and in fact, running the scsi_id command directly (the same loop run earlier on the original host) returns no SCSI ID and exits with a non-zero status of 1:

[root@openstack-compute03] # for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo $?; echo; done

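A raw read against one of the failed path devices should confirm whether the LUN is readable from this host at all (a sketch; sdb taken from the multipath output above):

# direct read of one path device, bypassing the page cache
dd if=/dev/sdb of=/dev/null bs=4096 count=1 iflag=direct
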
My assumption is that Nova expects those storage paths to be fully functional when it tries to determine the SCSI IDs, and it can't because the paths are faulty.  I will of course be reaching out to EMC support, but I would also like to get the group's thoughts.  I believe the XtremIO Cinder driver is responsible for making sure the storage paths are properly presented, but I don't fully understand the division of labor between what Nova does and what the Cinder driver does during a migration.
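
As far as I can tell, during pre_live_migration Nova asks Cinder (here the XtremIO driver) to map the volume to the target host's FC initiators and then, via os-brick, rescans the SCSI hosts so the LUN shows up before scsi_id is called.  A rough manual equivalent of that rescan, plus the initiator WWPNs the array mapping and zoning need to include (host numbers 1 and 12 assumed from the multipath output above; adjust for your HBAs):

# the target host's FC initiator WWPNs (these must be in the XtremIO initiator group / zoning)
cat /sys/class/fc_host/host*/port_name

# rescan the FC SCSI hosts, then re-check the paths
echo "- - -" > /sys/class/scsi_host/host1/scan
echo "- - -" > /sys/class/scsi_host/host12/scan
multipath -ll

If the paths stay faulty after a manual rescan, that would point at the array-side mapping or zoning for this host rather than at anything Nova is doing.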

Any insight would be appreciated!

Mike Smith
Lead Cloud Systems Architect
