[Openstack-security] [Bug 1419577] Re: when live-migrate failed, lun-id couldn't be rollback in havana

Matt Riedemann mriedem at us.ibm.com
Fri May 29 14:16:23 UTC 2015


I'm trying to sort this out a bit.

Looking at the nova.virt.libvirt.driver.pre_live_migration() method, I
see it connects the volumes, and the connection_info dictionary gets
updated in the nova.virt.libvirt.volume code, but I don't see where that
updated connection_info dict makes its way back to the virt driver's
pre_live_migration method so the change can be persisted to the database.
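
For what it's worth, here is a minimal sketch of what "persisting the
change" would look like; the save_connection_info helper is hypothetical
(it assumes a nova.objects.BlockDeviceMapping instance is already at hand)
and is not pointing at existing Nova code:

# Hypothetical sketch, not existing Nova code: write the mutated
# connection_info dict back to the BDM row it was loaded from.
from oslo_serialization import jsonutils


def save_connection_info(bdm, connection_info):
    """Persist a (possibly mutated) connection_info dict to the DB.

    ``bdm`` is assumed to be a nova.objects.BlockDeviceMapping for the
    attached volume; connection_info is stored as a JSON blob in that row.
    """
    bdm.connection_info = jsonutils.dumps(connection_info)
    bdm.save()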

This is where pre_live_migration() connects the volume:

http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py?id=2015.1.0#n5813
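
For readers following along, the shape of that code path is roughly the
following; this is a simplified, non-verbatim sketch of the loop in
LibvirtDriver.pre_live_migration(), with the disk_info plumbing and error
handling elided:

# Simplified, non-verbatim sketch of the volume-connection loop in
# LibvirtDriver.pre_live_migration().
from nova.virt import driver as virt_driver


def connect_volumes_for_migration(libvirt_driver, block_device_info):
    block_device_mapping = virt_driver.block_device_info_get_mapping(
        block_device_info)
    for vol in block_device_mapping:
        connection_info = vol['connection_info']
        # disk_info construction is elided; only the device name is shown.
        disk_info = {'dev': vol['mount_device'].rpartition("/")[2]}
        # _connect_volume() dispatches to the per-protocol volume driver,
        # which may mutate connection_info['data'] in place (e.g. by adding
        # device_path). Nothing on this path writes the mutated dict back
        # to the block_device_mapping table.
        libvirt_driver._connect_volume(connection_info, disk_info)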

Assuming we're using the LibvirtISCSIVolumeDriver, its connect_volume
method updates the connection_info dict here:

http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/volume.py?id=2015.1.0#n483
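
The in-place update in question looks roughly like this (an abridged
sketch, not a verbatim copy of volume.py):

# Abridged sketch of LibvirtISCSIVolumeDriver.connect_volume(); the iSCSI
# login/rescan logic is omitted. The important part is the last line:
# connection_info['data'] is mutated in place, so the dict the caller
# built from the DB row no longer matches what is stored in the
# block_device_mapping table.
def connect_volume(self, connection_info, disk_info):
    iscsi_properties = connection_info['data']

    # ... log in to the iSCSI target and wait for the host device
    # (/dev/disk/by-path/...) to show up ...
    host_device = self._get_host_device(iscsi_properties)

    connection_info['data']['device_path'] = host_device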

That change never gets persisted back to the block_device_mapping table
for that BDM, yet we've potentially connected the volume on another host.
So if live migration fails, we never roll the volume connection_info back
to what the source host had before pre_live_migration, and the instance
is then rebooted, the BDM will be recreated from what's in the database,
which will be wrong.
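
A fix along those lines would have to capture the source-side
connection_info before pre_live_migration runs and restore it on failure.
A rough sketch of that idea, with hypothetical names (this is not existing
Nova code):

# Hypothetical rollback helper, not existing Nova code: restore the
# connection_info each volume had on the source host before
# pre_live_migration, so a reboot after a failed migration rebuilds the
# BDMs from correct data.
from oslo_serialization import jsonutils

from nova import objects


def rollback_bdm_connection_info(context, instance, saved_connection_infos):
    """saved_connection_infos maps volume_id to the connection_info dict
    captured on the source host before pre_live_migration ran."""
    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)
    for bdm in bdms:
        saved = saved_connection_infos.get(bdm.volume_id)
        if saved is not None:
            bdm.connection_info = jsonutils.dumps(saved)
            bdm.save()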

-- 
You received this bug notification because you are a member of OpenStack
Security, which is subscribed to OpenStack.
https://bugs.launchpad.net/bugs/1419577

Title:
  when live-migrate failed, lun-id couldn't be rollback in havana

Status in OpenStack Compute (Nova):
  Confirmed
Status in OpenStack Security Advisories:
  Won't Fix

Bug description:
  Hi, guys

  When a live migration fails with an error, the lun-id in the connection_info column of Nova's block_device_mapping table is not rolled back,
  and the failed VM can end up attached to someone else's volume.

  My test environment is the following:

  OpenStack Version : Havana (2013.2.3)
  Compute Node OS : 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  Compute Node multipath : multipath-tools 0.4.9-3ubuntu7.2

  The test steps are:

  1) Create 2 compute nodes (host#1 and host#2)
  2) Create 1 VM on host#1 (vm01)
  3) Create 1 cinder volume (vol01)
  4) Attach the volume to vm01 (/dev/vdb)
  5) Live-migrate vm01 from host#1 to host#2
  6) Live migration succeeds
       - Check the mappers on host#1 with the multipath command (# multipath -ll); you will find the mapper has not been deleted
         and the status of its devices is "failed faulty"
       - Check the lun-id of vol01
  7) Live-migrate vm01 back from host#2 to host#1 (vm01 was moved to host#2 at step 5)
  8) Live migration fails
       - Check the mappers on host#1
       - Check the lun-id of vol01; you will find the LUN now has "two" igroups
       - Check the connection_info column in Nova's block_device_mapping table; you will find the lun-id has not been rolled back (see the query sketch after these steps)
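
  For step 8, the stored connection_info can be inspected directly in the
  nova database. A hedged example: the connection URL and volume id below
  are placeholders, and for iSCSI/FC volumes the lun-id is carried in
  data['target_lun']:

  # Example only: read connection_info from the block_device_mapping table
  # and print the lun-id it currently records for a given volume.
  import json

  from sqlalchemy import create_engine, text

  # Placeholder connection URL and volume id -- adjust for your deployment.
  engine = create_engine("mysql+pymysql://nova:password@controller/nova")
  with engine.connect() as conn:
      row = conn.execute(
          text("SELECT connection_info FROM block_device_mapping "
               "WHERE volume_id = :vol AND deleted = 0"),
          {"vol": "11111111-2222-3333-4444-555555555555"},
      ).fetchone()

  info = json.loads(row[0])
  print(info["data"].get("target_lun"))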

  This bug is a critical security issue because the failed VM can gain
  access to someone else's volume.

  Every cinder-volume backend storage can hit the same problem, because
  this is a bug in live migration's rollback process.

  I suggest the following methods to solve the issue:

  1) When live migration completes, Nova should delete the multipath mapper devices on the origin host
  2) When live migration fails, Nova should roll back the lun-id in the connection_info column
  3) When live migration fails, Cinder should delete the mapping between the LUN and the host (NetApp: igroup, EMC: storage_group ...)
  4) When a volume attach is requested, vendors' Cinder volume drivers should assign lun-ids randomly to reduce the probability of mis-mapping (a rough sketch follows this list)
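
  As an illustration of suggestion 4 only (not code from any vendor's
  Cinder driver; the 0-255 LUN range and the in-use set are assumptions):

  # Illustrative sketch for suggestion 4: pick LUN ids randomly instead of
  # sequentially, so a stale mapping left behind by a failed rollback is
  # less likely to collide with a new attachment.
  import random

  def pick_random_lun_id(luns_in_use, max_lun=255):
      available = set(range(max_lun + 1)) - set(luns_in_use)
      if not available:
          raise RuntimeError("no free LUN ids on this target")
      return random.choice(sorted(available))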

  Please check this bug.

  Thank you.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1419577/+subscriptions



