[Openstack-operators] Recover stalled live-migration

David Medberry openstack at medberry.net
Wed Aug 13 11:02:35 UTC 2014


Hi guys,

What's the best way to recover a stalled live-migration?

Scenario:
Using shared storage (ceph & NFS in our case), we triggered some
live-migrations. During the l-m, all of the VMs stalled and the migrations
never completed. The qemu-kvm instance processes did move to the new
target; however, the l-m itself could not complete. (The underlying
cause was a systemic neutron issue.)

I've typically used "nova reset-state --active" when an l-m doesn't finish,
but in this case that left nova's record of the instance on the original
node even though the qemu process was on the target node. Is there any way
to stitch them back together after the fact (aside from just hacking the
mysql database)?
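
To make the mismatch concrete, here is a rough sketch of how one might at
least detect which instances are affected. It only compares two mappings
and does not talk to nova or libvirt itself. The function name and the
input shapes are made up for illustration; I'm assuming the recorded host
would come from the OS-EXT-SRV-ATTR:host field of "nova show" and the
running instances from "virsh list --uuid" on each compute node.

```python
def find_misplaced_instances(nova_hosts, running_on):
    """Report instances whose qemu process is not on the host nova records.

    nova_hosts: {instance_uuid: host nova believes the instance is on}
    running_on: {compute_host: iterable of instance UUIDs running there}
    Both inputs are hypothetical shapes, gathered by whatever means you have.
    Returns {instance_uuid: (recorded_host, [actual_hosts])}.
    """
    # Invert the per-host listing into uuid -> hosts where it is running.
    actual = {}
    for host, uuids in running_on.items():
        for uuid in uuids:
            actual.setdefault(uuid, []).append(host)

    misplaced = {}
    for uuid, recorded in nova_hosts.items():
        hosts = sorted(actual.get(uuid, []))
        # Skip instances not running anywhere (e.g. shut off); flag only
        # those running somewhere other than the recorded host.
        if hosts and recorded not in hosts:
            misplaced[uuid] = (recorded, hosts)
    return misplaced
```

This only identifies the affected instances; it does not repair anything,
so the question of how to safely update nova's view still stands.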

Your thoughts and/or experience with this would be useful. (This occurred
in a testbed and ultimately we terminated the instances PRIOR to finding
out that it was neutron that was causing the lack-of-forward-progress.
Likely restarting neutron in this case would have allowed the l-ms to
complete.)