<div dir="ltr"><span style="font-family:arial,sans-serif;font-size:13px">Hi guys,</span><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">What's the best way to recover a stalled live-migration?</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Scenario:</div><div style="font-family:arial,sans-serif;font-size:13px">Using shared storage (Ceph &amp; NFS in our case), we triggered some live-migrations. During the l-m, all of the VMs stalled and the migrations never completed. The qemu-kvm instance processes did move to the new target; however, the l-m could not complete. (The underlying reason that the l-m could not complete was a systemic neutron issue.)</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">I've typically used "nova reset-state --active" when an l-m doesn't finish, but in this case that left the instance recorded on the original node even though the qemu process was running on the target node. Is there any way to stitch them back together after the fact (aside from just hacking the MySQL database)?</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Your thoughts and/or experience with this would be useful. (This occurred in a testbed, and ultimately we terminated the instances PRIOR to finding out that neutron was causing the lack of forward progress. Restarting neutron would likely have allowed the l-ms to complete.)</div>
</div>