Hi,

For the record, the end of the story: I decided to reinstall the Magnum server and after that everything was fine. It seems something went wrong during the Antelope -> Caracal migration; it is not clear what. At least the fix was easy in the end.

Best regards,
Michel

On 13/06/2024 at 14:33, Michel Jouvin wrote:
Hi,
In our Yoga -> Antelope upgrade, we have been able to live migrate active VMs from most Yoga HVs to an Antelope one. In one case, the migration starts well, but at the end of the disk migration there is an "unexpected error" and the migration is canceled. I have not been able to find any more detailed information about the failure. On the source HV, the nova-compute log says:
-------------------
2024-06-13 13:41:09.544 4108750 INFO nova.virt.libvirt.driver [req-565bd94e-9a3d-43b9-a42d-80ea6545c0a6 5c0b6777dafaad4f7a2e9d3b1959ff732468809f18ef35286eb123fb10865760 1b693dd6ab914fc0b5398f96e7866dd2 - 278da0eaa8cc4870a72e3a8abbf9a3f0 4828d76008524b798a4f5f632ba26adf] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Migration running for 2804 secs, memory 100% remaining (bytes processed=0, remaining=0, total=0); disk 1% remaining (bytes processed=106548297728, remaining=863371264, total=107411668992).
2024-06-13 13:41:20.436 4108750 INFO nova.compute.resource_tracker [req-db52ee9a-b0e4-4ac1-83f6-f57600ccc275 - - - - -] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Updating resource usage from migration 35e1a5e8-0f7d-4ff9-9075-c8da30649b4a
2024-06-13 13:41:45.678 4108750 INFO nova.virt.libvirt.driver [req-565bd94e-9a3d-43b9-a42d-80ea6545c0a6 5c0b6777dafaad4f7a2e9d3b1959ff732468809f18ef35286eb123fb10865760 1b693dd6ab914fc0b5398f96e7866dd2 - 278da0eaa8cc4870a72e3a8abbf9a3f0 4828d76008524b798a4f5f632ba26adf] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Migration running for 2840 secs, memory 4% remaining (bytes processed=13900749654, remaining=691298304, total=16782270464); disk 0% remaining (bytes processed=107429101568, remaining=262144, total=107429363712).
2024-06-13 13:41:51.153 4108750 INFO nova.compute.manager [req-115e45f4-d8c5-43d7-ae70-1608818eaf70 - - - - -] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] VM Paused (Lifecycle Event)
2024-06-13 13:41:51.212 4108750 INFO nova.compute.manager [req-115e45f4-d8c5-43d7-ae70-1608818eaf70 - - - - -] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] During sync_power_state the instance has a pending task (migrating). Skip.
2024-06-13 13:41:51.390 4108750 INFO nova.compute.manager [req-115e45f4-d8c5-43d7-ae70-1608818eaf70 - - - - -] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] VM Resumed (Lifecycle Event)
2024-06-13 13:41:51.426 4108750 INFO nova.compute.manager [req-115e45f4-d8c5-43d7-ae70-1608818eaf70 - - - - -] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] During sync_power_state the instance has a pending task (migrating). Skip.
2024-06-13 13:41:51.772 4108750 ERROR nova.virt.libvirt.driver [req-565bd94e-9a3d-43b9-a42d-80ea6545c0a6 5c0b6777dafaad4f7a2e9d3b1959ff732468809f18ef35286eb123fb10865760 1b693dd6ab914fc0b5398f96e7866dd2 - 278da0eaa8cc4870a72e3a8abbf9a3f0 4828d76008524b798a4f5f632ba26adf] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Migration operation has aborted
2024-06-13 13:41:51.774 4108750 ERROR nova.virt.libvirt.driver [req-565bd94e-9a3d-43b9-a42d-80ea6545c0a6 5c0b6777dafaad4f7a2e9d3b1959ff732468809f18ef35286eb123fb10865760 1b693dd6ab914fc0b5398f96e7866dd2 - 278da0eaa8cc4870a72e3a8abbf9a3f0 4828d76008524b798a4f5f632ba26adf] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Live Migration failure: internal error: QEMU unexpectedly closed the monitor (vm='instance-00042fe6'): 2024-06-13T10:54:25.319005Z qemu-kvm: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated: machine types for previous major releases are deprecated
2024-06-13 13:41:51.806 4108750 INFO nova.compute.manager [req-565bd94e-9a3d-43b9-a42d-80ea6545c0a6 5c0b6777dafaad4f7a2e9d3b1959ff732468809f18ef35286eb123fb10865760 1b693dd6ab914fc0b5398f96e7866dd2 - 278da0eaa8cc4870a72e3a8abbf9a3f0 4828d76008524b798a4f5f632ba26adf] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Swapping old allocation on dict_keys(['0a7be64c-cff9-4fa7-93ac-ddd0a965e93a']) held by migration 35e1a5e8-0f7d-4ff9-9075-c8da30649b4a for instance
2024-06-13 13:41:52.882 4108750 WARNING nova.compute.manager [req-8c267aa3-29dc-43ec-9a1b-9c4ada34dd3f f0d2557b870a44fa90c98b507629407e 5cf79c1092c44348addf9bee098f138e - default default] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Received unexpected event network-vif-unplugged-7446e7d3-a85d-4bd3-a609-decf5e7dd591 for instance with vm_state active and task_state None.
2024-06-13 13:41:55.758 4108750 WARNING nova.compute.manager [req-59c5fb1f-4223-4e14-90cf-c97bb0b9130a f0d2557b870a44fa90c98b507629407e 5cf79c1092c44348addf9bee098f138e - default default] [instance: 81d8ce0d-3120-45ad-b8c7-c219638fbbdd] Received unexpected event network-vif-plugged-7446e7d3-a85d-4bd3-a609-decf5e7dd591 for instance with vm_state active and task_state None.
-----------------
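For reference, a minimal sketch of pulling the instance-action events for this instance with openstacksdk (the cloud name "mycloud" is a placeholder, admin credentials assumed). The same data is exposed by "openstack server event list/show"; the traceback field, returned only to admins, may carry more detail than the generic "unexpected error":
-------------------
# Sketch only: assumes openstacksdk is installed, a clouds.yaml entry named
# "mycloud" (placeholder) with admin credentials, and the instance UUID from
# the log above.
import openstack

conn = openstack.connect(cloud="mycloud")
server_id = "81d8ce0d-3120-45ad-b8c7-c219638fbbdd"

# os-instance-actions is a standard Nova API; the compute proxy is also a
# keystoneauth Adapter, so the endpoint can be queried directly.
actions = conn.compute.get(f"/servers/{server_id}/os-instance-actions").json()
for action in actions["instanceActions"]:
    detail = conn.compute.get(
        f"/servers/{server_id}/os-instance-actions/{action['request_id']}"
    ).json()["instanceAction"]
    print(action["action"], action["request_id"], action.get("message"))
    for event in detail.get("events", []):
        # The traceback is only returned to admin users; for a failed live
        # migration it may say more than "unexpected error".
        print("  ", event["event"], event["result"], event.get("traceback"))
-------------------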
It happens for all the VMs hosted on this source HV, even though they have no link between them (they are not in the same project), and whatever the destination HV is (same HW model), so I guess it has to do with the source HV. There were no such problems when migrating VMs between the other HVs of the same HW model.
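Given the deprecated 'pc-i440fx-rhel7.6.0' warning in the QEMU output above, one thing that could be compared between this HV and the others is the machine type recorded in the guest XML versus the machine types the local QEMU advertises. A minimal sketch with libvirt-python, run on the source HV (the domain name is the one from the log; read-only, nothing is modified):
-------------------
# Sketch only: assumes python3-libvirt is installed on the HV.
import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00042fe6")

# The machine type the guest was started with lives at <os><type machine="...">
# in the domain XML.
machine = ET.fromstring(dom.XMLDesc(0)).find("./os/type").get("machine")
print(f"{dom.name()} machine type: {machine}")

# For comparison, the machine types the local QEMU/libvirt advertise for x86_64.
caps = ET.fromstring(conn.getCapabilities())
supported = sorted({m.text for m in caps.findall(".//guest/arch[@name='x86_64']/machine")})
print("supported machine types:", ", ".join(supported))

conn.close()
-------------------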
Any idea where to look for details that may help identify the problem?
Best regards,
Michel