Have edited the subject for this subthread so as not to confuse it with the OP's query - hope that's helpful... On Thu, Nov 27, 2025 at 2:08 PM Sean Mooney <smooney@redhat.com> wrote:
On 27/11/2025 13:26, Nell Jerram wrote:
Could enable_qemu_monitor_announce_self blocking be responsible for 12 _minutes_ of delay? That sounds huge!

I don't see any other way that that config option could have an effect and be responsible for the reported issue.
It does not make sense that changing that value would actually affect this at all.
If that was a blocking call and it did not return, that might explain the delay; otherwise my actual opinion is that this is a coincidence.
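For anyone following along, this is a per-compute-node option in nova.conf; a minimal sketch of where it lives (section name from memory - please verify against the config reference for your release):

```ini
# nova.conf on the compute host (section placement is my recollection,
# not verified against the docs for every release)
[workarounds]
# When enabled, nova asks the QEMU monitor to have the migrated guest
# announce itself (send announcement frames) after live migration.
enable_qemu_monitor_announce_self = False
```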
Also, can I ask if this is _only_ a problem with the OpenStack status reporting (i.e. "openstack server migration list")? Or does it also affect the actual liveness of the migrated instance?
If it's related to enable_qemu_monitor_announce_self, it can't affect the liveness of the instance, and it would only be a reporting issue.
I think this is much more likely to be related to this feature request https://bugs.launchpad.net/nova/+bug/2128665
https://blueprints.launchpad.net/nova/+spec/refine-network-setup-procedure-i... and the comment thread we discussed
https://review.opendev.org/c/openstack/nova/+/966106/1/nova/virt/libvirt/hos...
The tl;dr is that there is a kernel bug
https://lore.kernel.org/all/20240626191830.3819324-1-yang@os.amperecomputing... that is only fixed in 6.13, which can cause the source VM to take minutes to stop as it waits for the kernel to deallocate the memory. We do not actually mark the live migration as complete until after that has finished.
So I think that's why it's taking minutes for the status to go to complete.
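A quick first check on whether a given compute host could be affected is to compare its running kernel against 6.13, the release Sean mentions as carrying the fix:

```shell
# Print the running kernel release on the source compute host; versions
# below 6.13 may still have the slow memory-deallocation behaviour
# referenced in the lore thread above.
uname -r
```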
(Coincidentally, I am also currently investigating live migration. I'm seeing a problem where data transfer on an existing connection to the instance is held up for about 12 seconds after the migration has completed.)
I'm not sure, but maybe that is related to the kernel bug? libvirt does have to do more than just transfer the data before it can complete the migration or unpause the VM on the dest, but I don't know the details well enough to say what that entails.
Thanks Sean. To clarify/record a few details of my case:

- I'm using the Calico Neutron driver, so any OVN details won't be relevant here. Calico currently "handles" live migration by deleting the route for the instance IP via the old node and creating a route to the instance IP via the new node, at the point where Neutron changes the port's "binding:host_id" to the new node.
- Empirically, there's a window of about 1.5s between the old route disappearing and the new route appearing, on the relevant intermediate routers. During this window packets on the connection get retransmitted; the window doesn't cause the connection to drop.
- Immediately after the window I see packets routed through to the instance (now on the new node) - but it then takes another 12 seconds before the instance starts responding to those.

I think my next step is to research what the Neutron binding:host_id transition point corresponds to in Nova and libvirt terms, and then review if the situation correlates with the bug that you mentioned.