Karl,

 

Is there anything special about this instance or its flavor, such as NUMA pinning, dedicated huge pages, or other extra_specs? The fact that it transfers the first ~256K of data and then nothing doesn’t sound like a post_migrate / mutating-memory problem, or a network/transit problem between the hosts. It sounds more like a claim problem on the remote host, or some resource that is not participating in the migration in a healthy way.
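To confirm that the transfer really is flat rather than just very slow, the progress lines from nova-compute can be compared over time. Here is a minimal sketch, assuming the "Migration running for ... secs" line format shown in your log excerpt; the function names are my own and not part of nova:

```python
import re

# Matches the memory part of the progress lines logged by
# nova.virt.libvirt.driver, e.g.
# "Migration running for 274 secs, memory 100% remaining (bytes processed=281505, ..."
PROGRESS_RE = re.compile(
    r"Migration running for (?P<secs>\d+) secs, memory \d+% remaining "
    r"\(bytes processed=(?P<processed>\d+), remaining=(?P<remaining>\d+), "
    r"total=(?P<total>\d+)\)"
)


def parse_progress(lines):
    """Return (elapsed_secs, bytes_processed) for every progress line found."""
    samples = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            samples.append((int(match["secs"]), int(match["processed"])))
    return samples


def is_stalled(samples, min_samples=3):
    """True if the last `min_samples` samples show no change in bytes processed."""
    if len(samples) < min_samples:
        return False
    recent = {processed for _, processed in samples[-min_samples:]}
    return len(recent) == 1
```

Feeding it the nova-compute log for the instance (for example `parse_progress(open(path))`, where `path` is wherever your deployment writes that log) should show quickly whether "bytes processed" ever moves after the first sample.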

 

There was a blueprint that highlighted how different NUMA topology or HugePages configurations could impact migrations: https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/numa-aware-live-migration.html
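As a quick way to spot whether the flavor carries any of these topology-sensitive settings, something like the following could help. It is only a sketch: the prefix list is illustrative rather than exhaustive, and the helper name is hypothetical.

```python
# Flavor extra_specs that affect how nova places guest CPUs and memory, or
# that attach host devices. Illustrative only; not a complete list.
TOPOLOGY_PREFIXES = (
    "hw:numa_",          # e.g. hw:numa_nodes
    "hw:cpu_policy",     # e.g. dedicated (CPU pinning)
    "hw:mem_page_size",  # huge pages
    "pci_passthrough:",  # e.g. pci_passthrough:alias
)


def topology_sensitive_specs(extra_specs):
    """Return the subset of extra_specs whose keys match a known prefix."""
    return {
        key: value
        for key, value in extra_specs.items()
        if key.startswith(TOPOLOGY_PREFIXES)
    }
```

The input is just the flavor's properties as a dict, for example as reported by `openstack flavor show`. An empty result doesn't rule out a topology problem (image properties can set similar values), but a non-empty one tells you where to look first.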

 

Just some ideas that might help isolate where the problem might be:

  1. Assuming you’re using neutron networking, does neutron throw any errors? By default, nova waits for neutron to confirm the “network-vif-plugged” event before starting the actual transfer of data (the live_migration_wait_for_vif_plug option), so a missing or late event might be close to what you’re experiencing.
  2. Does the instance have any PCI devices / PCI passthrough devices that might not be live-migratable (or that are marked as live_migratable:no or none)?
  3. Do offline migrations from the source host to the destination host work?
  4. Does a live-migration of a smaller instance from the source host to the destination host work?
  5. Does nova-compute on the destination host, or nova-api, throw any errors or warnings when the migration is kicked off, complaining about claims, topology, or its ability to create the new instance?
  6. Have you customised your nova settings significantly from the defaults? Do either of the hosts have divergent configuration?
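For the last point, a rough way to compare the two hosts is to diff their nova.conf files option by option. This is a sketch using only the Python standard library; it treats nova.conf as plain INI, so options that are repeated within a section keep only their last value, and the function names are my own:

```python
import configparser


def load_conf(text):
    """Parse nova.conf-style INI text into {section: {option: value}}."""
    parser = configparser.ConfigParser(
        interpolation=None,            # values may contain literal '%'
        strict=False,                  # tolerate repeated options
        default_section="__unused__",  # treat [DEFAULT] as an ordinary section
    )
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def diff_conf(source, dest):
    """Yield (section, option, source_value, dest_value) for each difference."""
    for section in sorted(set(source) | set(dest)):
        src_opts = source.get(section, {})
        dst_opts = dest.get(section, {})
        for option in sorted(set(src_opts) | set(dst_opts)):
            src_val = src_opts.get(option)
            dst_val = dst_opts.get(option)
            if src_val != dst_val:
                yield section, option, src_val, dst_val
```

Some differences are expected (host names, IP addresses), so the interesting output is anything under sections like [libvirt] or [compute] that differs between the source and destination.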

 

There are also some options you can tweak to increase the detail of the logging that QEMU generates, if you’ve taken all of the diagnostic steps above and gotten nowhere. QEMU has an analyze-migration.py script that you might be able to reverse engineer to get it to tell you more about what’s going on: https://www.qemu.org/docs/master/devel/migration/best-practices.html

 

Good luck – let us know how it turns out!

 

Kind Regards,

 

Joel McLean – Micron21 Pty Ltd

 

From: Karl Kloppenborg <kkloppenborg@resetdata.com.au>
Sent: Friday, 12 September 2025 7:07 PM
To: Kees Meijs | Nefos <keesm@nefos.com>; OpenStack Discuss <openstack-discuss@lists.openstack.org>
Subject: Re: Live-migration never completes memory copy

 

Hi Kees,

 

Thanks for this. We did try it, but the migration is still stuck at what appears to be the “initial” memory copy.

 



From: Kees Meijs | Nefos <keesm@nefos.com>
Sent: Friday, September 12, 2025 7:05:42 PM
To: Karl Kloppenborg <kkloppenborg@resetdata.com.au>; OpenStack Discuss <openstack-discuss@lists.openstack.org>
Subject: Re: Live-migration never completes memory copy

 

Hello Karl,

See https://docs.openstack.org/nova/latest/admin/configuring-migrations.html about this. Maybe auto-convergence could be helpful in your case.

Cheers,
Kees

__

Kees Meijs
BICT

Nefos Cloud & IT

Nefos IT bv
Burgemeester Mollaan 34a
5582 CK Waalre - NL
kvk 66494931

+31 (0)88 2088 188
nefos.com



On 12-09-2025 10:59, Karl Kloppenborg wrote:

Hi OpenStack Teams,

 

We’re attempting to live-migrate instances off a node, but we continuously hit a timeout issue where the memory copy doesn’t progress:

2025-09-12 08:55:20.116 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 274 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).

2025-09-12 08:55:58.807 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 312 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).

2025-09-12 08:56:02.468 971204 INFO nova.compute.manager [None req-1f91a8d6-1fa7-47b2-8c3f-70925fb7a219 - - - - - -] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] During sync_power_state the instance has a pending task (migrating). Skip.

 

 

2025-09-12 08:56:37.985 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 352 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).

2025-09-12 08:57:17.377 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 391 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).

2025-09-12 08:57:56.877 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 431 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).

 

Has anyone got insights for this issue?

 

Your help is greatly appreciated.

 

Thanks,

Karl.

 

Karl Kloppenborg

Chief Technology Officer

m: +61 437 239 565
resetdata.com
