Live-migration never completes memory copy
Hi Openstack Teams,
We’re attempting to live-migrate instances off a node, but we continuously hit a timeout where the memory copy makes no progress:
2025-09-12 08:55:20.116 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 274 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).
2025-09-12 08:55:58.807 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 312 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).
2025-09-12 08:56:02.468 971204 INFO nova.compute.manager [None req-1f91a8d6-1fa7-47b2-8c3f-70925fb7a219 - - - - - -] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] During sync_power_state the instance has a pending task (migrating). Skip.
2025-09-12 08:56:37.985 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 352 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).
2025-09-12 08:57:17.377 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 391 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).
2025-09-12 08:57:56.877 971204 INFO nova.virt.libvirt.driver [None req-250bc815-3323-4619-aef8-a11fdf92b27a 405874cabc5b4bf3912a5f89d54eb0d1 21eb701c2a1f48b38dab8f34c0a20902 - - default default] [instance: 2f496843-a2c4-48d4-bbdc-149a2ea76f1c] Migration running for 431 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024); disk 100% remaining (bytes processed=0, remaining=0, total=0).
Has anyone got insights for this issue? Your help is greatly appreciated.
Thanks, Karl.
Karl Kloppenborg
Chief Technology Officer
resetdata.com <https://resetdata.com/>
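The log lines above can be checked mechanically for a stall (as opposed to a slow copy) by comparing the "bytes processed" counter across successive progress reports. The following is a minimal Python sketch, assuming the nova.virt.libvirt.driver message format shown in this post; the function names and the three-sample threshold are hypothetical choices for illustration, not part of Nova.

```python
import re

# Matches the memory counters in progress messages of the form quoted above.
PROGRESS_RE = re.compile(
    r"Migration running for (?P<secs>\d+) secs, memory \d+% remaining "
    r"\(bytes processed=(?P<processed>\d+), remaining=(?P<remaining>\d+), "
    r"total=(?P<total>\d+)\)"
)

# Message parts copied from the log above (prefixes trimmed for brevity).
SAMPLE_LINES = [
    "Migration running for 274 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024);",
    "Migration running for 312 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024);",
    "During sync_power_state the instance has a pending task (migrating). Skip.",
    "Migration running for 352 secs, memory 100% remaining (bytes processed=281505, remaining=8595255296, total=8604033024);",
]


def parse_progress(lines):
    """Return (secs, processed, remaining, total) tuples for progress lines."""
    samples = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            samples.append(tuple(
                int(match.group(key))
                for key in ("secs", "processed", "remaining", "total")
            ))
    return samples


def is_stalled(samples, min_samples=3):
    """True if the last `min_samples` reports show no change in bytes processed."""
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    return len({processed for _, processed, _, _ in recent}) == 1
```

Applied to the sample lines, `is_stalled(parse_progress(SAMPLE_LINES))` reports a stall, because the processed counter stays at 281505 in every report while the elapsed time keeps increasing.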
Hello Karl,
See https://docs.openstack.org/nova/latest/admin/configuring-migrations.html about this. Maybe auto-convergence could be helpful in your case.
Cheers, Kees
Kees Meijs
Nefos IT bv
nefos.com <https://nefos.com/contact>
Hi Kees,
Thanks for this. We did try it, but we're still stuck at what appears to be the “initial” memory copy.
Hi,
How is firewalling (packet filtering) set up between your compute nodes? The KVM/QEMU processes need to be able to connect to each other to transfer the memory pages.
Cheers, Kees
These nodes are functioning on a flat layer 2 network. No firewalls and dual 100G with MTU 9k.
Karl Kloppenborg
Hi,
Still, check iptables or nft for rules.
K.
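Alongside inspecting the iptables/nft rule sets, a quick TCP probe from the source compute node to the destination can show whether packets are being answered at all. This is a minimal Python sketch, not a definitive test: the function name is hypothetical, and which host and port to probe depends on how the deployment configures its migration transport. Libvirt's stock defaults are TCP 16509 for libvirtd and 49152-49215 for QEMU migration, but those are assumptions about this particular setup.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'no-response'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered: either nothing is listening or a REJECT rule replied.
        return "refused"
    except (socket.timeout, TimeoutError):
        # Nothing came back within the timeout, which is consistent with a DROP rule.
        return "no-response"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no-response"
        raise
```

Calling `probe_tcp("<destination-host>", 16509)` from the source node, with the placeholder replaced by the real destination address, gives a first indication. A "refused" result on a QEMU migration port is expected when no migration is in flight, since QEMU only listens while one is being received; "no-response" is the result that points towards packet filtering.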
Will do, will report back.
Karl Kloppenborg
Karl,
Is there anything special about this instance, or its flavor? NUMA pinning, dedicated huge pages, other extra_specs?
The fact that it's getting the first ~256K of data transferred, but then nothing, doesn't sound like a post_migrate / mutating memory problem, or a network/transit problem between the hosts. It sounds more like a claim problem on the remote host, or some resource that is not participating in the migration in a healthy way.
There was a blueprint that highlighted how different NUMA topology or HugePages configurations could impact migrations: https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/numa-a...
Just some ideas that might help isolate where the problem might be:
1. Assuming you're using neutron networking, does neutron throw any errors? (Default migration in nova has "wait_for_vif_plug": part of the migration process is that nova waits for neutron to confirm the "network-vif-plugged" event before starting the actual transfer of data. This might be close to what you're experiencing.)
2. Does the instance have any PCI devices / PCI passthrough devices that might not be live-migratable (or are marked as live_migratable:no or none)?
3. Do offline migrations from the source host to the destination host work?
4. Does a live-migration of a smaller instance from the source host to the destination host work?
5. Does the destination host's nova-compute or nova-api throw any errors or warnings when the migration is kicked off, complaining about claims, topology, or its ability to create the new instance?
6. Have you customised your nova settings significantly from the defaults? Do either of the hosts have divergent configuration?
There are also some options you can tweak to increase the detail of the logging that QEMU generates, if you've taken all of the diagnostic steps above and gotten nowhere. QEMU has an analyze-migration.py script that you might be able to reverse engineer to get it to tell you more about what's going on.
https://www.qemu.org/docs/master/devel/migration/best-practices.html
Good luck - let us know how it turns out!
Kind Regards,
Joel McLean - Micron21 Pty Ltd
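The first question in this message (is anything special about the flavor?) can be answered quickly by scanning the flavor's extra specs for NUMA, huge page, pinning, and passthrough settings. A minimal Python sketch follows; the key list is an illustrative assumption rather than an exhaustive or authoritative one, the function name is hypothetical, and the input dict is assumed to have been obtained separately (for example from the properties shown by `openstack flavor show`).

```python
# Extra-spec keys that are worth a second look before live-migrating.
# This selection is an assumption for illustration, not a complete list.
SUSPECT_KEYS = {
    "hw:mem_page_size": "explicit (huge) page size",
    "hw:numa_nodes": "guest NUMA topology",
    "hw:cpu_policy": "CPU pinning policy",
    "pci_passthrough:alias": "PCI passthrough device",
    "resources:VGPU": "virtual GPU",
}


def flag_extra_specs(extra_specs):
    """Return {key: reason} for the extra specs that appear in SUSPECT_KEYS."""
    return {
        key: SUSPECT_KEYS[key]
        for key in extra_specs
        if key in SUSPECT_KEYS
    }


# Hypothetical flavor properties, purely as an example input.
example_flavor = {"hw:mem_page_size": "1GB", "hw:rng:allowed": "True"}
flagged = flag_extra_specs(example_flavor)
```

For the example input, only `hw:mem_page_size` is flagged. An empty result does not rule out the causes listed above; it only means the flavor itself does not request any of these features.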
On 12/09/2025 09:59, Karl Kloppenborg wrote:
Has anyone got insights for this issue?
This most often happens when the guest you're trying to move is actively mutating state. The usual mitigation is auto-converge, which adds pauses to the guest CPU execution so that the migration can make forward progress, at the expense of degraded guest performance. Alternatively, you can use the more powerful and efficient post-copy mechanism:
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv...

Post-copy first tries to pre-copy the guest memory to the destination. If it gets into a state where the guest is modifying memory faster than it can be transferred, it switches execution to the destination VM. From then on, all writes stay local to the destination VM, and any reads of memory that has not been transferred yet are pulled on demand from the source VM.
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv...

Nova largely doesn't control how the memory is transferred; we just ask the hypervisor (libvirt/qemu in this case) to perform a migration and then monitor it for completion. We are not controlling how the transfer works beyond that.

What's a little odd in your case is that none of the stats are changing apart from the elapsed time. This implies that qemu is not able to transfer any memory at all, which suggests you're hitting some kind of internal qemu bug or limitation.

For example, if the guest memory is using 1G hugepages and the guest keeps dirtying the pages, then the "remaining" value may stay the same or even increase. However, I would expect the "processed" value to increase as qemu tries to transfer the page over and over again, having to restart the transfer every time the page is modified. This is the problem that post-copy was designed to fix.

This generally doesn't matter for small pages (i.e., the 4k pages that we use by default). Retransferring a 4k page is quick and forward progress can be made. If you're using 2MB or 1GB hugepages, however, you need much, much higher network bandwidth to make any progress. If you have not tried post-copy, I would recommend enabling it to see if it helps.

The other possibility is that you are using vGPU (i.e., generic mdevs or the new live migration of a PCI device that uses the vfio-variant driver with managed=no). The stats do not include the memory being transferred for those passthrough devices. To make live migration work in that case, you need to adjust the allowed downtime:
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv...
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv...
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv...

In our vGPU docs, we suggest:
https://docs.openstack.org/nova/latest/admin/virtual-gpu.html#caveats

    live_migration_downtime = 500000
    live_migration_downtime_steps = 3
    live_migration_downtime_delay = 3

500000 is a ridiculously large value for that setting. It basically tells libvirt/qemu that it can take as much time as it needs to transfer the memory and, in this case, pause the guest for a little over 8 minutes of total downtime. The number was chosen by adding a few zeros to our default of 500ms of total downtime. Setting it somewhere in the 2000-10000ms range might be more reasonable.

Putting this all together, with a tweak to https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv..., I think a reasonable config to run in production might be:

    [libvirt]
    live_migration_permit_post_copy = true
    live_migration_downtime = 4000
    live_migration_downtime_steps = 5
    live_migration_downtime_delay = 15
    live_migration_timeout_action = force_complete

However, I would advise reading the help text for each of those options to understand what this is doing and to evaluate whether it fits your workload/SLA requirements.
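
To put rough numbers on the page-size and downtime points above, here is a back-of-the-envelope sketch. It assumes a hypothetical 10 Gbit/s migration link and ignores protocol overhead, compression, and everything else qemu does, so treat it as an illustration of scale rather than a prediction. It estimates how long one page of each size spends on the wire (roughly the window during which a guest write forces that page to be resent) and converts a downtime value in milliseconds to minutes.

```python
# Page sizes discussed in the thread, in bytes.
PAGE_SIZES = {
    "4 KiB": 4 * 1024,
    "2 MiB": 2 * 1024**2,
    "1 GiB": 1024**3,
}

# Assumption for illustration only: a dedicated 10 Gbit/s migration link.
ASSUMED_LINK_BITS_PER_SEC = 10e9


def page_transfer_seconds(page_bytes, link_bits_per_sec):
    """Ideal time to push one page over the link, with no overhead."""
    return page_bytes * 8 / link_bits_per_sec


def downtime_minutes(downtime_ms):
    """Convert a downtime value in milliseconds to minutes."""
    return downtime_ms / 1000 / 60


for name, size in PAGE_SIZES.items():
    millis = page_transfer_seconds(size, ASSUMED_LINK_BITS_PER_SEC) * 1000
    print(f"{name}: about {millis:.4f} ms on the wire per page")

for value_ms in (500, 4000, 500000):
    print(f"downtime {value_ms} ms is {downtime_minutes(value_ms):.2f} minutes")
```

Under these assumptions a 1 GiB page occupies the link for most of a second, so a guest that touches each hugepage more often than that can keep invalidating the copy, while a 4 KiB page is gone in microseconds. The 500000 ms value works out to 500 seconds, which is where the "a little over 8 minutes" figure comes from.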
Your help is greatly appreciated.
Thanks, Karl.
Karl Kloppenborg
Chief Technology Officer
m: +61 437 239 565
resetdata.com <https://resetdata.com/>
participants (4)
- Joel McLean
- Karl Kloppenborg
- Kees Meijs | Nefos
- Sean Mooney