On 24/03/2026 13:28, Timothé Baugé wrote:
> Hello stackers,
>
> I'm reaching out to the community to understand how you manage
> migrations of memory-intensive instances.
> We are running RHOSP 17.1 (based on Wallaby) and have faced several
> issues when live-migrating instances running on compute nodes to do
> maintenance work.
For memory-intensive workloads (with or without hugepages) we recommend enabling post-copy live migration:
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.live_migration_permit_post_copy
This is effectively mandatory if a guest is using hugepages, but it is highly recommended in general.
I thought this was the default in RHOSP 17.1, but perhaps it only became the default in 18. Can you confirm that you have not overridden the default and that you are using post-copy?
https://github.com/openstack-archive/tripleo-heat-templates/blob/stable/wallaby/deployment/nova/nova-compute-container-puppet.yaml#L501-L520
In 16.2 this was not enabled by default, and some RHOSP users never updated their config for the new defaults.
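One quick way to check the effective value is to read it out of the rendered nova.conf on the compute node. A minimal sketch; the path below is the usual TripleO/RHOSP location and is an assumption, adjust for your deployment:

```python
import configparser

# Typical location of the rendered compute config on a TripleO/RHOSP node
# (assumption -- adjust for your deployment; inside the nova_compute
# container it is just /etc/nova/nova.conf).
NOVA_CONF = "/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf"

def post_copy_enabled(path: str = NOVA_CONF) -> bool:
    cfg = configparser.ConfigParser()
    cfg.read(path)  # silently yields an empty config if the path is wrong
    # nova's upstream default is False, so an unset option means disabled
    return cfg.getboolean("libvirt", "live_migration_permit_post_copy",
                          fallback=False)

if __name__ == "__main__":
    print("post-copy enabled:", post_copy_enabled())
```

If this prints False (or the option is simply absent from the file), the deployment is still on the conservative pre-copy-only behaviour.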
>
> For several memory-intensive instances, the process just never
> completes properly and ends in error after hours of waiting. During the
> migration, we can see in nova-compute.log on the destination compute node
> that the migration never truly finishes copying the memory: whenever the
> reported percentage of memory remaining gets close to 0%, it goes
> back up to a high percentage again.
Setting

```
[libvirt]
live_migration_timeout_action=force_complete
```

can help with this, but it has the side effect that when the timeout
expires it can result in perceptible downtime in the guest,
because it will force-pause the guest while the remaining memory is copied.
Our new downstream defaults are here:
https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/nova.conf#L221-L238
```
live_migration_permit_post_copy=true
live_migration_permit_auto_converge=true
live_migration_timeout_action=force_complete
```
I think these should actually be the defaults upstream, and we should
eventually consider removing the option to disable them, but for now
we are even more conservative in our upstream defaults than in our
already conservative downstream defaults.
> By looking at the migration UUID, we saw that the memory processed
> bytes is way higher than the memory total bytes [1].
Yes, that is what happens if the guest is writing to memory and
dirtying pages at a higher rate than your network bandwidth can keep up with.
If a guest dirties even 1 byte in a memory page, the entire page needs to be
copied again.
That is tolerable for 4k pages without post-copy, but it is not for 1G
hugepages, and it is often not feasible for 2MB hugepages either.
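To put rough numbers on that: pre-copy can only converge if the network can re-send dirtied pages faster than the guest dirties them, and the page size multiplies the bytes you have to re-send. A back-of-the-envelope sketch (the dirty rate and link speed are made-up illustrative numbers):

```python
# Pre-copy convergence check: dirtying 1 byte forces the whole page to be
# re-transmitted, so the page size multiplies the required bandwidth.

def recopy_rate_bytes(dirtied_pages_per_sec: float, page_size: int) -> float:
    """Bytes/sec that must be re-sent just to keep up with the guest."""
    return dirtied_pages_per_sec * page_size

def converges(dirtied_pages_per_sec: float, page_size: int,
              bandwidth_bytes: float) -> bool:
    """True if the link can out-run the guest's dirtying of memory."""
    return recopy_rate_bytes(dirtied_pages_per_sec, page_size) < bandwidth_bytes

GIB = 1024 ** 3
bandwidth = 10 * GIB / 8      # a 10 Gbit/s migration link, ~1.25 GiB/s
dirty_rate = 5000             # pages touched per second (illustrative)

print(converges(dirty_rate, 4 * 1024, bandwidth))       # 4k pages: converges
print(converges(dirty_rate, 2 * 1024 ** 2, bandwidth))  # 2MB hugepages: does not
print(converges(dirty_rate, 1 * GIB, bandwidth))        # 1G hugepages: hopeless
```

With the same modest dirty rate, 4k pages need ~20 MB/s of re-copy bandwidth while 2MB hugepages already need ~10 GB/s, which is why post-copy (or force_complete) is the only reliable way out for hugepage guests.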
Does the VM have hugepages enabled?
> We played with several `live_migration` options in nova.conf but to no
> end.
There are some more advanced tunables like live_migration_downtime,
live_migration_downtime_steps, and live_migration_downtime_delay that might
be of use, but I generally do not recommend changing those unless you have
already enabled post-copy and the force_complete timeout action.
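For reference, these are the upstream defaults for those tunables as I remember them from the nova config reference; double-check against the docs for your release before changing anything:

```
[libvirt]
# maximum permitted downtime during switchover, in milliseconds
live_migration_downtime = 500
# number of incremental steps used to reach that maximum
live_migration_downtime_steps = 10
# delay between steps, in seconds per GiB of guest RAM
live_migration_downtime_delay = 75
```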
> Has anyone faced the same issue?
> And most importantly, how do you handle the migration of your
> memory-intensive workloads?
>
> Regards,
> Timothé
>
> [1].
> $ openstack server migration show 328af6d9-9c9d-4671-a8f9-2c0d3df32b93 e2cda681-e25c-4aca-813c-3641bc6164c9
> +------------------------+------------------------------------------------------------------+
> | Field                  | Value                                                            |
> +------------------------+------------------------------------------------------------------+
> | ID                     | 13231                                                            |
> | Server UUID            | 328af6d9-9c9d-4671-a8f9-2c0d3df32b93                             |
> | Status                 | running                                                          |
> | Source Compute         | compute02                                                        |
> | Source Node            | compute02                                                        |
> | Dest Compute           | compute01                                                        |
> | Dest Host              | None                                                             |
> | Dest Node              | compute01                                                        |
> | Memory Total Bytes     | 137448202240                                                     |
> | Memory Processed Bytes | 5300502117730                                                    |
> | Memory Remaining Bytes | 52182556672                                                      |
> | Disk Total Bytes       | 0                                                                |
> | Disk Processed Bytes   | 0                                                                |
> | Disk Remaining Bytes   | 0                                                                |
> | Created At             | 2026-03-24T10:49:15.000000                                       |
> | Updated At             | 2026-03-24T13:14:32.000000                                       |
> | UUID                   | e2cda681-e25c-4aca-813c-3641bc6164c9                             |
> | User ID                | cc4367e52cce828fa3e378f29ed6df553c2dd99e9a4b33f1835fee719d592c91 |
> | Project ID             | 0382d25c311149fabd7bea0d6fa3ac37                                 |
> +------------------------+------------------------------------------------------------------+