Hello fellows,

I am doing some research on speeding up live migration for memory-heavy workloads, for example databases. Instances running these applications usually have large amounts of RAM and a high dirty page rate. Compute nodes are equipped with ten-gigabit NICs. We found it is very hard to complete the migration within 30 minutes, and we have to use force-complete in the end. I found some material saying that enabling compression could boost migration speed and would like to try it out.

So how can I enable compression for live migration, and are there other suggestions? Thank you very much for the help.

--
Best Regards,
Jiatong Shen
On 06/07/2025 13:32, Jiatong Shen wrote:
Hello fellows,
I am doing some research on accelerating live migration speed for some memory heavy workload, for example databases.
The general recommendation is to not live migrate VMs under heavy load, but where that is required, post_copy live migration should be used.
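To illustrate why pre-copy migration struggles under heavy load, here is a minimal sketch of a simplified convergence model. It is not how QEMU or nova actually decides when to stop; the function name, the constant dirty rate, and all numbers are assumptions chosen for illustration. Each round resends the memory dirtied while the previous round was on the wire, so the remaining data only shrinks when the link is faster than the dirty rate.

```python
def precopy_rounds(ram_bytes, dirty_bytes_per_s, link_bytes_per_s,
                   max_downtime_s, max_rounds=50):
    """Count pre-copy rounds needed before the final stop-and-copy fits
    into max_downtime_s. Returns None if that never happens within
    max_rounds (the migration does not converge in this model)."""
    remaining = float(ram_bytes)
    for rounds in range(max_rounds + 1):
        # Can the guest be paused and the rest sent within the downtime budget?
        if remaining / link_bytes_per_s <= max_downtime_s:
            return rounds
        # Otherwise send what is left; the guest keeps dirtying pages meanwhile.
        send_time = remaining / link_bytes_per_s
        remaining = min(float(ram_bytes), dirty_bytes_per_s * send_time)
    return None


if __name__ == "__main__":
    GIB = 2 ** 30
    TEN_GBIT = 10e9 / 8  # 10 Gbit/s expressed in bytes per second

    # Hypothetical guests: 256 GiB of RAM, 0.5 s of acceptable downtime.
    print(precopy_rounds(256 * GIB, 0.5e9, TEN_GBIT, 0.5))  # dirty rate below link rate
    print(precopy_rounds(256 * GIB, 1.5e9, TEN_GBIT, 0.5))  # dirty rate above link rate
```

In this model the second guest never converges, no matter how long the migration is allowed to run. Post-copy sidesteps the problem because the guest resumes on the destination and stops dirtying pages that still need to be resent from the source.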
Instances running these applications usually have large amounts of ram and high dirty page rate. Compute nodes are equipped with ten-gigabit NICs. We found it is very hard to complete the migration in 30 mins and we have to use force-complete in the end. I found some material saying enabling compression could boost migration speed and would like to try out.
We have supported compression in the past, if I recall. In general this is unlikely to help, as it actually creates a bottleneck on single-threaded performance when doing the compression. 10G is kind of the cutoff where compression actually makes live migration performance worse.

We do actually use compression for copying the disks for cold migration if you're using the scp driver.
https://github.com/openstack/nova/commit/e4aa424642fdf97ae2153ee5c9db65d6122...

What transport are you using for live migration? tcp? ssh? tls?

libvirt native tunnelled live migration should be avoided at all costs if you care about performance at all. That is why live_migration_tunnelled<https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.live_migration_tunnelled> defaults to false and why we deprecated it.

By default, if you configure nothing, https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.liv... is none and nova will use `qemu+tcp://%s/system`. If you instead use `qemu+ssh://%s/system`, you can use your ssh config to enable compression by default for all ssh connections.

Typically `qemu+tls` provides better performance than qemu+ssh for encrypted migration, so that is preferred over ssh. libvirt has some documentation on the differences here: https://libvirt.org/migration.html

nova does not directly configure compression at the libvirt/qemu level.
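The cutoff argument can be sketched as a two-stage pipeline: a single-threaded compressor feeding a network link. This is a rough model, not a measurement. The function name, the 300 MB/s compressor speed, and the 3x compression ratio are all illustrative assumptions.

```python
def effective_rate(link_bytes_per_s, compressor_bytes_per_s=None, ratio=1.0):
    """Guest-memory bytes moved per second.

    Without compression the link is the only limit. With compression the
    compressor and the link (which now carries `ratio` times less data)
    form a pipeline, and the slower stage sets the pace.
    """
    if compressor_bytes_per_s is None:
        return link_bytes_per_s
    return min(compressor_bytes_per_s, link_bytes_per_s * ratio)


if __name__ == "__main__":
    ONE_GBIT = 1e9 / 8    # bytes per second
    TEN_GBIT = 10e9 / 8   # bytes per second
    COMPRESSOR = 300e6    # assumed single-threaded compressor speed, bytes/s
    RATIO = 3.0           # assumed compression ratio

    for name, link in (("1G", ONE_GBIT), ("10G", TEN_GBIT)):
        plain = effective_rate(link)
        packed = effective_rate(link, COMPRESSOR, RATIO)
        print(f"{name}: plain {plain / 1e6:.0f} MB/s, compressed {packed / 1e6:.0f} MB/s")
```

Under these assumptions compression can only win while the compressor is faster than the raw link. Once the link outruns the compressor, the compressed path is capped at the compressor speed regardless of the ratio achieved.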
So how can I enable compress live migration and are there other suggestions? Thank you very much for the help.
--
Best Regards,
Jiatong Shen
On Mon, Jul 7, 2025 at 8:32 AM Sean Mooney <smooney@redhat.com> wrote:
On 06/07/2025 13:32, Jiatong Shen wrote:
I am doing some research on accelerating live migration speed for some memory heavy workload, for example databases.
The general recommendation is to not live migrate VMs under heavy load, but where that is required, post_copy live migration should be used.
Instances running these applications usually have large amounts of ram and high dirty page rate. Compute nodes are equipped with ten-gigabit NICs. We found it is very hard to complete the migration in 30 mins and we have to use force-complete in the end. I found some material saying enabling compression could boost migration speed and would like to try out.
We have supported compression in the past, if I recall. In general this is unlikely to help, as it actually creates a bottleneck on single-threaded performance when doing the compression.
10G is kind of the cutoff where compression actually makes live migration performance worse.
We do actually use compression for copying the disks for cold migration if you're using the scp driver.
Related to your first point - I chose the rsync driver instead of the scp driver, as scp is too slow with compression enabled and, if I remember correctly, there was no tunable to turn compression off for scp.

Compression is not just a single-threaded bottleneck. Compression with gzip in particular is slow - at least before zlib-ng. At most it can achieve a few hundred MByte/s on most machines, while 10 Gbit/s can transfer 1+ GByte/s. With zstd or lz4 things could be better - but scp didn't support these when I last looked at this problem.

Post-copy is important. Also, increasing the downtime you are willing to accept is important.

There was also a bug in QEMU 8.2 or so, and a bit before, where live migration could run indefinitely and require an abort and retry. In this case, you should see that the memory still to be transferred is not shrinking, but is stuck.

I have thought about applying what I know to try and make it better - but with rsync, it works well enough for me that it is not an itch I have. Also, our minimum speed is now dual 25 Gbit/s links, and all the new systems have dual 100 Gbit/s links, so compression even with zstd or lz4 is unnecessary and would slow it down.

--
Mark Mielke <mark.mielke@gmail.com>
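The gzip throughput figure is easy to check on your own hardware. Below is a minimal sketch using Python's standard zlib module, which implements the same DEFLATE algorithm gzip uses. The helper name and the synthetic payload are assumptions for illustration; real guest memory will compress differently, and results depend on the CPU and the zlib build.

```python
import os
import time
import zlib


def zlib_throughput(payload: bytes, level: int = 6) -> float:
    """Return single-threaded zlib compression speed in input bytes per second."""
    start = time.perf_counter()
    zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    return len(payload) / max(elapsed, 1e-9)  # guard against a zero reading


if __name__ == "__main__":
    # Synthetic payload: 16 MiB built from half random, half zero-filled 8 KiB blocks.
    block = os.urandom(4096) + bytes(4096)
    payload = block * 2048

    rate = zlib_throughput(payload)
    ten_gbit = 10e9 / 8  # 10 Gbit/s expressed in bytes per second
    print(f"zlib level 6: {rate / 1e6:.0f} MB/s")
    print(f"10 Gbit/s   : {ten_gbit / 1e6:.0f} MB/s")
```

If the first number comes out well below the second, compressing the stream on a single thread would slow a 10 Gbit/s transfer rather than speed it up.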