One thing to be aware of is that if the VM writes even a single byte to a memory page during the migration, the entire page needs to be transferred again, not just that one byte. That gets expensive if you use hugepages, as a one-byte write gets amplified to a 2 MB or 1 GB page copy. Even for the default 4k pages it's expensive. Post-copy and auto-converge help with that to a degree. So yes, it sounds like this might be memory related, but it could still be a network bandwidth limitation. Using jumbo frames on the migration network may help, as well as disabling TCP slow start. I'm not sure there is really anything that can be done to improve the initial migration time beyond that.

On Fri, Aug 5, 2022 at 11:26 AM Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
On Fri, 5 Aug 2022 at 12:00, Ignazio Cassano <ignaziocassano@gmail.com> wrote:
When the instance is migrated again from the second node to the first, it takes 10 seconds. If the first node had more load on its network or memory, it should take a long time in any case. Keep in mind I am not using hugepages but the default configuration.
I am convinced that it is about how the memory of an instance is managed after it has run for a long time on a node.
Just keep in mind the transfer rates you get are VERY LOW for anything RAM-like. It's around 20 MiB/s - my old HDD could go faster than that with mediocre fragmentation. ;-) It's more likely it spends time waiting for something instead of doing real work.
-yoctozepto
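
To put rough numbers on the page-granularity point at the top of this thread, here is a minimal Python sketch. It is only back-of-the-envelope arithmetic, not how QEMU or nova actually account for dirty pages. The function names are made up for illustration, the 20 MiB/s figure is the transfer rate quoted above, and the dirty rates in the usage note are hypothetical.

```python
MIB = 1024 ** 2

# Page sizes in bytes: the default 4 KiB page and the two hugepage sizes
# mentioned in the thread.
PAGE_SIZES = {
    "4 KiB": 4 * 1024,
    "2 MiB": 2 * MIB,
    "1 GiB": 1024 * MIB,
}


def resend_seconds(page_size, bandwidth):
    """Seconds needed to re-send one dirtied page at `bandwidth` bytes/s."""
    return page_size / bandwidth


def precopy_can_converge(pages_dirtied_per_s, page_size, bandwidth):
    """Rough check: pre-copy can only catch up if the guest dirties
    memory more slowly than the link can transfer it."""
    return pages_dirtied_per_s * page_size < bandwidth


if __name__ == "__main__":
    bandwidth = 20 * MIB  # the ~20 MiB/s observed in this thread
    for name, size in PAGE_SIZES.items():
        # A 1-byte write dirties the whole page, so the amplification
        # factor is simply the page size in bytes.
        print(f"{name}: 1-byte write re-sends {size} bytes, "
              f"{resend_seconds(size, bandwidth):.4f} s at 20 MiB/s")
```

With 4 KiB pages, for example, `precopy_can_converge(5000, 4096, 20 * MIB)` is true while `precopy_can_converge(6000, 4096, 20 * MIB)` is false, so at that link speed a guest dirtying somewhat over 5000 pages per second would never let plain pre-copy finish without auto-converge or post-copy.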