When the instance is migrated again from the second node to the first, it takes 10 seconds. If the first node had more load on network or memory, it should take a long time in any case. Keep in mind I am not using hugepages, just the default configuration. I am convinced that it is about how the memory of an instance is managed after it runs for a long time on a node.
Ignazio

On Fri, 5 Aug 2022 at 11:45, Gorka Eguileor <geguileo@redhat.com> wrote:
On 05/08, Ignazio Cassano wrote:
Migrating again to a new node (COMPUTE C) takes 10 sec. The first migration from A to B (750 sec) is slow in migrating memory:
[...] migration running for 30 secs, memory 89% remaining; (bytes processed=1258508063, remaining=15356194816, total=17184923648)
2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory 87% remaining; (bytes processed=1489083638, remaining=15035801600, total=17184923648)
[...]08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory 86% remaining; (bytes processed=1689004421, remaining=14802731008, total=17184923648)
and so on
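The progress figures in the log lines above allow a rough estimate of the effective memory transfer rate between samples. A minimal Python sketch, assuming the message format shown in the excerpt; the `parse_progress` and `transfer_rates` helper names are hypothetical and not part of Nova:

```python
import re

# Pattern for the progress message as it appears in the log excerpt above.
PROGRESS_RE = re.compile(
    r"[Mm]igration running for (?P<secs>\d+) secs, "
    r"memory (?P<pct>\d+)% remaining; "
    r"\(bytes processed=(?P<processed>\d+), "
    r"remaining=(?P<remaining>\d+), total=(?P<total>\d+)\)"
)


def parse_progress(text):
    """Return (elapsed_secs, bytes_processed, bytes_remaining) per message."""
    return [
        (int(m["secs"]), int(m["processed"]), int(m["remaining"]))
        for m in PROGRESS_RE.finditer(text)
    ]


def transfer_rates(samples):
    """Average bytes/sec processed between consecutive progress samples."""
    rates = []
    for (t0, p0, _), (t1, p1, _) in zip(samples, samples[1:]):
        if t1 > t0:
            rates.append((p1 - p0) / (t1 - t0))
    return rates


# The three progress messages quoted above, with the log prefixes removed.
log_excerpt = (
    "migration running for 30 secs, memory 89% remaining; "
    "(bytes processed=1258508063, remaining=15356194816, total=17184923648) "
    "Migration running for 60 secs, memory 87% remaining; "
    "(bytes processed=1489083638, remaining=15035801600, total=17184923648) "
    "Migration running for 90 secs, memory 86% remaining; "
    "(bytes processed=1689004421, remaining=14802731008, total=17184923648)"
)

samples = parse_progress(log_excerpt)
for rate in transfer_rates(samples):
    print(f"{rate / 2**20:.1f} MiB/s")
```

On the three samples quoted above this works out to about 7.3 MiB/s and then 6.4 MiB/s of processed bytes between samples.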
That sounds crazy to me. Unless the first node has more load or more network usage than the others, or the VM isn't actually running on Compute B so the migration is not really of a running VM...
On Fri, 5 Aug 2022 at 11:18, Ignazio Cassano <ignaziocassano@gmail.com> wrote:
Hi, this is the volume attached on NetApp NFS for the VM I am migrating:

qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
image: volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
file format: raw
virtual size: 40G (42949672960 bytes)
disk size: 21G

As you can see it is raw and it does not have a base image.
Ignazio

On Fri, 5 Aug 2022 at 10:49, Gorka Eguileor <geguileo@redhat.com> wrote:
On 05/08, Ignazio Cassano wrote:
Hello, first let me thank you for your reply, and sorry if I come back to ask why the first migration from A to B takes 20 minutes and then, when I migrate from B to A, it takes a few seconds. I wonder if memory is reorganized after the first migration. Does the first live migration lose time fetching memory pages?
Ignazio
Hi,
I work on Cinder, so my knowledge on live migrations is mostly limited to the attach/detach flow of the volumes.
I thought that if you were using ephemeral Nova volumes (non-Cinder), maybe the volume had not yet been deleted from the old node, or maybe it was using a qcow2 base file shared by multiple instances on the source (each using a different chain on top of it) and this qcow2 was not originally present on the destination (hence the time to copy it). Then, when we migrate back, since there are other instances that were also using it on the destination (the original location), only the difference needs to be copied.
But these are just brainstorming ideas, since I don't really know how Nova handles all this.
I would recommend setting the Nova log to debug mode on both the source and destination nodes and looking at where the time difference really is, in case it's not where you think it is.
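One way to see where the time goes, once the logs are collected, is to look for the largest gaps between consecutive log entries. A minimal Python sketch, assuming each entry starts with a timestamp in the `2022-08-05 10:47:23.910` form seen in the Nova log excerpt in this thread; the `largest_gaps` helper and the sample lines are hypothetical:

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2022-08-05 10:47:23.910")


def largest_gaps(lines, top=3):
    """Return the `top` largest (gap_seconds, line) pairs.

    The gap is the time elapsed since the previous timestamped line.
    """
    gaps = []
    previous = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if previous is not None:
            gaps.append(((stamp - previous).total_seconds(), line.rstrip()))
        previous = stamp
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "2022-08-05 10:46:53.100 55600 DEBUG example step one",
    "2022-08-05 10:46:53.900 55600 DEBUG example step two",
    "2022-08-05 10:47:23.910 55600 DEBUG example step three",
]

for gap, line in largest_gaps(sample, top=1):
    print(f"{gap:.1f}s before: {line}")
```

Running the same helper over the source and destination logs separately would show which side spends the long stretches, under the assumption that both hosts have synchronized clocks.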
Cheers, Gorka.
On Fri, 5 Aug 2022 at 10:17, Gorka Eguileor <geguileo@redhat.com> wrote:
On 04/08, Ignazio Cassano wrote:
> Hi,
> I am using cinder volumes.
> Ignazio
Hi,
In that case there is no volume data being copied for the instance migration, and volume attach on the destination should not account for more than 30 seconds of those 20 minutes, so not much improvement possible there.
Cheers, Gorka.
> On Thu, 4 Aug 2022 at 16:56, Gorka Eguileor <geguileo@redhat.com> wrote:
>
> > On 03/08, Ignazio Cassano wrote:
> > > Hello All,
> > > I am looking for a solution to speed up live migration.
> > > For instances where RAM is used heavily, like Java application servers,
> > > live migration takes a long time (more than 20 minutes for an 8GB RAM
> > > instance), and converge mode is already set to True in nova.conf.
> >
> > Hi,
> >
> > Probably doesn't affect your case, but I assume you are using ephemeral
> > nova boot volumes.
> >
> > Have you tried using only Cinder volumes on the VM?
> >
> > Cheers,
> > Gorka.
> >
> > > I also tried with post_copy but it does not change.
> > > After the first live migration (very slow), if I try to migrate again it
> > > is very fast.
> > > I presume the first migration is slow because of memory fragmentation
> > > when an instance is running on the same compute node for a long time.
> > > I am looking for a solution considering that on my compute nodes I can
> > > have a little RAM overcommit. In any case I am increasing the number of
> > > compute nodes to reduce it.
> > > Thanks
> > > Ignazio