Re: [openstack] how to speed up live migration?

5 Aug 2022


When the instance is migrated again from the second node to the first, it
takes 10 seconds.
If the first node had more load on network or memory, it should take a long
time in any case.
Keep in mind I am not using hugepages but the default configuration.

I am convinced that it is about how the memory of an instance is managed
after it runs for a long time on a node.
Ignazio

On Fri, 5 Aug 2022 at 11:45, Gorka Eguileor <geguileo@redhat.com> wrote:
...
On 05/08, Ignazio Cassano wrote:
...
Migrating again to a new node (COMPUTE C) takes 10 sec.
The first migration from A to B (750 sec) is slow in migrating memory:
migration running for 30 secs, memory 89% remaining; (bytes processed=1258508063, remaining=15356194816, total=17184923648)
2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory 87% remaining; (bytes processed=1489083638, remaining=15035801600, total=17184923648)
08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance: d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory 86% remaining; (bytes processed=1689004421, remaining=14802731008, total=17184923648)
and so on
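The progress lines above carry enough numbers to estimate how fast memory is actually moving between samples. A rough Python sketch, assuming the message format is exactly as quoted in this thread; the regex and the helper names `parse_progress`, `transfer_rates` and `naive_eta_secs` are illustrative and not part of Nova:

```python
import re

# Matches the progress message as quoted in this thread (an assumption about
# the format; other Nova versions may word it differently).
PROGRESS_RE = re.compile(
    r"[Mm]igration running for (?P<secs>\d+) secs,\s+memory\s+(?P<pct>\d+)% remaining;\s+"
    r"\(bytes\s+processed=(?P<processed>\d+),\s+remaining=(?P<remaining>\d+),\s+"
    r"total=(?P<total>\d+)\)"
)

# The three samples quoted above, with the log prefixes stripped.
LOG_EXCERPT = """
migration running for 30 secs, memory 89% remaining; (bytes processed=1258508063, remaining=15356194816, total=17184923648)
Migration running for 60 secs, memory 87% remaining; (bytes processed=1489083638, remaining=15035801600, total=17184923648)
Migration running for 90 secs, memory 86% remaining; (bytes processed=1689004421, remaining=14802731008, total=17184923648)
"""


def parse_progress(text):
    """Return a list of (elapsed_secs, bytes_processed, bytes_remaining)."""
    return [
        (int(m["secs"]), int(m["processed"]), int(m["remaining"]))
        for m in PROGRESS_RE.finditer(text)
    ]


def transfer_rates(samples):
    """Bytes per second processed between each pair of consecutive samples."""
    return [
        (p2 - p1) / (t2 - t1)
        for (t1, p1, _), (t2, p2, _) in zip(samples, samples[1:])
    ]


def naive_eta_secs(samples):
    """Seconds left if the last observed rate held and no page was re-dirtied."""
    last_rate = transfer_rates(samples)[-1]
    return samples[-1][2] / last_rate


if __name__ == "__main__":
    samples = parse_progress(LOG_EXCERPT)
    for rate in transfer_rates(samples):
        print(f"{rate / 2**20:.1f} MiB/s")
    print(f"naive ETA: {naive_eta_secs(samples):.0f} s")
```

The estimate is deliberately naive: it ignores pages dirtied again by the guest during the copy, so it only gives a feel for whether the link or the guest's memory activity is the limiting factor.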
That sounds crazy to me.  Unless the first node has more load or more
network usage than the others, or the VM isn't actually running on
Compute B so the migration is not really of a running VM...
...
On Fri, 5 Aug 2022 at 11:18, Ignazio Cassano <ignaziocassano@gmail.com> wrote:
...
Hi, this is the volume attached on netapp nfs about the vm I am
...
...
qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
image: volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
file format: raw
virtual size: 40G (42949672960 bytes)
disk size: 21G
As you can see, it is raw and it does not have a base image.
Ignazio
On Fri, 5 Aug 2022 at 10:49, Gorka Eguileor <geguileo@redhat.com> wrote:
...
On 05/08, Ignazio Cassano wrote:
...
Hello, first let me thank you for your reply, and sorry if I come back to
ask why the first migration from A to B takes 20 minutes, while migrating
from B to A afterwards takes a few seconds.
I wonder if memory is reorganized after the first migration.
Does the first live migration lose time fetching memory pages?
Ignazio
Hi,
I work on Cinder, so my knowledge on live migrations is mostly limited
to the attach/detach flow of the volumes.
I thought that if you were using ephemeral nova volumes (non-cinder),
maybe the volume had not yet been deleted from the old node, or maybe it
was using a qcow2 base file for multiple instances on the source (each
using a different chain on top of it) and this qcow2 was not originally
present in the destination (hence the time to copy it), so when we do a
migration back, since there are other instances that were also using it
on the destination (original location), only the difference needs to be
copied.
But these are just brainstorming ideas, since I don't really know how
Nova handles all this.
I would recommend setting the Nova log to debug mode on both the source and
destination nodes and looking at where the time difference really is, in
case it's not where you think it is.
Cheers,
Gorka.
...
On Fri, 5 Aug 2022 at 10:17, Gorka Eguileor <geguileo@redhat.com> wrote:
...
On 04/08, Ignazio Cassano wrote:
> Hi,
> I am using cinder volumes.
> Ignazio
>
Hi,
In that case there is no volume data being copied for the instance
migration, and volume attach on the destination should not account for
more than 30 seconds of those 20 minutes, so not much improvement
possible there.
Cheers,
Gorka.
> On Thu, 4 Aug 2022 at 16:56, Gorka Eguileor <geguileo@redhat.com> wrote:
>
> > On 03/08, Ignazio Cassano wrote:
> > > Hello All,
> > > I am looking for a solution to speed up live migration.
> > > For instances where RAM is used heavily, like Java application servers,
> > > live migration takes a long time (more than 20 minutes for an 8GB RAM
> > > instance), and converge mode is already set to True in nova.conf.
> >
> > Hi,
> >
> > Probably doesn't affect your case, but I assume you are using ephemeral
> > nova boot volumes.
> >
> > Have you tried using only Cinder volumes on the VM?
> >
> > Cheers,
> > Gorka.
> >
> >
> > > I also tried with post_copy but nothing changes.
> > > After the first live migration (very slow), if I try to migrate again it
> > > is very fast.
> > > I presume the first migration is slow because of memory fragmentation
> > > when an instance has been running on the same compute node for a long
> > > time.
> > > I am looking for a solution, considering that on my compute nodes I can
> > > have a little RAM overcommit. In any case, I am increasing the number of
> > > compute nodes to reduce it.
> > > Thanks
> > > Ignazio
> >
> >