[openstack] how to speed up live migration?

Ignazio Cassano ignaziocassano at gmail.com
Fri Aug 5 09:53:18 UTC 2022


When the instance is migrated again, from the second node back to the first, it
takes 10 seconds.
If the first node had more network or memory load, the migration should take a
long time in either direction.
Keep in mind I am not using hugepages, just the default configuration.

I am convinced that it is about how the memory of an instance is managed
after it has been running for a long time on a node.
Ignazio
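
To put numbers on "slow in migrating memory", here is a minimal Python sketch
that turns the three progress samples from the nova-compute log quoted in this
thread into transfer rates. The sample values are copied from that log; the
function names are made up for illustration, and the time-left figure is only a
naive linear extrapolation from the last interval, assuming the rate stays
constant (which a live migration does not guarantee).

```python
# Progress samples copied from the nova-compute log quoted in this thread:
# (elapsed seconds, bytes processed, bytes remaining)
SAMPLES = [
    (30, 1258508063, 15356194816),
    (60, 1489083638, 15035801600),
    (90, 1689004421, 14802731008),
]

MIB = 1024 * 1024


def rates(samples):
    """Per interval: (end time, processed MiB/s, drop in remaining MiB/s)."""
    out = []
    for (t0, p0, r0), (t1, p1, r1) in zip(samples, samples[1:]):
        dt = t1 - t0
        out.append((t1, (p1 - p0) / dt / MIB, (r0 - r1) / dt / MIB))
    return out


def eta_seconds(samples):
    """Naive time left, assuming 'remaining' keeps shrinking at the last rate."""
    (t0, _, r0), (t1, _, r1) = samples[-2], samples[-1]
    shrink_per_sec = (r0 - r1) / (t1 - t0)
    return r1 / shrink_per_sec


if __name__ == "__main__":
    for t, processed, shrink in rates(SAMPLES):
        print(f"up to {t:3d}s: processed {processed:5.2f} MiB/s, "
              f"remaining shrinks {shrink:5.2f} MiB/s")
    print(f"naive time left: {eta_seconds(SAMPLES):.0f} s")
```

Both intervals come out at well under 11 MiB/s, so comparing that figure with
what the migration network can actually carry is a cheap first check before
looking at memory layout.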

On Fri, 5 Aug 2022 at 11:45, Gorka Eguileor <geguileo at redhat.com>
wrote:

> On 05/08, Ignazio Cassano wrote:
> > Migrating again to a new node (COMPUTE C) takes 10 sec.
> > The first migration from A to B (750 sec) is slow in migrating memory:
> >
> >
> > migration running for 30 secs, memory 89% remaining; (bytes
> > processed=1258508063, remaining=15356194816, total=17184923648)
> > 2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver
> > [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca
> > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance:
> > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory
> > 87% remaining; (bytes processed=1489083638, remaining=15035801600,
> > total=17184923648)
> > 08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca
> > 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance:
> > d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory
> > 86% remaining; (bytes processed=1689004421, remaining=14802731008,
> > total=17184923648)
> >
> > and so on
>
> That sounds crazy to me.  Unless the first node has more load or more
> network usage than the others, or the VM isn't actually running on
> Compute B so the migration is not really of a running VM...
>
>
>
> >
> > On Fri, 5 Aug 2022 at 11:18, Ignazio Cassano <
> > ignaziocassano at gmail.com> wrote:
> >
> > > Hi, this is the volume, attached over NetApp NFS, of the VM I am migrating:
> > > qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
> > > image: volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
> > > file format: raw
> > > virtual size: 40G (42949672960 bytes)
> > > disk size: 21G
> > >
> > > As you can see it is raw and it does not have a base image.
> > > Ignazio
> > >
> > >
> > >
> > > On Fri, 5 Aug 2022 at 10:49, Gorka Eguileor <
> > > geguileo at redhat.com> wrote:
> > >
> > >> On 05/08, Ignazio Cassano wrote:
> > >> > Hello, first let me thank you for your reply, and sorry if I come back
> > >> > to ask why the first migration from A to B takes 20 minutes and
> > >> > then, when I migrate from B to A, it takes a few seconds.
> > >> > I wonder whether memory is reorganized after the first migration.
> > >> > Does the first live migration lose time fetching memory pages?
> > >> > Ignazio
> > >> >
> > >>
> > >> Hi,
> > >>
> > >> I work on Cinder, so my knowledge on live migrations is mostly limited
> > >> to the attach/detach flow of the volumes.
> > >>
> > >> I thought that maybe, if you were using ephemeral nova volumes
> > >> (non-cinder), the volume had not yet been deleted from the old
> > >> node, or maybe it was using a qcow2 base file shared by multiple
> > >> instances on the source (each using a different chain on top of it)
> > >> and this qcow2 was not originally present in the destination (hence
> > >> the time to copy it), so when we migrate back, since there are other
> > >> instances that were also using it on the destination (original
> > >> location), only the difference needs to be copied.
> > >>
> > >> But these are just brainstorming ideas, since I don't really know how
> > >> Nova handles all this.
> > >>
> > >> I would recommend setting Nova log to debug mode in both source and
> > >> destination nodes and look at where the time difference really is, in
> > >> case it's not where you think it is.
> > >>
> > >> Cheers,
> > >> Gorka.
> > >>
> > >>
> > >> > On Fri, 5 Aug 2022 at 10:17, Gorka Eguileor <geguileo at redhat.com>
> > >> > wrote:
> > >> >
> > >> > > On 04/08, Ignazio Cassano wrote:
> > >> > > > HI,
> > >> > > > I am using cinder volumes.
> > >> > > > Ignazio
> > >> > > >
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > In that case there is no volume data being copied for the instance
> > >> > > migration, and volume attach on the destination should not account
> > >> > > for more than 30 seconds of those 20 minutes, so not much
> > >> > > improvement possible there.
> > >> > >
> > >> > > Cheers,
> > >> > > Gorka.
> > >> > >
> > >> > > > On Thu, 4 Aug 2022 at 16:56, Gorka Eguileor <geguileo at redhat.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > On 03/08, Ignazio Cassano wrote:
> > >> > > > > > Hello All,
> > >> > > > > > I am looking for a solution to speed up live migration.
> > >> > > > > > For instances where RAM is used heavily, like Java application
> > >> > > > > > servers, live migration takes a long time (more than 20 minutes
> > >> > > > > > for an 8GB RAM instance), and converge mode is already set to
> > >> > > > > > True in nova.conf.
> > >> > > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > Probably doesn't affect your case, but I assume you are using
> > >> > > > > ephemeral nova boot volumes.
> > >> > > > >
> > >> > > > > Have you tried using only Cinder volumes on the VM?
> > >> > > > >
> > >> > > > > Cheers,
> > >> > > > > Gorka.
> > >> > > > >
> > >> > > > >
> > >> > > > > > I also tried with post_copy but it does not change anything.
> > >> > > > > > After the first live migration (very slow), if I try to migrate
> > >> > > > > > again it is very fast.
> > >> > > > > > I presume the first migration is slow because of memory
> > >> > > > > > fragmentation when an instance has been running on the same
> > >> > > > > > compute node for a long time.
> > >> > > > > > I am looking for a solution, considering that on my compute
> > >> > > > > > nodes I can have a little RAM overcommit. In any case I am
> > >> > > > > > increasing the number of compute nodes to reduce it.
> > >> > > > > > Thanks
> > >> > > > > > Ignazio
> > >> > > > >
> > >> > > > >
> > >> > >
> > >> > >
> > >>
> > >>
>
>
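
The thread mentions converge mode, post_copy and switching the Nova log to
debug, all of which are set in nova.conf. Below is a small Python sketch for
checking what a compute node actually has configured, so source and destination
can be compared. It assumes the usual /etc/nova/nova.conf path and the [libvirt]
option names as I understand them from the Nova configuration reference; verify
both against your release. The function name is made up for illustration.

```python
import configparser

# Assumed location of the compute node's config file; adjust as needed.
NOVA_CONF = "/etc/nova/nova.conf"

# [libvirt] option names as I understand them; check them against the
# configuration reference for your Nova release before relying on this.
LIBVIRT_OPTIONS = (
    "live_migration_permit_auto_converge",
    "live_migration_permit_post_copy",
    "live_migration_bandwidth",
    "live_migration_downtime",
    "live_migration_completion_timeout",
    "live_migration_timeout_action",
)


def live_migration_settings(conf_text):
    """Return {option: value or None if unset}, including [DEFAULT] debug."""
    # strict=False tolerates repeated options; no interpolation so '%' is literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    found = {"debug": parser.defaults().get("debug")}
    for name in LIBVIRT_OPTIONS:
        found[name] = parser.get("libvirt", name, fallback=None)
    return found


if __name__ == "__main__":
    try:
        with open(NOVA_CONF) as fh:
            text = fh.read()
    except OSError as exc:
        print(f"cannot read {NOVA_CONF}: {exc}")
    else:
        for name, value in live_migration_settings(text).items():
            print(f"{name} = {value if value is not None else '(unset, default applies)'}")
```

Running it on both the source and the destination node and comparing the output
shows whether the two sides really use the same settings; an option reported as
unset falls back to Nova's built-in default.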