[openstack] how to speed up live migration?

Gorka Eguileor geguileo at redhat.com
Fri Aug 5 09:45:15 UTC 2022


On 05/08, Ignazio Cassano wrote:
> Migrating again to a new node (COMPUTE C) takes 10 sec.
> The first migration, from A to B (750 sec), is slow at migrating memory:
>
>
> migration running for 30 secs, memory 89% remaining; (bytes
> processed=1258508063, remaining=15356194816, total=17184923648)
>
> 2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver
> [req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca
> 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance:
> d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory
> 87% remaining; (bytes processed=1489083638, remaining=15035801600,
> total=17184923648)
>
> 08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca
> 85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance:
> d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory
> 86% remaining; (bytes processed=1689004421, remaining=14802731008,
> total=17184923648)
>
> and so on

That sounds crazy to me.  Unless the first node has more load or more
network usage than the others, or the VM isn't actually running on
Compute B so the migration is not really of a running VM...
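
For what it's worth, the transfer rate implied by the progress lines quoted
above can be worked out with a short script. This is only a sketch: the three
log lines are copied from the quoted message, while the regular expression and
the helper names (parse_samples, rates) are my own and not part of Nova. I am
also assuming that "bytes processed" reflects the data actually sent to the
destination.

```python
import re

# Progress lines copied from the quoted Nova log (prefixes trimmed).
LOG = """\
migration running for 30 secs, memory 89% remaining; (bytes processed=1258508063, remaining=15356194816, total=17184923648)
Migration running for 60 secs, memory 87% remaining; (bytes processed=1489083638, remaining=15035801600, total=17184923648)
Migration running for 90 secs, memory 86% remaining; (bytes processed=1689004421, remaining=14802731008, total=17184923648)
"""

# Hypothetical pattern written for this example; it only matches the
# message format shown in the lines above.
PATTERN = re.compile(
    r"running for (\d+) secs.*?processed=(\d+), remaining=(\d+), total=(\d+)",
    re.IGNORECASE,
)


def parse_samples(text):
    """Return (seconds, processed, remaining) tuples found in the log text."""
    return [(int(s), int(p), int(r)) for s, p, r, _ in PATTERN.findall(text)]


def rates(samples):
    """Return bytes/second processed between each pair of consecutive samples."""
    return [
        (p2 - p1) / (t2 - t1)
        for (t1, p1, _), (t2, p2, _) in zip(samples, samples[1:])
    ]


samples = parse_samples(LOG)
for rate in rates(samples):
    print(f"{rate / 2**20:.1f} MiB/s")
```

This prints roughly 7.3 MiB/s and 6.4 MiB/s for the two 30-second intervals,
which is far below what even a 1 Gbit/s link can carry (about 119 MiB/s), so
the network path or the migration bandwidth settings between A and B may be
worth checking.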



>
> On Fri, 5 Aug 2022 at 11:18, Ignazio Cassano <
> ignaziocassano at gmail.com> wrote:
>
> > Hi, this is the volume on NetApp NFS attached to the VM I am migrating:
> > qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
> > image: volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
> > file format: raw
> > virtual size: 40G (42949672960 bytes)
> > disk size: 21G
> >
> > As you can see, it is raw and it does not have a base image.
> > Ignazio
> >
> >
> >
> > On Fri, 5 Aug 2022 at 10:49, Gorka Eguileor <
> > geguileo at redhat.com> wrote:
> >
> >> On 05/08, Ignazio Cassano wrote:
> >> > Hello, first let me thank you for your reply. Sorry to come back to this,
> >> > but why does the first migration from A to B take 20 minutes, while
> >> > migrating back from B to A takes only a few seconds?
> >> > I wonder whether memory is reorganized after the first migration.
> >> > Does the first live migration lose time fetching memory pages?
> >> > Ignazio
> >> >
> >>
> >> Hi,
> >>
> >> I work on Cinder, so my knowledge on live migrations is mostly limited
> >> to the attach/detach flow of the volumes.
> >>
> >> I thought that if you were using ephemeral Nova volumes (non-Cinder),
> >> maybe the volume had not yet been deleted from the old node. Or maybe
> >> the source was using a qcow2 base file shared by multiple instances
> >> (each with a different chain on top of it), and this qcow2 was not
> >> originally present on the destination, hence the time to copy it. Then,
> >> when we migrate back, other instances at the original location are
> >> still using it, so only the difference needs to be copied.
> >>
> >> But these are just brainstorming ideas, since I don't really know how
> >> Nova handles all this.
> >>
> >> I would recommend setting the Nova log to debug mode on both the source
> >> and destination nodes and looking at where the time difference really
> >> is, in case it's not where you think it is.
> >>
> >> Cheers,
> >> Gorka.
> >>
> >>
> >> > On Fri, 5 Aug 2022 at 10:17, Gorka Eguileor <geguileo at redhat.com>
> >> > wrote:
> >> >
> >> > > On 04/08, Ignazio Cassano wrote:
> >> > > > Hi,
> >> > > > I am using cinder volumes.
> >> > > > Ignazio
> >> > > >
> >> > >
> >> > > Hi,
> >> > >
> >> > > In that case there is no volume data being copied for the instance
> >> > > migration, and volume attach on the destination should not account for
> >> > > more than 30 seconds of those 20 minutes, so not much improvement
> >> > > possible there.
> >> > >
> >> > > Cheers,
> >> > > Gorka.
> >> > >
> >> > > > On Thu, 4 Aug 2022 at 16:56, Gorka Eguileor <geguileo at redhat.com>
> >> > > > wrote:
> >> > > >
> >> > > > > On 03/08, Ignazio Cassano wrote:
> >> > > > > > Hello all,
> >> > > > > > I am looking for a solution to speed up live migration.
> >> > > > > > For instances that use RAM heavily, like Java application servers,
> >> > > > > > live migration takes a long time (more than 20 minutes for an 8GB
> >> > > > > > RAM instance), and converge mode is already set to True in nova.conf.
> >> > > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > Probably doesn't affect your case, but I assume you are using
> >> > > > > ephemeral Nova boot volumes.
> >> > > > >
> >> > > > > Have you tried using only Cinder volumes on the VM?
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Gorka.
> >> > > > >
> >> > > > >
> >> > > > > > I also tried post_copy, but it makes no difference.
> >> > > > > > After the first live migration (very slow), if I migrate again it
> >> > > > > > is very fast.
> >> > > > > > I presume the first migration is slow because of memory
> >> > > > > > fragmentation when an instance has been running on the same compute
> >> > > > > > node for a long time.
> >> > > > > > I am looking for a solution, considering that my compute nodes
> >> > > > > > can have a little RAM overcommit. In any case, I am increasing the
> >> > > > > > number of compute nodes to reduce it.
> >> > > > > > Thanks
> >> > > > > > Ignazio
> >> > > > >
> >> > > > >
> >> > >
> >> > >
> >>
> >>



