[openstack] how to speed up live migration?

Ignazio Cassano ignaziocassano at gmail.com
Fri Aug 5 09:27:12 UTC 2022


Migrating again to a new node (COMPUTE C) takes 10 seconds.
The first migration, from A to B (750 seconds), is slow in the memory-transfer phase:


migration running for 30 secs, memory 89% remaining; (bytes
processed=1258508063, remaining=15356194816, total=17184923648)

2022-08-05 10:47:23.910 55600 INFO nova.virt.libvirt.driver
[req-ff02667e-9d38-4a08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca
85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance:
d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 60 secs, memory
87% remaining; (bytes processed=1489083638, remaining=15035801600,
total=17184923648)

08-9c63-013ed1064218 66adb965bef64eaaab2af93ade87e2ca
85cace94dcc7484c85ff9337eb1d0c4c - default default] [instance:
d1aae4bb-9a2b-454f-9018-568af6a98cc3] Migration running for 90 secs, memory
86% remaining; (bytes processed=1689004421, remaining=14802731008,
total=17184923648)

and so on
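
To put a number on how slowly memory is moving, the progress messages can be turned into a transfer rate. The following is only a minimal sketch in Python: the helper names (parse_progress, transfer_rates) are made up for illustration, the log prefixes are omitted, and the calculation ignores pages that the guest dirties again while the migration runs.

```python
import re

# Progress messages copied from the log excerpt above (log prefixes omitted).
LOG = """
migration running for 30 secs, memory 89% remaining; (bytes
processed=1258508063, remaining=15356194816, total=17184923648)
Migration running for 60 secs, memory
87% remaining; (bytes processed=1489083638, remaining=15035801600,
total=17184923648)
Migration running for 90 secs, memory
86% remaining; (bytes processed=1689004421, remaining=14802731008,
total=17184923648)
"""

# Captures: elapsed seconds, bytes processed, bytes remaining, total bytes.
PROGRESS = re.compile(
    r"[Mm]igration running for (\d+) secs, memory \d+% remaining; "
    r"\(bytes processed=(\d+), remaining=(\d+), total=(\d+)\)"
)


def parse_progress(text):
    """Return one (elapsed, processed, remaining, total) tuple per message."""
    flat = " ".join(text.split())  # undo the mail client's line wrapping
    return [tuple(int(g) for g in m.groups()) for m in PROGRESS.finditer(flat)]


def transfer_rates(samples):
    """Return bytes processed per second between consecutive samples."""
    return [
        (p1 - p0) / (t1 - t0)
        for (t0, p0, _, _), (t1, p1, _, _) in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    for rate in transfer_rates(parse_progress(LOG)):
        print(f"{rate / 2**20:.1f} MiB/s")
```

On the three samples above this works out to roughly 7.3 and 6.4 MiB/s, which would explain why moving about 16 GiB of guest memory takes so long.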

On Fri, 5 Aug 2022 at 11:18, Ignazio Cassano <
ignaziocassano at gmail.com> wrote:

> Hi, this is the volume on NetApp NFS attached to the VM I am migrating:
>
> qemu-img info volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
> image: volume-002ff8af-9067-4f84-a01c-d147cdd1f70d
> file format: raw
> virtual size: 40G (42949672960 bytes)
> disk size: 21G
>
> As you can see, it is raw and does not have a base image.
> Ignazio
>
>
>
> On Fri, 5 Aug 2022 at 10:49, Gorka Eguileor <
> geguileo at redhat.com> wrote:
>
>> On 05/08, Ignazio Cassano wrote:
>> > Hello, first let me thank you for your reply, and sorry to come back to
>> > this: why does the first migration from A to B take 20 minutes, while
>> > migrating back from B to A takes only a few seconds?
>> > I wonder whether memory is reorganized after the first migration.
>> > Does the first live migration lose time fetching memory pages?
>> > Ignazio
>> >
>>
>> Hi,
>>
>> I work on Cinder, so my knowledge on live migrations is mostly limited
>> to the attach/detach flow of the volumes.
>>
>> I thought that if you were using ephemeral Nova volumes (non-Cinder),
>> perhaps the volume had not yet been deleted from the old node. Or
>> perhaps it was using a qcow2 base file shared by multiple instances on
>> the source (each with a different chain on top of it), and this qcow2
>> file was not originally present on the destination (hence the time to
>> copy it). Then, when migrating back, other instances on the destination
>> (the original location) would already be using it, so only the
>> difference would need to be copied.
>>
>> But these are just brainstorming ideas, since I don't really know how
>> Nova handles all this.
>>
>> I would recommend setting the Nova log to debug mode on both the source
>> and destination nodes and looking at where the time difference really
>> is, in case it's not where you think it is.
>>
>> Cheers,
>> Gorka.
>>
>>
>> > On Fri, 5 Aug 2022 at 10:17, Gorka Eguileor <geguileo at redhat.com>
>> > wrote:
>> >
>> > > On 04/08, Ignazio Cassano wrote:
>> > > > Hi,
>> > > > I am using cinder volumes.
>> > > > Ignazio
>> > > >
>> > >
>> > > Hi,
>> > >
>> > > In that case there is no volume data being copied for the instance
>> > > migration, and volume attach on the destination should not account for
>> > > more than 30 seconds of those 20 minutes, so not much improvement
>> > > possible there.
>> > >
>> > > Cheers,
>> > > Gorka.
>> > >
>> > > > On Thu, 4 Aug 2022 at 16:56, Gorka Eguileor <geguileo at redhat.com>
>> > > > wrote:
>> > > >
>> > > > > On 03/08, Ignazio Cassano wrote:
>> > > > > > Hello All,
>> > > > > > I am looking for a solution to speed up live migration.
>> > > > > > For instances where RAM is used heavily, such as Java application
>> > > > > > servers, live migration takes a long time (more than 20 minutes
>> > > > > > for an 8GB RAM instance), and converge mode is already set to
>> > > > > > True in nova.conf.
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > Probably doesn't affect your case, but I assume you are using
>> > > > > ephemeral nova boot volumes.
>> > > > >
>> > > > > Have you tried using only Cinder volumes on the VM?
>> > > > >
>> > > > > Cheers,
>> > > > > Gorka.
>> > > > >
>> > > > >
>> > > > > > I also tried with post_copy, but it does not change anything.
>> > > > > > After the first live migration (very slow), if I try to migrate
>> > > > > > again it is very fast.
>> > > > > > I presume the first migration is slow because of memory
>> > > > > > fragmentation when an instance has been running on the same
>> > > > > > compute node for a long time.
>> > > > > > I am looking for a solution, considering that on my compute
>> > > > > > nodes I can have a little RAM overcommit. In any case, I am
>> > > > > > increasing the number of compute nodes to reduce it.
>> > > > > > Thanks
>> > > > > > Ignazio
>> > > > >
>> > > > >
>> > >
>> > >
>>
>>