[openstack-dev] [nova] Enabling VM post-copy live migration
John Garbutt
john at johngarbutt.com
Thu Mar 12 15:41:34 UTC 2015
On 12 March 2015 at 12:26, Luis Tomas <luis at cs.umu.se> wrote:
> On 03/12/2015 12:34 PM, John Garbutt wrote:
>>
>> On 12 March 2015 at 08:41, Luis Tomas <luis at cs.umu.se> wrote:
>>>
>>> Hi,
>>>
>>> As part of a European (FP7) project named ORBIT
>>> (http://www.orbitproject.eu/), I'm working on adding the possibility of
>>> live-migrating VMs in OpenStack in post-copy mode.
>>> This way of live-migrating VMs moves the computation to the destination
>>> right away, and the VM starts running from there while its memory is
>>> still being copied from the source to the new location. That way each
>>> memory page is copied only once: if the VM modifies a page, it does so
>>> at the destination host, so the page never needs to be re-sent. This
>>> ensures that migrations finish regardless of what the VM is doing, even
>>> for extremely memory-intensive VMs, and removes the problem of VMs
>>> hanging in the migrating state forever (as discussed in previous mails,
>>> e.g.,
>>> http://lists.openstack.org/pipermail/openstack-dev/2015-February/055725.html).
>>>
>>> So far, I have implemented and tested this new functionality on the Juno
>>> release, and the code modifications can be found in the project's GitHub
>>> repositories (branch named "post-copy"):
>>> - https://github.com/orbitfp7/nova/tree/post-copy --> mainly enables
>>>   the use of the libvirt post-copy flag (libvirt driver.py). Note that
>>>   post-copy migration does not use "tunneling", as the libvirt patch for
>>>   that is not yet ready.
>>> - https://github.com/orbitfp7/python-novaclient/tree/post-copy --> adds
>>>   the option of using post-copy mode when triggering the migration:
>>>   nova live-migration [--block-migrate] [--post-copy] VM_ID
>>> - https://github.com/orbitfp7/horizon/tree/post-copy --> adds a checkbox
>>>   to the live-migration panel to perform the migration in post-copy mode
>>>   (like the one for enabling block migration).
>>>
>>> To be able to live-migrate VMs in post-copy mode, I'm relying on some
>>> kernel+qemu+libvirt modifications that are not yet merged upstream (but
>>> are on their way there), also available on the project's GitHub:
>>> - Kernel: https://lkml.org/lkml/2015/3/5/576
>>> - Qemu: https://github.com/orbitfp7/qemu/tree/wp3-postcopy
>>> - LibVirt: https://github.com/orbitfp7/libvirt/tree/wp3-postcopy
>>
>> Before merging the code in Nova, we usually like the dependent
>> features to be released by the respective projects.
>>
>> Ideally we would like it to be easy to run that on some distro so
>> people could test/use the feature fairly easily.
>
> Yes, that's why I proposed to target the release after Kilo (or even the
> one after that, if need be).
Ah, cool. I just wanted to be explicit about that.
>>> If this is a nice feature to have in future versions of OpenStack, I'm
>>> happy to adapt the code for the next release (the one after Kilo). Any
>>> comments are really welcome.
>>
>> It sounds like something that doesn't need an API call, as it's a
>> deployer choice whether they support this new live-migrate mode.
>> Is that true?
>>
>> Although maybe it has a substantial runtime penalty as a page read
>> miss causes a fetch across the network, making it a user choice? Or do
>> you only start the fetch mode at the point you detect a failure to
>> "merge" using the regular live-migrate mode?
>
>
> I think it should be up to the user/admin which option to choose.
> Although post-copy ensures that the migration will finish, as you said, it
> could have some impact on VM performance due to having to wait until a
> missing memory page is fetched. Anyway, I wouldn't say there is a
> substantial runtime penalty. In fact, the libvirt flag that we have included
> in OpenStack basically tries pre-copy first (normal live-migration), and
> after trying to copy all the memory once (first iteration), automatically
> switches to post-copy, meaning it moves the VM's CPU state to the
> destination and only has to copy the remaining pages (the ones dirtied
> during the first copy iteration). This way the impact on application
> performance is minimized.
Ah, that's what I was trying to describe and failed. Sounds good.
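
To make the convergence argument above concrete, here is a toy model of the
two modes. It is only a sketch: it is not Nova, QEMU, or libvirt code, and
every name and number in it is hypothetical. It assumes a constant transfer
rate and a constant dirty rate, both in pages per second, and that pre-copy
finishes once the remaining dirty set fits under a downtime threshold.

```python
def precopy_rounds(total_pages, bandwidth, dirty_rate, downtime_pages,
                   max_rounds):
    """Toy pre-copy model (hypothetical, not a real hypervisor API).

    Each round re-sends the pages dirtied during the previous round.
    Returns the number of rounds until the dirty set is small enough to
    stop the VM and finish, or None if that never happens in max_rounds.
    """
    remaining = total_pages
    for round_no in range(1, max_rounds + 1):
        # Sending `remaining` pages takes remaining / bandwidth seconds;
        # meanwhile the VM dirties dirty_rate pages per second.
        remaining = min(total_pages, remaining * dirty_rate // bandwidth)
        if remaining <= downtime_pages:
            return round_no
    return None


def hybrid_pages_sent(total_pages, bandwidth, dirty_rate):
    """Toy model of one pre-copy pass followed by a switch to post-copy.

    After the switch the VM runs at the destination, so each page dirtied
    during the first pass is fetched exactly once and the total is bounded.
    """
    dirtied = min(total_pages, total_pages * dirty_rate // bandwidth)
    return total_pages + dirtied


# A VM that dirties memory slower than the link can copy it converges.
print(precopy_rounds(1000, bandwidth=100, dirty_rate=50,
                     downtime_pages=10, max_rounds=30))
# A VM that dirties memory faster than the link never converges.
print(precopy_rounds(1000, bandwidth=100, dirty_rate=150,
                     downtime_pages=10, max_rounds=30))
# With the switch to post-copy, the same VM sends a bounded number of pages.
print(hybrid_pages_sent(1000, bandwidth=100, dirty_rate=150))
```

In this model the memory-intensive VM never gets below the downtime
threshold with pre-copy alone, while the hybrid mode sends at most twice the
VM's memory. The model ignores the page-fault latency at the destination,
which is the runtime cost discussed above.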
> On the other hand, post-copy has a downside. If by any chance the migration
> crashes during the process, unlike with pre-copy, you cannot recover the VM,
> as neither the source nor the destination has a fully working VM at that
> point (part of the memory is at the source, part at the destination).
Eek, good point.
> These are basically the reasons we considered for making it an optional
> choice.
Totally makes sense.
Only tip is to include that sort of information when you submit your
nova-spec, once those features are merged and released.
Thanks,
John