[openstack-dev] [nova] Live migration with claim
Andrey Volkov
avolkov at mirantis.com
Thu Feb 9 11:46:57 UTC 2017
Hi,
I started reviewing the patch series [1] that addresses the issue with
live migration resources. While doing so I made some notes that may be
useful to other reviewers. I would like to share those notes and ask
the community to look at them critically and check whether my
conclusions are wrong.
** How does nova perform live migration (LM)?
*** Components of LM workflow
The following components are involved in the LM process:
- nova-api
  Migration params are determined and validated at this level. The most
  important are:
  - instance - source VM
  - host - target hostname
  - block_migration
  - force
- conductor
  Orchestration is done at this level:
  - creating the migration object
  - building and executing LiveMigrationTask
  - calling the scheduler
  - check_can_live_migrate_destination - RPC request to the destination
    compute node to check that its environment is appropriate. The
    destination node in turn calls check_can_live_migrate_source to
    check that rollback is possible.
  - migration call to the source compute node
- scheduler
  The scheduler is involved in LM only if no destination host is
  specified. In that case, the scheduler's select_destinations function
  picks an appropriate host, and the conductor then calls
  check_can_live_migrate_destination on the picked host.
- compute source node
  This is where the migration starts and ends.
  - pre_live_migration call to the destination node is made first
  - control is transferred to the underlying driver for migration
  - migration monitor is started
  - post_live_migration or rollback is performed
- compute destination node
  Calls from the conductor and the source node are processed here, and
  the check_can_live_migrate_source call is made to the source node.
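To make the ordering of these calls easier to follow, here is a minimal
toy model of the flow described above. It is only a sketch: ToyCompute,
toy_conductor_live_migrate and pick_host are hypothetical names invented
for illustration, not nova classes, and rollback and the migration
monitor are left out.

```python
# Toy model of the LM control flow; none of these classes are nova's real ones.
from dataclasses import dataclass, field


@dataclass
class ToyCompute:
    name: str
    log: list = field(default_factory=list)

    def check_can_live_migrate_destination(self, instance, source):
        # Runs on the destination; it calls back to the source node.
        self.log.append("check_destination")
        source.check_can_live_migrate_source(instance)

    def check_can_live_migrate_source(self, instance):
        self.log.append("check_source")

    def pre_live_migration(self, instance):
        self.log.append("pre_live_migration")

    def live_migration(self, instance, dest):
        # Runs on the source: prepare the destination, hand over to the
        # driver, then finish (rollback is omitted in this sketch).
        dest.pre_live_migration(instance)
        self.log.append("driver_migrate")
        self.log.append("post_live_migration")


def toy_conductor_live_migrate(instance, source, dest, pick_host):
    # The scheduler (pick_host here) is consulted only when no
    # destination host was requested.
    if dest is None:
        dest = pick_host()
    dest.check_can_live_migrate_destination(instance, source)
    source.live_migration(instance, dest)
    return dest
```

Calling toy_conductor_live_migrate("vm-1", src, None, lambda: dst) with
two ToyCompute objects leaves the order of calls recorded in src.log and
dst.log, which can be compared with the diagram.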
*** Common calls diagram
http://amadev.ru/static/lm_diagram.png
*** Calls list for the libvirt case
The following list of calls can be used as a reference.
- nova.api.openstack.compute.migrate_server.MigrateServerController._migrate_live
- nova.compute.api.API.live_migrate
- nova.conductor.api.ComputeTaskAPI.live_migrate_instance
- nova.conductor.manager.ComputeTaskManager._live_migrate
- nova.conductor.manager.ComputeTaskManager._build_live_migrate_task
- nova.conductor.tasks.live_migrate.LiveMigrationTask._execute
- nova.conductor.tasks.live_migrate.LiveMigrationTask._find_destination
- nova.scheduler.manager.SchedulerManager.select_destinations
- nova.conductor.tasks.live_migrate.LiveMigrationTask._call_livem_checks_on_host
- nova.compute.manager.ComputeManager.check_can_live_migrate_destination
- nova.compute.manager.ComputeManager.live_migration
- nova.compute.manager.ComputeManager._do_live_migration
- nova.compute.manager.ComputeManager.pre_live_migration
- nova.virt.libvirt.driver.LibvirtDriver._live_migration_operation
- nova.virt.libvirt.guest.Guest.migrate
- libvirt:domain.migrateToURI{,2,3}
- nova.compute.manager.ComputeManager.post_live_migration_at_destination
** What is the problem with LM?
Nova doesn't claim resources during LM, so scheduling decisions can be
made on wrong resource data until the next periodic
update_available_resource run. The problem is described well in bug [2].
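The effect can be shown with a small toy example. This is not nova's
ResourceTracker; ToyHostState and its methods are hypothetical and only
model RAM, assuming the scheduler sees whatever the host last reported.

```python
# Toy illustration of the stale view; not nova's ResourceTracker.
class ToyHostState:
    def __init__(self, total_ram_mb):
        self.total_ram_mb = total_ram_mb
        self.reported_used_mb = 0   # what the scheduler sees
        self.actual_used_mb = 0     # what is really running on the host

    def land_instance_without_claim(self, ram_mb):
        # LM without a claim: the guest arrives, the report is not updated.
        self.actual_used_mb += ram_mb

    def periodic_update(self):
        # Stand-in for the periodic update_available_resource run.
        self.reported_used_mb = self.actual_used_mb

    def scheduler_thinks_fits(self, ram_mb):
        return self.reported_used_mb + ram_mb <= self.total_ram_mb


host = ToyHostState(total_ram_mb=4096)
host.land_instance_without_claim(3072)
fits_before_update = host.scheduler_thinks_fits(2048)  # stale view
host.periodic_update()
fits_after_update = host.scheduler_thinks_fits(2048)   # corrected view
```

Between the migration and the periodic update, the toy scheduler still
accepts a 2048 MB request that the host cannot actually hold.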
** What changes were made in the patch?
A new live_migration_claim was added to the ResourceTracker, similar to
the resize and rebuild claims.
It was decided to initiate live_migration_claim within
check_can_live_migrate_destination on the destination node. For that to
work, the migration object (created in the conductor) and the resource
limits for the destination node (obtained from the scheduler) must be
passed to check_can_live_migrate_destination, which is why the conductor
call and the compute RPC API were changed.
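As a rough sketch of the idea (not the code from the patch), a claim
taken during the destination check could look like the following.
ToyTracker, ToyClaimError, drop_claim and the shape of the limits dict
are assumptions made for illustration, and only RAM is modelled.

```python
# Toy sketch of a claim taken during the destination check.
# ToyTracker and its method names are hypothetical, not the patch's code.
class ToyClaimError(Exception):
    pass


class ToyTracker:
    def __init__(self, total_ram_mb):
        self.total_ram_mb = total_ram_mb
        self.used_ram_mb = 0
        self.claims = {}  # migration id -> claimed RAM in MB

    def live_migration_claim(self, migration_id, ram_mb, limits=None):
        # 'limits' stands in for the resource limits obtained from the
        # scheduler; without it the physical total is used as the ceiling.
        ceiling = (limits or {}).get("memory_mb", self.total_ram_mb)
        if self.used_ram_mb + ram_mb > ceiling:
            raise ToyClaimError("not enough RAM on destination")
        self.used_ram_mb += ram_mb
        self.claims[migration_id] = ram_mb

    def drop_claim(self, migration_id):
        # Called on rollback so the resources become free again.
        self.used_ram_mb -= self.claims.pop(migration_id)


def toy_check_can_live_migrate_destination(tracker, migration_id, ram_mb, limits):
    # The claim is part of the pre-migration check, so the usage is
    # recorded immediately instead of after the next periodic update.
    tracker.live_migration_claim(migration_id, ram_mb, limits)
    return True
```

With the claim in place, a second migration that does not fit is
rejected at check time instead of being discovered after the fact.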
The overall intention of this patch is to take into account the amount
of resources on the destination node, which can be a basis for future LM
improvements related to NUMA, SR-IOV, and huge pages.
[1] https://review.openstack.org/#/c/244489/
[2] https://bugs.launchpad.net/nova/+bug/1289064
--
Thanks,
Andrey Volkov,
Software Engineer, Mirantis, Inc.