[openstack-dev] [nova] Live migration with claim

Andrey Volkov avolkov at mirantis.com
Thu Feb 9 11:46:57 UTC 2017


Hi,

I started reviewing the patch series [1], which addresses the issue with
live migration resources. While doing that I made some notes that may
be useful for reviewers. I would like to share those notes and ask the
community to look at them critically and check whether my conclusions
are wrong.

** How does nova perform live migration (LM)?

*** Components of LM workflow

The following components are involved in the LM process:
- nova-api
  Migration parameters are determined and validated at this level. The
  most important are:
  - instance - source VM
  - host - target hostname
  - block_migration
  - force
- conductor
  Orchestration is done at this level:
  - creating the migration object
  - building and executing LiveMigrationTask
  - calling the scheduler
  - check_can_live_migrate_destination - an RPC request to the compute
    node to check that the destination environment is appropriate. On
    the destination node, a check_can_live_migrate_source call is made
    to check that rollback is possible.
  - the migration call to the source compute node
- scheduler
  The scheduler is involved in LM only if the destination host is
  not set. In that case the scheduler's select_destinations function
  picks an appropriate host, and the conductor also calls
  check_can_live_migrate_destination on the picked host.
- compute source node
  This is where the migration starts and ends.
  - a pre_live_migration call to the destination node is made first
  - control is transferred to the underlying driver for the migration
  - the migration monitor is started
  - post_live_migration or rollback is performed
- compute destination node
  Calls from the conductor and the source node are processed here, and
  the check_can_live_migrate_source call is made to the source node.
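
The order of calls between the components can be sketched with a toy
model. This is not nova code: the classes, method signatures and the
host-picking rule are assumptions made for illustration, and the
migration monitor and rollback path are left out.

```python
# Toy model of the LM call order described above; this is not nova code.
# Classes, signatures and the host-picking rule are illustrative assumptions.
calls = []


class SourceNode:
    def check_can_live_migrate_source(self, instance):
        calls.append("source.check_can_live_migrate_source")

    def live_migration(self, instance, dest):
        calls.append("source.live_migration")
        dest.pre_live_migration(instance)      # prepare the destination first
        calls.append("driver.migrate")         # control passes to the driver
        dest.post_live_migration_at_destination(instance)


class DestinationNode:
    def check_can_live_migrate_destination(self, instance, source):
        calls.append("dest.check_can_live_migrate_destination")
        # The destination asks the source whether rollback is possible.
        source.check_can_live_migrate_source(instance)

    def pre_live_migration(self, instance):
        calls.append("dest.pre_live_migration")

    def post_live_migration_at_destination(self, instance):
        calls.append("dest.post_live_migration_at_destination")


def conductor_live_migrate(instance, source, hosts, host=None):
    calls.append("conductor.create_migration")
    if host is None:
        # The scheduler is consulted only when no destination was given.
        calls.append("scheduler.select_destinations")
        host = sorted(hosts)[0]                # stand-in for real scheduling
    dest = hosts[host]
    dest.check_can_live_migrate_destination(instance, source)
    source.live_migration(instance, dest)
    return host


picked = conductor_live_migrate(
    "vm1", SourceNode(),
    {"node-b": DestinationNode(), "node-c": DestinationNode()})
print(picked)
print("\n".join(calls))
```

Passing host="node-c" skips the scheduler step; the rest of the
sequence stays the same.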

*** Common calls diagram

http://amadev.ru/static/lm_diagram.png

*** Calls list for the libvirt case

The following list of calls can be used as a reference.
  
- nova.api.openstack.compute.migrate_server.MigrateServerController._migrate_live
- nova.compute.api.API.live_migrate
- nova.conductor.api.ComputeTaskAPI.live_migrate_instance
- nova.conductor.manager.ComputeTaskManager._live_migrate
- nova.conductor.manager.ComputeTaskManager._build_live_migrate_task
- nova.conductor.tasks.live_migrate.LiveMigrationTask._execute
- nova.conductor.tasks.live_migrate.LiveMigrationTask._find_destination
- nova.scheduler.manager.SchedulerManager.select_destinations
- nova.conductor.tasks.live_migrate.LiveMigrationTask._call_livem_checks_on_host
- nova.compute.manager.ComputeManager.check_can_live_migrate_destination
- nova.compute.manager.ComputeManager.live_migration
- nova.compute.manager.ComputeManager._do_live_migration
- nova.compute.manager.ComputeManager.pre_live_migration
- nova.virt.libvirt.driver.LibvirtDriver._live_migration_operation
- nova.virt.libvirt.guest.Guest.migrate
- libvirt:domain.migrateToURI{,2,3}
- nova.compute.manager.ComputeManager.post_live_migration_at_destination

** What is the problem with LM?

Nova doesn't claim resources during LM, so scheduling decisions can be
made against wrong resource data until the next periodic
update_available_resource run. The problem is described well in bug [2].
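
A toy illustration of that window, under simplifying assumptions: only
RAM is tracked, and HostState and schedule are hypothetical names, not
nova classes. Usage seen by the scheduler is refreshed only by the
periodic task, so a second instance is accepted on a host that is
already full.

```python
# Toy model, not nova code: tracked usage is refreshed only periodically.
class HostState:
    def __init__(self, total_ram_mb):
        self.total_ram_mb = total_ram_mb
        self.instances = {}     # what really runs on the host
        self.used_ram_mb = 0    # what the scheduler sees

    def free_ram_mb(self):
        return self.total_ram_mb - self.used_ram_mb

    def land_instance(self, name, ram_mb):
        # LM without a claim: the instance arrives, tracked usage is untouched.
        self.instances[name] = ram_mb

    def update_available_resource(self):
        # The periodic task recomputes usage from the actual instances.
        self.used_ram_mb = sum(self.instances.values())


def schedule(host, ram_mb):
    # Hypothetical scheduler check: does the host appear to have room?
    return host.free_ram_mb() >= ram_mb


host = HostState(total_ram_mb=4096)
first = schedule(host, 3072)      # host is empty, request fits
host.land_instance("vm1", 3072)
second = schedule(host, 3072)     # still "fits": usage was never updated
host.land_instance("vm2", 3072)
host.update_available_resource()  # periodic task finally runs
third = schedule(host, 3072)
free_after = host.free_ram_mb()
print(first, second, third, free_after)
```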

** What changes were made in the patch?

A new live_migration_claim was added to the ResourceTracker, similar to
the resize and rebuild claims.

It was decided to initiate live_migration_claim within
check_can_live_migrate_destination on the destination node. For that,
the migration (created in the conductor) and the resource limits for
the destination node (obtained from the scheduler) must be passed to
check_can_live_migrate_destination, which is why the conductor call and
the compute RPC API were changed.
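
The idea can be sketched as follows. This is not the code from the
patch: ToyResourceTracker, ClaimFailed, abort_claim, the function
signature and the "memory_mb" limits key are assumptions made for
illustration, and only RAM is considered.

```python
# Toy sketch of claiming during the destination check; not the patch itself.
class ClaimFailed(Exception):
    pass


class ToyResourceTracker:
    def __init__(self, total_ram_mb):
        self.total_ram_mb = total_ram_mb
        self.used_ram_mb = 0
        self.claims = {}  # migration id -> claimed RAM in MB

    def live_migration_claim(self, migration_id, ram_mb, limits=None):
        # limits is assumed to come from the scheduler; fall back to total RAM.
        limit = (limits or {}).get("memory_mb", self.total_ram_mb)
        if self.used_ram_mb + ram_mb > limit:
            raise ClaimFailed("not enough RAM for %s" % migration_id)
        self.used_ram_mb += ram_mb
        self.claims[migration_id] = ram_mb

    def abort_claim(self, migration_id):
        # Give the resources back, e.g. when the migration is rolled back.
        self.used_ram_mb -= self.claims.pop(migration_id)


def check_can_live_migrate_destination(tracker, migration_id, ram_mb, limits):
    # Claim up front so tracked usage reflects the incoming instance at once.
    tracker.live_migration_claim(migration_id, ram_mb, limits)
    return True


tracker = ToyResourceTracker(total_ram_mb=4096)
limits = {"memory_mb": 4096}
first_ok = check_can_live_migrate_destination(tracker, "mig-1", 3072, limits)
try:
    check_can_live_migrate_destination(tracker, "mig-2", 3072, limits)
    second_ok = True
except ClaimFailed:
    second_ok = False
used_after_claims = tracker.used_ram_mb  # only mig-1 is counted
tracker.abort_claim("mig-1")
print(first_ok, second_ok, used_after_claims, tracker.used_ram_mb)
```

With the claim in place the second migration is rejected immediately
instead of waiting for the next periodic update.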

The overall intention of this patch is to take into account the amount
of resources on the destination node, which can be a basis for future
LM improvements related to NUMA, SR-IOV and huge pages.

[1] https://review.openstack.org/#/c/244489/
[2] https://bugs.launchpad.net/nova/+bug/1289064

-- 
Thanks,

Andrey Volkov,
Software Engineer, Mirantis, Inc.


