On 1/3/2019 4:37 PM, Dan Smith wrote:
> Because in nova[0] we currently only switch to post-copy after we decide we're not making progress, right?
If you're referring to the "live_migration_progress_timeout" option, that has been deprecated and was replaced in Stein with the live_migration_timeout_action option, which was a prerequisite for the per-instance timeout + action spec. In Stein, we only switch to post-copy if we hit live_migration_completion_timeout and live_migration_timeout_action=force_complete and live_migration_permit_post_copy=True (and libvirt/qemu are new enough for post-copy), otherwise we pause the guest. So I don't think the stalled progress stuff has applied for a while (OSIC found problems with it in Ocata and disabled/deprecated it).
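To make the Stein behavior concrete, here is a rough sketch of the decision taken once live_migration_completion_timeout is hit. This is not nova's actual code: the function name, the Outcome enum and the post_copy_supported flag are made up for illustration, and I'm reading "otherwise we pause" as applying to the force_complete case, with abort being the other value of live_migration_timeout_action.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical names for what happens to the migration on timeout."""
    POST_COPY = "post_copy"
    PAUSE = "pause"
    ABORT = "abort"


def action_on_completion_timeout(timeout_action, permit_post_copy,
                                 post_copy_supported):
    """Illustrative only; not the real nova implementation.

    timeout_action      -- value of live_migration_timeout_action
                           ("abort" or "force_complete")
    permit_post_copy    -- value of live_migration_permit_post_copy
    post_copy_supported -- assumed flag: libvirt/qemu are new enough
    """
    if timeout_action == "force_complete":
        # Post-copy is only used when it is both permitted and available.
        if permit_post_copy and post_copy_supported:
            return Outcome.POST_COPY
        # Otherwise force completion by pausing the guest.
        return Outcome.PAUSE
    # Any other action (i.e. "abort") cancels the migration.
    return Outcome.ABORT
```

Note that nothing in this sketch looks at migration progress; the only trigger is the completion timeout.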
> If we later allow a configuration where post-copy is the default from the start (as I believe is the actual current recommendation from the virt people[1]), and someone triggers a migration with a short timeout and abort action, we'll not be able to actually do the abort.
Sorry, but I don't understand this. How does "post-copy from the start" apply? If I specify a short timeout and abort action in the API, and the timeout is reached before the migration is complete, it should abort, just like if I abort it via the API. As noted above, post-copy should only be triggered once we reach the timeout, and if you override that action to abort (per instance, in the API), it should abort rather than switch to post-copy.

--

Thanks,

Matt