Matt Riedemann <mriedemos@gmail.com> writes:
On 1/3/2019 4:37 PM, Dan Smith wrote:
Because in nova[0] we currently only switch to post-copy after we decide we're not making progress, right?
If you're referring to the "live_migration_progress_timeout" option, that has been deprecated and was replaced in Stein by the live_migration_timeout_action option, which was a prerequisite for the per-instance timeout + action spec.
In Stein, we only switch to post-copy if we hit live_migration_completion_timeout and live_migration_timeout_action=force_complete and live_migration_permit_post_copy=True (and libvirt/qemu are new enough for post-copy), otherwise we pause the guest.
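The Stein decision described above can be sketched as a small function. This is a hypothetical illustration, not nova's actual code: the function name and the post_copy_supported flag (standing in for "libvirt/qemu are new enough") are assumptions, and only the option names come from the text. The abort branch assumes that when the timeout action is abort, the migration is aborted rather than paused.

```python
from enum import Enum


class TimeoutAction(Enum):
    # Mirrors the two values of live_migration_timeout_action discussed here.
    ABORT = "abort"
    FORCE_COMPLETE = "force_complete"


def action_on_completion_timeout(timeout_action: TimeoutAction,
                                 permit_post_copy: bool,
                                 post_copy_supported: bool) -> str:
    """Hypothetical helper: what happens once live_migration_completion_timeout is hit.

    post_copy_supported is an assumed stand-in for "libvirt/qemu are new enough".
    """
    if timeout_action is TimeoutAction.ABORT:
        return "abort"
    # force_complete: switch to post-copy only if it is both permitted and available...
    if permit_post_copy and post_copy_supported:
        return "switch-to-post-copy"
    # ...otherwise force completion by pausing the guest.
    return "pause-guest"
```

For example, force_complete with live_migration_permit_post_copy=True on a host whose libvirt/qemu is too old falls through to pausing the guest.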
So I don't think the stalled-progress handling has applied for a while (OSIC found problems with it in Ocata and disabled/deprecated it).
Yeah, I'm trying to point out something _other_ than what is currently nova behavior.
If we later allow a configuration where post-copy is the default from the start (as I believe is the actual current recommendation from the virt people[1]), and someone triggers a migration with a short timeout and an abort action, we won't actually be able to perform the abort.
Sorry, but I don't understand this. How does "post-copy from the start" apply? If I specify a short timeout and an abort action in the API, and the timeout is reached before the migration is complete, it should abort, just as if I had aborted it via the API. As noted above, post-copy should only be triggered once we reach the timeout, and if you override that action to abort (per instance, in the API), it should abort rather than switch to post-copy.
You can't abort a post-copy migration once it has started. If we were to add an "always do post-copy" mode to Nova, per the recommendation in the post I linked, then we would start migrations in post-copy mode, which would make them uncancelable. That means not only could you not cancel such a migration, but we would also have to refuse to start it if the user requested an abort action via this newly proposed API, with any timeout value.

Anyway, my point here is just that libvirt already has a live migration mode (not yet used by nova's libvirt driver) in which we would not be able to honor a request of "abort after N seconds". If config specified that, we could warn or fail on startup, but via the API all we'd be able to do is refuse to start the migration. I'm just trying to highlight that baking "force/abort after N seconds" into our API is not only libvirt-specific at the moment, but even libvirt-pre-copy-specific.

--Dan
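A minimal sketch of the refusal described above, assuming a hypothetical "always post-copy" flag (post_copy_from_start) and hypothetical helper names; none of these are existing nova options or APIs. The per-request check rejects an abort action regardless of the timeout value, and the startup check picks "warn" from the two options mentioned (warn or fail).

```python
import warnings


class MigrationRefused(Exception):
    """Hypothetical error: the requested timeout action cannot be honored."""


def check_requested_timeout(post_copy_from_start: bool, timeout_action: str,
                            timeout_secs: int) -> None:
    """Hypothetical per-request (API) check, run before starting the migration.

    A post-copy migration cannot be cancelled once started, so with
    post_copy_from_start=True an "abort" action is impossible for any timeout_secs.
    """
    if post_copy_from_start and timeout_action == "abort":
        raise MigrationRefused(
            f"refusing to start: cannot abort after {timeout_secs}s "
            "because post-copy migrations cannot be cancelled")


def check_config_at_startup(post_copy_from_start: bool, default_action: str) -> None:
    """Hypothetical startup check on the configured default; this sketch only warns."""
    if post_copy_from_start and default_action == "abort":
        warnings.warn("timeout action 'abort' can never be honored "
                      "when migrations start in post-copy mode")
```

With post-copy from the start, check_requested_timeout(True, "abort", 30) raises MigrationRefused, while a force_complete request is accepted.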