[nova][ops] Trying to get per-instance live migration timeout action spec unstuck

Dan Smith dms at danplanet.com
Thu Jan 3 21:57:39 UTC 2019

> 1. This can already be done using existing APIs (as noted) client-side,
> by monitoring the live migration and acting when it exceeds whatever
> you consider a reasonable timeout at the time.

There's another thing to point out here, which is that this is also
already doable by adjusting (rightly libvirt-specific) config tunables
on a compute node that is being evacuated. Those could be
hot-reloadable, meaning they could be changed without restarting the
compute service when the evac process begins. It doesn't let you control
it per-instance, granted, but there *is* a server-side solution to this
based on existing stuff.
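For concreteness, that server-side knob looks something like the below
(the option name is the existing nova [libvirt] one; whether it is
marked mutable, i.e. hot-reloadable via SIGHUP, depends on your
release):

```ini
# nova.conf on the compute node being evacuated
[libvirt]
# Seconds per GiB of guest RAM + disk to wait for the transfer to
# complete before nova gives up on the migration.
live_migration_completion_timeout = 800
```

If the option is mutable in your release, sending SIGHUP to
nova-compute picks up the new value without restarting the service.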

> 2. The libvirt driver is the only one that currently supports abort
> and force-complete.
> For #1, while valid as a workaround, it is less than ideal since it would
> mean having to orchestrate that into any tooling that needs that kind
> of workaround, be that OSC, openstacksdk, python-novaclient,
> gophercloud, etc. I think it would be relatively simple to pass those
> parameters through with the live migration request down to
> nova-compute and have the parameters override the config options and
> then it's natively supported in the API.
> For #2, while also true, I think it is not a great reason *not* to
> support per-instance timeouts/actions in the API when we already have
> existing APIs that do the same thing and have the same backend compute
> driver limitations. To ease this, I think we can sort out two things:
> a) Can other virt drivers that support live migration (xenapi, hyperv,
> vmware in tree, and powervm out of tree) also support abort and
> force-complete actions? John Garbutt at least thought it should be
> possible for xenapi at the Stein PTG. I don't know about the others - 
> driver maintainers please speak up here. The next challenge would be
> getting driver maintainers to actually add that feature parity, but
> that need not be a priority for Stein as long as we know it's possible
> to add the support eventually.

I think that we asked Eric and he said that powervm would/could not
support such a thing because they hand the process off to the hypervisor
and don't pay attention to what happens after that (and/or can't cancel
it). I know John said he thought it would be doable for xenapi, but even
if it is, I'm not expecting it will happen.

I'd definitely like to hear from the others.
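As an aside, the client-side orchestration from #1 is roughly the loop
below. This is only a sketch: 'sess' is assumed to be a
keystoneauth-style session already scoped to the compute endpoint, the
polling interval and timeout value are per-instance policy you pick
yourself, and the REST endpoints are the real microversion >= 2.23/2.24
ones:

```python
import time


def timed_out(started_at, now, timeout):
    """True once the migration has been running longer than 'timeout'."""
    return (now - started_at) >= timeout


def watch_migration(sess, server_id, migration_id, timeout=1800):
    """Poll a live migration and abort it if it runs too long."""
    started = time.time()
    while True:
        # Show migration details:
        # GET /servers/{server_id}/migrations/{migration_id} (mv >= 2.23)
        mig = sess.get('/servers/%s/migrations/%s'
                       % (server_id, migration_id)).json()['migration']
        if mig['status'] not in ('running', 'preparing'):
            # Finished one way or the other (completed/error/cancelled).
            return mig['status']
        if timed_out(started, time.time(), timeout):
            # Abort the live migration:
            # DELETE /servers/{server_id}/migrations/{migration_id} (mv >= 2.24)
            sess.delete('/servers/%s/migrations/%s'
                        % (server_id, migration_id))
            return 'aborting'
        time.sleep(5)
```

That loop is exactly what would have to be duplicated in OSC,
openstacksdk, gophercloud, etc., which is the orchestration burden #1
complains about.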

> b) There are pre-live migration checks that happen on the source
> compute before we initiate the actual guest transfer. If a user
> (admin) specified these new parameters and the driver does not support
> them, we could fail the live migration early. This wouldn't change the
> instance status but the migration would fail and an instance action
> event would be recorded to explain why it didn't work, and then the
> admin can retry without those parameters. This would shield us from
> exposing something in the API that could give a false sense of
> functionality when the backend doesn't support it.

This is better than nothing, granted. What I'm concerned about is not
that $driver never supports these, but rather that $driver shows up
later and wants *different* parameters. Or even that libvirt/kvm
migration changes in such a way that these no longer make sense even for
it. We already have an example of this in-tree today, where the
recently-added libvirt post-copy mode makes the 'abort' option invalid.
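For what it's worth, (b) need not be more than a capability gate in the
pre-checks. A sketch, where the capability flag and exception name are
hypothetical and not existing nova symbols:

```python
class MigrationPreCheckError(Exception):
    """Stands in for nova's pre-check failure: the migration fails and
    an instance action event is recorded, but the instance status is
    untouched."""


def check_timeout_params(driver_capabilities, requested_params):
    """Fail the live migration early, before the guest transfer starts,
    if per-instance timeout/action parameters were requested but the
    virt driver cannot enforce them."""
    if not requested_params:
        return  # nothing requested, nothing to gate on
    if not driver_capabilities.get('supports_migration_timeout_action'):
        raise MigrationPreCheckError(
            'driver cannot honor per-instance parameters: %s'
            % ', '.join(sorted(requested_params)))
```

Note this only answers "does the driver support it at all"; it does
nothing for the case above, where the parameters themselves stop making
sense for a driver that does.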

> Given all of this, are these reasonable compromises to continue trying
> to drive this feature forward, and more importantly, are other
> operators looking to see this functionality added to nova? Huawei
> public cloud operators want it because they routinely are doing live
> migrations as part of maintenance activities and want to be able to
> control these values per-instance. I assume there are other
> deployments that would like the same.

I don't need to hold this up if everyone else is on board, but I don't
really want to +2 it. I'll commit to not -1ing it if it specifically
confirms support before starting a migration that won't honor the
requested limits.


More information about the openstack-discuss mailing list