[nova][ops] Trying to get per-instance live migration timeout action spec unstuck

Chris Friesen chris.friesen at windriver.com
Wed Dec 19 17:34:33 UTC 2018


On 12/18/2018 8:04 PM, Matt Riedemann wrote:

> There are two main sticking points against this in the review:
> 
> 1. This can already be done using existing APIs (as noted) client-side 
> if monitoring the live migration and it times out for whatever you 
> consider a reasonable timeout at the time.
> 
> 2. The libvirt driver is the only one that currently supports abort and 
> force-complete.
> 
> For #1, while valid as a workaround, it is less than ideal since it would 
> mean having to orchestrate that into any tooling that needs that kind of 
> workaround, be that OSC, openstacksdk, python-novaclient, gophercloud, 
> etc. I think it would be relatively simple to pass those parameters 
> through with the live migration request down to nova-compute and have 
> the parameters override the config options and then it's natively 
> supported in the API.

I agree that it would be cleaner to support it in one place rather than 
needing to add timeout handling to all the various clients.
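For anyone curious what that client-side orchestration looks like, here is a
minimal sketch of a timeout watchdog. The function names, status strings, and
callbacks are illustrative only, not the actual novaclient/openstacksdk API;
in real tooling `get_status` would poll the migration record and `abort` would
hit the live migration abort endpoint:

```python
import time


def watch_live_migration(get_status, abort, timeout, poll_interval=1.0):
    """Poll a live migration until it reaches a terminal state or times out.

    get_status: callable returning the migration status string
                ('running', 'completed', 'error', ...).
    abort:      callable that requests an abort of the migration,
                as the client-side workaround would do via the API.
    timeout:    seconds to wait before giving up and aborting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ('completed', 'error', 'cancelled'):
            # Migration finished on its own; nothing to do.
            return status
        time.sleep(poll_interval)
    # Timed out: apply the chosen timeout action (here, abort).
    abort()
    return 'aborted'
```

The point being that every client (OSC, gophercloud, etc.) would have to carry
some variant of this loop, which is exactly the duplication the spec avoids.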

> For #2, while also true, I think it is not a great reason *not* to support 
> per-instance timeouts/actions in the API when we already have existing 
> APIs that do the same thing and have the same backend compute driver 
> limitations. To ease this, I think we can sort out two things:

<snip>

> b) There are pre-live migration checks that happen on the source compute 
> before we initiate the actual guest transfer. If a user (admin) 
> specified these new parameters and the driver does not support them, we 
> could fail the live migration early. This wouldn't change the instance 
> status but the migration would fail and an instance action event would 
> be recorded to explain why it didn't work, and then the admin can retry 
> without those parameters. This would shield us from exposing something 
> in the API that could give a false sense of functionality when the 
> backend doesn't support it.

I think this would be a reasonable way to handle it.
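Conceptually the early failure in (b) is just a capability check in the
pre-live-migration path. A rough sketch, where the capability flag name and
exception are made up for illustration and not Nova's real internals:

```python
class MigrationPreCheckError(Exception):
    """Raised to fail the migration early, before the guest transfer starts."""


def validate_timeout_params(driver_capabilities, requested_params):
    """Reject per-instance timeout/action parameters the driver can't honor.

    driver_capabilities: dict of capability flags reported by the compute
                         driver (flag name here is hypothetical).
    requested_params:    the user-supplied per-instance overrides, if any.
    """
    if not requested_params:
        # Nothing was requested, so nothing to validate.
        return
    if not driver_capabilities.get('supports_migration_timeout_actions'):
        # Fail early: the migration is marked failed, an instance action
        # event explains why, and the admin can retry without the params.
        raise MigrationPreCheckError(
            'Per-instance live migration timeout parameters %s are not '
            'supported by this compute driver' % sorted(requested_params))
```

That keeps the instance status untouched while still refusing to silently
accept options the backend would ignore.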

> Given all of this, are these reasonable compromises to continue trying 
> to drive this feature forward, and more importantly, are other operators 
> looking to see this functionality added to nova? Huawei public cloud 
> operators want it because they routinely are doing live migrations as 
> part of maintenance activities and want to be able to control these 
> values per-instance. I assume there are other deployments that would 
> like the same.

We added nova extensions to the existing Wind River Titanium Cloud 
product to allow more control over the handling of live migrations 
because they're frequently used by our operators and have caused issues 
in the past.  The new StarlingX project is more aligned with upstream, 
so it'd be good to have some sort of per-migration option available.

Chris
