[openstack-dev] [nova] Should we add the 'force' option to the cold migrate API too?

Matt Riedemann mriedemos at gmail.com
Wed Aug 30 15:09:21 UTC 2017

Given the recent Placement-related bugs [1][2] caused by the force flag
in the live migrate and evacuate APIs, and some other long-standing
bugs about bypassing the scheduler [3], I don't think we should add the
force option to the cold migrate API, as (re-)proposed in Takashi's cold
migrate spec [4].

I'm fine with being able to specify a host during cold migrate/resize, 
but I think the specified host should be validated by the scheduler (and 
placement) so that the instance can actually move to that specified 
destination host.
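
To make "validated by the scheduler" a bit more concrete, here is a
minimal Python sketch. It is not Nova code: pick_destination, the host
dicts, and the filter callables are all hypothetical names made up for
illustration. The only point is that a requested host narrows the
candidate list but still has to pass the same filters as any other host.

```python
def pick_destination(candidates, filters, requested_host=None):
    """Hypothetical sketch, not Nova's scheduler.

    candidates: list of dicts, each with a "name" key plus whatever the
    filters inspect. filters: callables that take a host dict and return
    True if the instance could land on that host.
    """
    if requested_host is not None:
        # A requested host only restricts the candidates; it does not
        # skip the filters.
        candidates = [h for h in candidates if h["name"] == requested_host]

    passing = [h for h in candidates if all(f(h) for f in filters)]
    if not passing:
        # The requested host (or every host) failed validation.
        raise ValueError("no valid host found")
    return passing[0]["name"]
```

With this shape, asking for a host that cannot fit the instance fails up
front instead of breaking the instance later on the destination.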

Since we've built more logic into the scheduler in Pike for integration
with Placement, bypassing it creates maintenance issues because we have
to duplicate code throughout conductor. More generally, it seems like a
bad idea to force a host, bypass the scheduler, and potentially break
the instance. Not to mention that the complicated logic of passing the
host through from the API to conductor to the scheduler is its own
maintenance problem [5].

The force flag was added to the live migrate and evacuate APIs by this
spec [6]. As I read it, before that microversion we'd bypass the
scheduler whenever a host was specified, so the force flag was really
just there for backward compatibility, I guess in case you wanted the
option to break the instance or your deployment. :) After that
microversion, if you specify a host but not the force flag, we validate
the specified host via the scheduler first. Given this, and the fact
that we don't have any backward compatibility to maintain for
specifying a host during cold migrate, I don't think we need to add a
force flag for it. The exception would be if people really love that
option on the live migrate and evacuate APIs, but it just seems overly
dangerous to me.
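
The live migrate/evacuate behavior described above boils down to a small
decision table. Here it is as a Python sketch; host_handling and the
HostHandling enum are hypothetical names used for illustration, not Nova
code, and has_force_flag just stands for "the request uses a
microversion that includes the force flag".

```python
from enum import Enum


class HostHandling(Enum):
    SCHEDULER_PICKS = "scheduler picks the destination"
    SCHEDULER_VALIDATES = "scheduler validates the requested host"
    BYPASS_SCHEDULER = "scheduler is bypassed"


def host_handling(host, force, has_force_flag):
    """Hypothetical summary of the semantics described in this mail."""
    if host is None:
        # No host requested: the scheduler chooses one.
        return HostHandling.SCHEDULER_PICKS
    if not has_force_flag:
        # Older microversions: a specified host always bypassed the
        # scheduler.
        return HostHandling.BYPASS_SCHEDULER
    if force:
        # Newer microversions with force set keep the old behavior.
        return HostHandling.BYPASS_SCHEDULER
    # Newer microversions, host given without force: validate it first.
    return HostHandling.SCHEDULER_VALIDATES
```

What I'm arguing for with cold migrate is that only the SCHEDULER_PICKS
and SCHEDULER_VALIDATES outcomes should exist.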

Finally, if one is going to make the argument, "but this would be 
consistent with the live migrate and evacuate APIs", I can also point 
out that we don't allow you to specify a host (forced or not) during 
unshelve of a shelved offloaded instance - which is basically a move 
(new build on a new host chosen by the scheduler). I'm not advocating 
that we make unshelve more complicated though, because that's already 
broken in several known ways [7][8][9].

[1] https://bugs.launchpad.net/nova/+bug/1712008
[2] https://bugs.launchpad.net/nova/+bug/1713786
[3] https://bugs.launchpad.net/nova/+bug/1427772
[4] https://review.openstack.org/#/c/489031/
[7] https://bugs.launchpad.net/nova/+bug/1675791
[8] https://bugs.launchpad.net/nova/+bug/1627694
[9] https://bugs.launchpad.net/nova/+bug/1547142




