On 6/12/2019 5:50 PM, Thomas Goirand wrote:
>> 3. Other things I'm not thinking of? Should we add a force parameter to the API to allow the operator to forcefully delete (#2 above) if #1 fails? Force parameters are hacky and usually seem to cause more problems than they solve, but it does put the control in the operator's hands.
>
> Let's say the --force is just doing the resize --confirm for the operator, or doing an evacuate, then that's fine (and in fact, a good idea, automations are great...). If it's going to create a mess in the DB, then it's IMO a terrible idea.
I really don't think we're going to change the delete compute service API into an orchestrator that auto-confirms/evacuates the node(s) for you. This is something an external agent / script / service could determine, perform whatever actions are needed, and retry, based on existing APIs (like the migrations API). The one catch is the evacuated instance allocations: there is not much you can do about those from the compute API, so you would have to clean up the allocations for those via the placement API directly.
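To make the "external agent" idea concrete, here is a rough sketch of what such a script could look like. Everything about `client` is an assumption: `list_migrations`, `confirm_resize` and `delete_service` are hypothetical wrapper methods standing in for calls to the existing compute APIs, and the migration field names and the "finished" status for an unconfirmed resize are assumptions about what the migrations API returns. Treat it as an illustration of the determine / act / retry loop, not as something nova ships.

```python
def retire_compute_service(client, host, service_id, max_attempts=3):
    """Confirm pending resizes on `host`, then retry deleting its service.

    `client` is a hypothetical wrapper around the compute API; none of its
    method names come from nova itself.

    Returns (deleted, evacuated_instance_uuids). The evacuated instances are
    only reported: their allocations would still have to be cleaned up
    through the placement API by the operator.
    """
    evacuated = []
    for _ in range(max_attempts):
        evacuated = []
        for migration in client.list_migrations(host):
            kind = migration["migration_type"]
            if kind in ("resize", "migration") and migration["status"] == "finished":
                # Unconfirmed resize / cold migration: confirm it on the
                # operator's behalf so the source host no longer holds it.
                client.confirm_resize(migration["instance_uuid"])
            elif kind == "evacuation":
                # Nothing to do from the compute API; remember it for the report.
                evacuated.append(migration["instance_uuid"])
        if client.delete_service(service_id):
            return True, evacuated
    return False, evacuated
```

If the delete keeps failing and the returned list is not empty, the leftover evacuated allocations are the likely reason, which is the catch mentioned above.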
> However, I see a case that may happen: imagine a compute node is completely broken (think: broken motherboard...), then probably we do want to remove everything that's in there, and want to handle the case where nova-compute doesn't even respond. This very much is a real-life scenario. If your --force is to address this case, then why not! Though again and of course, we don't want a mess in the db... :P
Well, that's where a force parameter would be available to the admin to decide what they want to happen depending on the situation rather than just have nova guess and hope it's what you wanted. We could check if the service is "up" using the service group API and make some determinations that way, i.e. if there are still allocations on the thing and it's down, assume you're deleting it because it's dead and you want it gone, so we just clean up the allocations for you.

-- 

Thanks,

Matt
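P.S. The up/down determination described above could be sketched roughly like this. It is one possible reading, not a design: the function, its parameters and the `Action` names are all hypothetical, and treating an explicit force flag the same way as a down service is an assumption made only for illustration.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete"                          # nothing left on the host
    CLEANUP_AND_DELETE = "cleanup_and_delete"  # drop allocations, then delete
    REJECT = "reject"                          # refuse; operator must act first


def plan_service_delete(service_is_up, has_allocations, force=False):
    """Decide what a delete request should do (hypothetical helper).

    service_is_up   -- liveness result, e.g. from the service group API
    has_allocations -- whether the compute node still has allocations
    force           -- assumed operator override
    """
    if not has_allocations:
        return Action.DELETE
    if force or not service_is_up:
        # Down with leftovers: assume the host is dead and the operator
        # wants it gone, so the allocations are cleaned up for them.
        return Action.CLEANUP_AND_DELETE
    # Up and still holding allocations: do not guess.
    return Action.REJECT
```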