[nova][ops] What should the compute service delete behavior be wrt resource providers with allocations?
mriedemos at gmail.com
Thu Jun 13 17:44:52 UTC 2019
On 6/12/2019 5:50 PM, Thomas Goirand wrote:
>> 3. Other things I'm not thinking of? Should we add a force parameter to
>> the API to allow the operator to forcefully delete (#2 above) if #1
>> fails? Force parameters are hacky and usually seem to cause more
>> problems than they solve, but it does put the control in the operators
> Let's say the --force is just doing the resize --confirm for the
> operator, or do an evacuate, then that's fine (and in fact, a good idea,
> automations are great...). If it's going to create a mess in the DB,
> then it's IMO a terrible idea.
I really don't think we're going to change the delete compute service
API into an orchestrator that auto-confirms/evacuates the node(s) for
you. This is something an external agent / script / service could
determine, perform whatever actions, and retry, based on existing APIs
(like the migrations API). The one catch is the evacuated instance
allocations: there is not much you can do about those from the compute
API, so you would have to clean up those allocations via the
placement API directly.
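To make that concrete, here is a rough sketch of the triage step such an
external script might run before retrying the service delete. Everything
in it is an assumption for illustration: the Migration class is a
hypothetical, trimmed-down stand-in for a migrations API record, the
status strings are assumed values that should be checked against your
deployment, and the actual API calls (confirming the resize, fixing the
allocations in placement) are left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Migration:
    # Hypothetical, minimal view of one migration record.
    instance_uuid: str
    migration_type: str  # e.g. "resize" or "evacuation"
    status: str


def plan_cleanup(
    migrations: Iterable[Migration],
    unconfirmed_resize_status: str = "finished",  # assumed status value
    evacuated_status: str = "done",               # assumed status value
) -> Dict[str, List[str]]:
    """Group instances by the manual action the operator still owes them."""
    plan: Dict[str, List[str]] = {
        "confirm_resize": [],          # doable through the compute API
        "fix_source_allocations": [],  # needs the placement API directly
    }
    for m in migrations:
        if m.migration_type == "resize" and m.status == unconfirmed_resize_status:
            plan["confirm_resize"].append(m.instance_uuid)
        elif m.migration_type == "evacuation" and m.status == evacuated_status:
            plan["fix_source_allocations"].append(m.instance_uuid)
    return plan
```

The script would work through both lists and only then retry the delete.
Note that for an evacuated instance only the allocations against the
source node's resource provider should go away, not the ones it holds on
the node it was evacuated to.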
> However, I see a case that may happen: imagine a compute node is
> completely broken (think: broken motherboard...), then probably we do
> want to remove everything that's in there, and want to handle the case
> where nova-compute doesn't even respond. This very much is a real life
> scenario. If your --force is to address this case, then why not! Though
> again and of course, we don't want a mess in the db... :P
Well, that's where a force parameter would be available to the admin to
decide what they want to happen depending on the situation, rather than
having nova guess and hope it's what you wanted.
We could check if the service is "up" using the service group API and
make some determinations that way, i.e. if there are still allocations
on the thing and it's down, assume you're deleting it because it's dead
and you want it gone so we just clean up the allocations for you.
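Sketched as code, the decision being floated here might look like the
following. This is only an illustration of the idea, not what nova does
today: decide_delete and DeleteAction are hypothetical names, the
service_is_up and has_allocations inputs are assumed to come from the
service group API and from placement respectively, and the force flag
is the proposed (not existing) parameter.

```python
from enum import Enum


class DeleteAction(Enum):
    DELETE = "delete the service"
    CLEANUP_THEN_DELETE = "clean up allocations, then delete the service"
    REFUSE = "refuse and tell the operator to resolve the allocations"


def decide_delete(service_is_up: bool, has_allocations: bool,
                  force: bool = False) -> DeleteAction:
    """Pick what a compute service delete could do (hypothetical policy)."""
    if not has_allocations:
        # Nothing is consuming resources on the node, so deleting is safe.
        return DeleteAction.DELETE
    if force or not service_is_up:
        # Either the operator asked for it explicitly, or the service is
        # down and we assume the node is dead and should just go away.
        return DeleteAction.CLEANUP_THEN_DELETE
    # Service is alive and still has allocations: do not guess.
    return DeleteAction.REFUSE
```

With this policy a dead node with leftover allocations gets cleaned up
without a force flag, while a live node with allocations is only
deleted if the operator forces it.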