On Wed, 12 Jun 2019, Matt Riedemann wrote:
> 2. Change delete_resource_provider cascade=True logic to remove all allocations for the provider before deleting it, i.e. for not-yet-complete migrations and evacuated instances. For the evacuated instance allocations this is likely OK since restarting the source compute service is going to do that cleanup anyway. Also, if you delete the source compute service during a migration, confirming or reverting the resize later will likely fail since we'd be casting to something that is gone (and we'd orphan those allocations). Maybe we need a functional recreate test for the unconfirmed migration scenario before deciding on this?
I think this is likely the right choice. If the service is being deleted (not disabled) it shouldn't have a resource provider, and to not have a resource provider it needs to not have allocations. If the leftover allocations that it does have are either bogus now, or will be soon enough, we may as well get them gone in a consistent and predictable way.

That said, we shouldn't make a habit of removing allocations just so we can remove a resource provider whenever we want, only in special cases like this.

If/when we're modelling shared disk as a shared resource provider does this get any more complicated? Does the part of an allocation that is DISK_GB need special handling?
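
To make the "remove allocations, then remove the provider" idea concrete, here is a rough sketch against a toy in-memory model. It is not nova or placement code: FakePlacement, ProviderInUse and the provider/consumer names are all made up for illustration, and the only thing it assumes is that a provider with allocations can't be deleted unless the cascade path clears them first.

```python
class ProviderInUse(Exception):
    """Hypothetical error: the provider still has allocations against it."""


class FakePlacement:
    """Toy stand-in for placement; nothing here is the real API."""

    def __init__(self):
        self.providers = set()
        # consumer uuid -> {provider uuid -> {resource class -> amount}}
        self.allocations = {}

    def allocations_for_provider(self, rp_uuid):
        # Only the part of each consumer's allocation held by this provider.
        return {
            consumer: by_rp[rp_uuid]
            for consumer, by_rp in self.allocations.items()
            if rp_uuid in by_rp
        }

    def delete_resource_provider(self, rp_uuid, cascade=False):
        if self.allocations_for_provider(rp_uuid):
            if not cascade:
                raise ProviderInUse(rp_uuid)
            for consumer in list(self.allocations):
                # Drop this provider's share; other providers keep theirs.
                self.allocations[consumer].pop(rp_uuid, None)
                if not self.allocations[consumer]:
                    del self.allocations[consumer]
        self.providers.discard(rp_uuid)


placement = FakePlacement()
placement.providers.update({"compute-src", "shared-disk"})
placement.allocations = {
    # e.g. an unconfirmed migration held only against the source node
    "migration-1": {"compute-src": {"VCPU": 2, "MEMORY_MB": 4096}},
    # e.g. an instance whose disk comes from a sharing provider
    "instance-2": {
        "compute-src": {"VCPU": 1},
        "shared-disk": {"DISK_GB": 20},
    },
}
placement.delete_resource_provider("compute-src", cascade=True)
```

In this toy version the DISK_GB part held against the shared provider is left behind once the compute node provider is gone, which is one way of seeing why the shared disk question above might matter. Whether leaving it, or removing the whole consumer, is the right behaviour is exactly the open question.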
> 3. Other things I'm not thinking of?
>
> Should we add a force parameter to the API to allow the operator to forcefully delete (#2 above) if #1 fails? Force parameters are hacky and usually seem to cause more problems than they solve, but it does put the control in the operator's hands.
I'm sort of maybe on this. A #1, with an option to inspect and then #2 seems friendly and potentially useful but how often is someone going to want to inspect versus just "whatevs, #2"? I don't know.

-- 
Chris Dent                       ٩◔̯◔۶           https://anticdent.org/
freenode: cdent