[nova][ptg] Resource provider delete at service delete

Balázs Gibizer balazs.gibizer at est.tech
Thu Nov 14 16:06:03 UTC 2019



On Thu, Nov 14, 2019 at 09:35, Matt Riedemann <mriedemos at gmail.com> 
wrote:
> On 11/10/2019 2:09 AM, Balázs Gibizer wrote:
>> * Check ongoing migrations and reject the delete if a migration
>>    exists with this compute as the source node. Let the operator
>>    confirm the migrations
> 
> To be clear, the suggestion here is call [1] from the API like around 
> [2]? That's a behavior change but so was blocking the delete when the 
> compute was hosting instances [3] and we added a release note for 
> that. Anyway, that's a pretty simple change and not really something 
> I thought about in earlier threads on this problem. Regarding 
> evacuate migration records, that should also work since the final 
> states for an evacuate migration are done, failed or error, for 
> which [1] accounts.

Yeah, calling [1] at [2] sounds good to me. Regarding evacuation 
records: if the evacuation succeeded, i.e. the migration is in 'done' 
state, then we are OK. But if it finished in 'error' or 'failed' state 
then we still have an instance on the host, so we should not allow 
deleting the compute service. As far as I can see, get_count_by_hosts 
will cover this case.
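
To make it concrete, the check around [2] could look roughly like the
following. This is an untested sketch only; the object/method name is
just my assumption for the count-by-hosts style query linked as [1],
so the real helper and signature may differ:

    import webob.exc

    from nova.i18n import _
    from nova import objects

    # Inside the existing delete() in [2], after the "is this a compute
    # service" and "does it still host instances" checks; 'context' and
    # 'service' are already loaded there.
    num_migrations = objects.MigrationList.get_count_by_hosts(  # assumed name
        context, [service.host])
    if num_migrations:
        raise webob.exc.HTTPConflict(
            explanation=_('Unable to delete compute service %s: there '
                          'are in-progress migrations involving the '
                          'host. Confirm or complete them first.')
            % service.host)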

> 
>> * Cascade delete providers and allocations in placement.
>>    * in case of evacuated instances this is the right thing to do
> 
> OK this seems to confirm my TODO here [4].
> 
>>    * in any other dangling allocation case nova has the final truth
>>      so nova has the authority to delete them.
> 
> So this would build on the first idea above about blocking the 
> service delete if there are in-progress migrations involving the node 
> (either incoming or outgoing) right? So if we get to the point of 
> deleting the provider we know (1) there are no in-progress migrations 
> and (2) there are no instances on the host (outside of evacuated 
> instances which we can cleanup automatically per [4]). Given that, 
> I'm not sure there is really anything else to do here.

In theory there cannot be any other allocation on the compute RP tree 
if there is no instance on the host and no ongoing migration involving 
the host. But I still think we need to cascade the delete to make sure 
that orphaned allocations (which are a bug in themselves, but we know 
they happen) are cleaned up when the service is deleted.
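
For the cascade part I'm thinking of something along these lines. It
is a rough sketch that talks straight to the placement REST API with
requests (endpoint, token and microversion are placeholders); in nova
it would of course go through the report client, e.g. near the TODO
at [4], and would also have to walk nested providers and handle
generation conflicts:

    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # placeholder
    HEADERS = {
        'x-auth-token': '<admin token>',            # placeholder
        'openstack-api-version': 'placement 1.14',  # placeholder
    }

    def cascade_delete_provider(rp_uuid):
        """Remove every allocation against a provider, then the provider.

        Caveat: DELETE /allocations/{consumer} wipes the consumer's
        allocations on *all* providers, which is only safe when the
        consumer has nothing anywhere else. A successfully evacuated
        instance still holds allocations on the destination, so the
        real code has to be smarter than this.
        """
        # List the allocations against this provider, keyed by consumer.
        url = '%s/resource_providers/%s/allocations' % (PLACEMENT, rp_uuid)
        resp = requests.get(url, headers=HEADERS)
        resp.raise_for_status()
        for consumer_uuid in resp.json()['allocations']:
            requests.delete(
                '%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                headers=HEADERS).raise_for_status()
        # With no allocations left the provider itself can be deleted.
        requests.delete(
            '%s/resource_providers/%s' % (PLACEMENT, rp_uuid),
            headers=HEADERS).raise_for_status()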

cheers,
gibi

> 
>> * Document possible ways to reconcile Placement with Nova using
>>    heal_allocations and eventually the audit command once it's 
>> merged.
> 
> Done (merged yesterday) [5].
> 
> [1] 
> https://github.com/openstack/nova/blob/20.0.0/nova/objects/migration.py#L240
> [2] 
> https://github.com/openstack/nova/blob/20.0.0/nova/api/openstack/compute/services.py#L254
> [3] https://review.opendev.org/#/c/560674/
> [4] 
> https://review.opendev.org/#/c/678100/2/nova/scheduler/client/report.py@2165
> [5] 
> https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html
> 
> --
> 
> Thanks,
> 
> Matt
> 




