[openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Matt Riedemann mriedemos at gmail.com
Mon Nov 5 19:17:13 UTC 2018


On 11/5/2018 12:28 PM, Mohammed Naser wrote:
>> Have you dug into any of the operations around these instances to
>> determine what might have gone wrong? For example, was a live migration
>> performed recently on these instances and if so, did it fail? How about
>> evacuations (rebuild from a down host)?
> To be honest, I have not; however, I suspect a lot of those happen because
> the service which makes the claim is not necessarily the same one that
> deletes it.
> 
> I'm not sure if this is possible, but say compute2 makes a claim for
> migrating to compute1 and something fails there; the revert happens on
> compute1, but compute1 is already borked, so it doesn't work.
> 
> This isn't necessarily the exact case that's happening but it's a summary
> of what I believe happens.
> 

The computes don't create the resource allocations in placement, though; 
the scheduler does, unless this deployment still has at least one compute 
that is older than Pike. You should probably check that to make sure.
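
If you want to check, something like this (a rough, untested sketch; it 
assumes direct read access to the cell DB, and the Pike service version 
number below is an assumption you'd want to confirm against 
nova/objects/service.py) would show whether any nova-compute is still 
reporting a pre-Pike service version:

import sqlalchemy as sa

CELL_DB_URL = "mysql+pymysql://nova:password@controller/nova"  # placeholder
PIKE_SERVICE_VERSION = 22  # assumed; confirm against nova/objects/service.py

engine = sa.create_engine(CELL_DB_URL)
with engine.connect() as conn:
    rows = conn.execute(sa.text(
        "SELECT host, version FROM services "
        "WHERE `binary` = 'nova-compute' AND deleted = 0"))
    old = [(row.host, row.version) for row in rows
           if row.version < PIKE_SERVICE_VERSION]

if old:
    print("Computes still reporting a pre-Pike service version:")
    for host, version in old:
        print("  %s: service version %s" % (host, version))
else:
    print("All nova-compute services are Pike or newer.")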

The compute service should only be removing allocations for things like 
server delete, a failed move operation (cleaning up the allocations 
created by the scheduler), or a successful move operation (cleaning up 
the allocations for the source node held by the migration record).
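
As an aside, you can see exactly which allocations a given consumer (an 
instance UUID or a migration UUID) holds by asking placement directly 
with GET /allocations/{consumer_uuid}. A minimal sketch, assuming you 
already have a keystone token and the placement endpoint URL (both are 
placeholders below):

import requests

PLACEMENT_URL = "http://controller:8778"  # placeholder endpoint
TOKEN = "..."                             # placeholder keystone token
consumer_uuid = "..."                     # instance or migration UUID

resp = requests.get(
    "%s/allocations/%s" % (PLACEMENT_URL, consumer_uuid),
    headers={"X-Auth-Token": TOKEN})
resp.raise_for_status()
for rp_uuid, alloc in resp.json().get("allocations", {}).items():
    print(rp_uuid, alloc["resources"])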

I wonder if you have migration records (from the cell DB migrations 
table) holding allocations in placement for some reason, even though the 
migration is complete. I know you have an audit script to look for 
allocations that are not held by instances, on the assumption that those 
instances have been deleted and the allocations were leaked, but the 
allocations could also have been held by a migration record and maybe 
leaked that way? Although if you delete the instance, the related 
migration records are also removed (but maybe not their allocations?). 
I'm thinking of a case like: resize an instance, but rather than 
confirm/revert it, the user deletes the instance. That would clean up the 
allocations from the target node but potentially not from the source node.
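
If you wanted to extend your audit script in that direction, the idea 
would be something like this (a rough, untested sketch; the DB URL, 
token, and endpoint are placeholders, the status list is only 
illustrative, and it assumes the migrations table has a uuid column, 
i.e. Queens or newer):

import requests
import sqlalchemy as sa

CELL_DB_URL = "mysql+pymysql://nova:password@controller/nova"  # placeholder
PLACEMENT_URL = "http://controller:8778"                       # placeholder
TOKEN = "..."                                                  # placeholder

engine = sa.create_engine(CELL_DB_URL)
with engine.connect() as conn:
    migrations = conn.execute(sa.text(
        "SELECT uuid, status FROM migrations WHERE deleted = 0 "
        "AND status IN ('confirmed', 'reverted', 'completed', 'error')"
    )).fetchall()

for mig in migrations:
    resp = requests.get(
        "%s/allocations/%s" % (PLACEMENT_URL, mig.uuid),
        headers={"X-Auth-Token": TOKEN})
    if resp.ok and resp.json().get("allocations"):
        print("migration %s (%s) still holds allocations in placement"
              % (mig.uuid, mig.status))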

-- 

Thanks,

Matt


