[openstack-dev] [nova][placement] Placement requests and caching in the resource tracker
Matt Riedemann
mriedemos at gmail.com
Mon Nov 5 19:17:13 UTC 2018
On 11/5/2018 12:28 PM, Mohammed Naser wrote:
>> Have you dug into any of the operations around these instances to
>> determine what might have gone wrong? For example, was a live migration
>> performed recently on these instances and if so, did it fail? How about
>> evacuations (rebuild from a down host)?
> To be honest, I have not. However, I suspect a lot of those happen
> because the service that makes the claim is not necessarily the same
> one that deletes it.
>
> I'm not sure if this is possible, but say compute2 makes a claim for
> migrating to compute1 and something fails there; the revert happens in
> compute1, but compute1 is already borked, so it doesn't work.
>
> This isn't necessarily the exact case that's happening but it's a summary
> of what I believe happens.
>
The computes don't create the resource allocations in placement, though;
the scheduler does, unless this deployment still has at least one
compute service older than Pike. You should probably check that to make
sure.
The compute service should only be removing allocations for things like
server delete, a failed move operation (cleaning up the allocations
created by the scheduler), or a successful move operation (cleaning up
the allocations for the source node held by the migration record).
I wonder if you have migration records (from the migrations table in the
cell DB) holding allocations in placement for some reason, even though
the migration is complete. I know you have an audit script that looks
for allocations not held by instances, assuming those instances have
been deleted and the allocations were leaked, but the allocations could
also have been held by a migration record and maybe leaked that way?
Although if you delete the instance, the related migration records are
also removed (but maybe not their allocations?). I'm thinking of a case
like resizing an instance, but rather than confirming or reverting the
resize, the user deletes the instance. That would clean up the
allocations from the target node but potentially not from the source
node.
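A minimal Python sketch of that audit idea, assuming the operator has
already dumped placement allocations keyed by consumer UUID, the UUIDs
of live instances, and the rows of the cell DB migrations table into
memory. Migration, classify_allocations and TERMINAL_STATUSES are
hypothetical names, and the set of statuses treated as "done" is an
assumption, not nova's definitive list. It separates allocations held
by migrations that look complete from consumers that match neither an
instance nor a migration. The latter bucket is where source-node
allocations leaked by the resize-then-delete case would show up, since
deleting the instance also removes its migration records.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Allocations as dumped from placement (hypothetical shape):
# consumer UUID -> resource provider UUID -> resource class -> amount.
Allocations = Dict[str, Dict[str, Dict[str, int]]]


@dataclass(frozen=True)
class Migration:
    """Hypothetical stand-in for a row in the cell DB migrations table."""
    uuid: str           # consumer UUID the migration's allocations are held by
    instance_uuid: str
    status: str


# ASSUMPTION: statuses after which a migration should no longer hold
# allocations. A resize in "finished" is still awaiting confirm/revert,
# so it is deliberately NOT treated as terminal here.
TERMINAL_STATUSES = {"confirmed", "reverted", "completed", "error", "failed"}


def classify_allocations(
    allocations: Allocations,
    live_instances: Set[str],
    migrations: Iterable[Migration],
) -> Dict[str, List[str]]:
    """Bucket each placement consumer by what (if anything) owns it."""
    by_uuid = {m.uuid: m for m in migrations}
    result: Dict[str, List[str]] = {
        "instance": [],          # held by an existing instance
        "active_migration": [],  # held by an in-progress migration
        "stale_migration": [],   # held by a migration that looks done
        "orphan": [],            # no matching instance or migration
    }
    for consumer in sorted(allocations):
        if consumer in live_instances:
            result["instance"].append(consumer)
        elif consumer in by_uuid:
            mig = by_uuid[consumer]
            # Defensive: a migration whose instance is gone is also stale.
            if mig.status in TERMINAL_STATUSES or mig.instance_uuid not in live_instances:
                result["stale_migration"].append(consumer)
            else:
                result["active_migration"].append(consumer)
        else:
            result["orphan"].append(consumer)
    return result


if __name__ == "__main__":
    allocs = {
        "inst-1": {"rp-a": {"VCPU": 2}},
        "mig-1": {"rp-b": {"VCPU": 2}},
        "mig-2": {"rp-c": {"VCPU": 4}},
        "gone-1": {"rp-b": {"VCPU": 1}},
    }
    migs = [
        Migration("mig-1", "inst-1", "finished"),
        Migration("mig-2", "inst-1", "confirmed"),
    ]
    print(classify_allocations(allocs, {"inst-1"}, migs))
```

Anything in the "stale_migration" or "orphan" buckets is a candidate for
manual inspection, not automatic deletion, since the status list above
is only a guess.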
--
Thanks,
Matt