On Wed, 8 May 2019, Matt Riedemann wrote:
> Yup, I agree with everything said from a nova perspective. Our public
> cloud operators were just asking about leaked allocations and whether
> there was tooling to report and clean that kind of stuff up. I explained
> that we have the heal_allocations CLI, but that's only going to create
> allocations for *instances*, and only if those instances aren't deleted.
> We don't have anything in nova that deals with detection and cleanup of
> leaked allocations, sort of like what this tooling does [1], though I
> think that is different.
I continue to wish that we had (or could choose to make) functionality on
the compute node, perhaps triggered by a signal of some kind, that would
perform a reset of inventory and allocations. That way, when in doubt, we
could use reality as the authoritative source of truth rather than either
the nova or the placement DB. I'm not sure if that's feasible at this
stage.

I agree that healing allocations for instances that are known to exist is
easy, but cleaning up allocations that got left behind is harder. It's
simplified somewhat (from nova's perspective) by the fact that there
should only ever be one group of allocations (that is, a set identified by
a single consumer uuid) for an instance.

Right now, you can generate a list of known consumers of compute nodes by
doing what you describe: traversing the allocations of each compute node
provider. If we ever move to a state where the compute node doesn't
provide resources (and thus has no allocations), we won't be able to do
that, and that's one of the reasons I get resistant when we talk about
moving VCPU to NUMA nodes in all cases.

Which supports your assertion that maybe some day it would be nice to list
allocations by type. Some day.

-- 
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent tw: @anticdent
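The traversal approach discussed in this thread (walk each compute node provider's allocations, collect the consumer uuids, compare them with the instances nova knows about) can be sketched as below. This is a minimal illustration, not nova or placement code: it assumes the per-provider payloads have already been fetched and have the shape returned by placement's `GET /resource_providers/{uuid}/allocations` (an `"allocations"` map keyed by consumer uuid). The function names `known_consumers` and `leaked_consumers` are hypothetical. It also relies on the one-consumer-per-instance property mentioned above. A real tool would additionally need to exclude other legitimate consumers (for example in-progress migrations) before treating anything as leaked.

```python
from typing import Dict, Iterable, Set

# Hypothetical input: provider uuid -> payload shaped like placement's
# GET /resource_providers/{uuid}/allocations response, e.g.
# {"allocations": {"<consumer_uuid>": {"resources": {"VCPU": 2}}},
#  "resource_provider_generation": 5}
ProviderAllocations = Dict[str, dict]


def known_consumers(provider_allocations: ProviderAllocations) -> Set[str]:
    """Collect every consumer uuid holding allocations on any provider."""
    consumers: Set[str] = set()
    for payload in provider_allocations.values():
        # Each key of the "allocations" map is a consumer uuid.
        consumers.update(payload.get("allocations", {}).keys())
    return consumers


def leaked_consumers(provider_allocations: ProviderAllocations,
                     existing_instances: Iterable[str]) -> Set[str]:
    """Consumers with allocations but no matching live instance.

    Assumes one consumer uuid per instance (the instance uuid); other
    consumer types such as migrations are NOT filtered out here.
    """
    return known_consumers(provider_allocations) - set(existing_instances)


if __name__ == "__main__":
    sample = {
        "cn-1": {"allocations": {"inst-a": {"resources": {"VCPU": 2}},
                                 "inst-x": {"resources": {"MEMORY_MB": 512}}},
                 "resource_provider_generation": 3},
        "cn-2": {"allocations": {"inst-b": {"resources": {"VCPU": 1}}},
                 "resource_provider_generation": 7},
    }
    print(sorted(leaked_consumers(sample, ["inst-a", "inst-b"])))
```

As noted in the thread, this only works while compute node providers carry the allocations; if resources such as VCPU moved to nested NUMA providers, the traversal would have to cover those providers too.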